Open balmas opened 9 years ago
Initial versions based upon these tag sets can be found in the english branch of arethusa-configs at:
https://github.com/latin-language-toolkit/arethusa-configs/blob/english/configs/arethusa.morph/en_attributes.json
https://github.com/latin-language-toolkit/arethusa-configs/blob/english/configs/arethusa.relation/english.json
They are also deployed live on Perseids and can be accessed by using 'english' as the format of the treebank file.
These still need work though.
@gcelano, if I understood correctly what you are trying to do with the morphology, you want only one attribute, pos (part of speech), with each of the supplied values (CC, CD, DT, etc.) as a possible value for that attribute.
Under the current aldt treebank schema, I believe each attribute value has to map to a single character in the postag value, so I arbitrarily assigned a single character from a-z0-9 to each of the values in the en_attributes file. We should probably do something more sensible here.
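To make the constraint concrete, here is a minimal sketch of that kind of arbitrary assignment. It is not the code used to build en_attributes.json: the tag list is a small illustrative subset, and the names `TAGS`, `ALPHABET`, and `assign_codes` are hypothetical. It also shows that a-z0-9 gives at most 36 distinct values for the attribute.

```python
import string

# Illustrative subset only; the full list is in morph_tagset_arethusa.json.
TAGS = ["CC", "CD", "DT", "EX", "FW"]

# The single characters available for a postag position: a-z followed by 0-9.
ALPHABET = string.ascii_lowercase + string.digits


def assign_codes(tags):
    """Assign one character from a-z0-9 to each tag, in list order."""
    if len(set(tags)) != len(tags):
        raise ValueError("duplicate tags in input")
    if len(tags) > len(ALPHABET):
        raise ValueError(
            f"{len(tags)} tags, but only {len(ALPHABET)} single-character codes"
        )
    # zip stops at the shorter sequence, so each tag gets exactly one character.
    return dict(zip(tags, ALPHABET))


codes = assign_codes(TAGS)

# Reverse lookup: recover the tag from the character stored in the postag.
decode = {char: tag for tag, char in codes.items()}
```

Because the assignment depends only on list order, reordering the tags changes every code, which is one reason the mapping is fragile.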
If and when we switch to the new version of the treebank schema that we agreed upon, we would drop this postag attribute in favor of more descriptive attributes and would no longer be limited in this way.
Anyway, play around with it a bit and see what you think.
@gcelano has asked that this be made available on the treebank input form for Sunoikisis.
@gcelano provided tagsets based upon the Stanford Dependencies and asked that we make these available in Arethusa.
The tag sets he provided were:
https://github.com/gcelano/Stanford_Dependencies/blob/master/morph_tagset_arethusa.json
https://github.com/gcelano/Stanford_Dependencies/blob/master/syn_tagset_arethusa.json