-
Dear colleagues, thank you for your fantastic work on the long-awaited treebank!
I decided to report this to you just in case: one can see from both the `.conllu` files and `stats.xml` tha…
-
The alignment editor and the treebank editor use different naming schemas for words (`` against ``). Can we unify them for easier cross-referencing?
-
This is going to be an ongoing issue to document findings and solutions to performance problems with large XML files.
First tests used treebank documents ranging in size from 22K to 1MB and the areth…
-
I noticed in the treebank data that compounds with "-cum"—like 'tecum'—are tokenized as a single token. E.g.
> ``
Is there a reason that this is not tokenized as two tokens, i.e. 'cum' + 'te'? (Cf. …
-
Right now, if you don't want to look into the code, there is no documentation available.
-
Hello! I read your code and found that your training procedure needs two directories: xml_dir and parse_dir. Is xml_dir the corpus directory in the CDTB data? And what is parse_di…
-
@gregorycrane: Can you please provide links to the alignments you mentioned on the call earlier this week? I'm guessing they'll be in https://github.com/gregorycrane/homerica (I believe you said they…
-
I think this is just an error. The two lemmas are unrelated in meaning, and φυλάζω, to divide, doesn't seem to exist in Homer. Cunliffe uses this as his first example of φυλάσσω.
-
Agree it's an edge case. I think we can mitigate it after the release by improving the markup of our treebanked texts to use sentence level alignment rather than word level, eliminating the spans arou…
-
The goal is to add the [Capitains Sparrow](https://github.com/PerseusDL/Capitains-Sparrow/) CTS selector and workflow plugins to the alignment editor input form so that users can select from texts ava…