UniversalDependencies / UD_Portuguese-DHBB


first release #1

Open arademaker opened 3 years ago

arademaker commented 3 years ago
  1. I am waiting for the dhbb 1.0.1 release (http://github.com/cpdoc/dhbb) to produce the first release of this corpus.
  2. The initial CoNLL-U files will be produced by UDPipe and then reviewed. The model will be trained on the latest Bosque version.
  3. I will split the files so that each CoNLL-U file is small enough to be edited manually, something like ~10-20 sentences per file. We will need a good naming convention for files and sentence IDs.
  4. A deploy script will need to be created.
  5. Following the same principles as the other Portuguese corpora in UD, work will be done in the workbench branch.
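Items 3 and 4 could be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: it assumes plain CoNLL-U input where sentences are separated by blank lines, and the function names and the default of 20 sentences per file are hypothetical. Merging the chunks back in order (what a deploy step might do) reproduces the original text.

```python
from typing import List


def read_sentences(conllu_text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (comment lines + token lines).

    In CoNLL-U, sentences are separated by blank lines.
    """
    blocks: List[str] = []
    current: List[str] = []
    for line in conllu_text.splitlines():
        if line.strip() == "":
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_sentences(sentences: List[str], max_per_file: int = 20) -> List[List[str]]:
    """Group sentences into consecutive chunks of at most max_per_file (hypothetical default)."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be positive")
    return [sentences[i:i + max_per_file] for i in range(0, len(sentences), max_per_file)]


def to_conllu(sentences: List[str]) -> str:
    """Serialize sentence blocks back to CoNLL-U; each sentence ends with a blank line."""
    return "".join(s + "\n\n" for s in sentences)


def merge_chunks(chunks: List[List[str]]) -> str:
    """Hypothetical deploy step: concatenate the small files back into one CoNLL-U text."""
    return "".join(to_conllu(chunk) for chunk in chunks)
```

For example, a document with 45 sentences would be split into files of 20, 20 and 5 sentences.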
arademaker commented 3 years ago

Regarding the strategy for dividing the corpus, I plan to follow @manning's advice in https://cl.lingfil.uu.se/pipermail/ud/2015-November/000095.html.

arademaker commented 2 years ago
4344-001.conllu
 4344-001-10
 4344-001-20
 ...
4344-002.conllu
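The listing above suggests a pattern: a document number (4344), a zero-padded three-digit file number (001, 002) and sentence IDs that append a number stepping by 10. A small sketch that generates names in that pattern might look like this; the three-digit padding, the step of 10, and the helper names are assumptions inferred from the example, not a stated convention.

```python
from typing import List


def file_name(doc_id: str, part: int) -> str:
    """Build a file name such as "4344-001.conllu" (assumes 3-digit zero padding)."""
    return f"{doc_id}-{part:03d}.conllu"


def sentence_ids(doc_id: str, part: int, n_sentences: int, step: int = 10) -> List[str]:
    """Build sentence IDs such as "4344-001-10", "4344-001-20", ...

    Assumption: sentence numbers grow by `step`, as the example suggests.
    """
    prefix = f"{doc_id}-{part:03d}"
    return [f"{prefix}-{step * (i + 1)}" for i in range(n_sentences)]


def set_sent_id(sentence: str, sent_id: str) -> str:
    """Replace any existing '# sent_id' comment in a CoNLL-U sentence block with a new one."""
    lines = [ln for ln in sentence.split("\n") if not ln.startswith("# sent_id")]
    return "\n".join([f"# sent_id = {sent_id}"] + lines)
```

Leaving gaps between sentence numbers would, under this assumption, make room for inserting sentences later without renumbering the whole file.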