UniversalDependencies / UD_Portuguese-Bosque

A Universal Dependencies (UD) Portuguese treebank.

Release Bosque-UD 2.5 #272

Closed. alvelvis closed this 4 years ago

alvelvis commented 4 years ago

Hi. Comparing this PR with version 2.4, there are 612 fewer validation errors. The authors of this release are: Cláudia Freitas, Elvis de Souza, Aline Silveira, Tatiana Cavalcanti and Wograine Evelyn.

arademaker commented 4 years ago

@alvelvis right before and after 2.4 I made a lot of changes to punctuation and fixed other minor errors. Could we skip this explicit mention of authorship by version? No problem with having more names in the README.

alvelvis commented 4 years ago

Sorry, I don't understand. I just wanted to add the names of the people who built this PR together to the list of authors in release 2.5. Thanks.

dan-zeman commented 4 years ago

@alvelvis : In order to add names to the list of authors of the following release(s), add them to the Contributors line in the machine-readable metadata section at the end of the README file in the dev branch.

Make sure to separate people by a semicolon (;), and to use the format "Surname(s), GivenName(s)", otherwise the final list may come out corrupt (and with more than 300 contributors, we are not able to guarantee that the problem will be spotted and fixed).
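A minimal sketch of the format described above, for checking a Contributors value before committing it. The helper names and the regular expression are hypothetical, not part of the UD release tooling, and the surname/given-name split shown for each person is an assumption.

```python
import re

# Hypothetical check for one entry of the form "Surname(s), GivenName(s)":
# exactly one comma, no semicolon inside the entry.
ENTRY_RE = re.compile(r"^[^,;]+, [^,;]+$")


def format_contributors(people):
    """Join (surname, given_name) pairs into 'Surname, Given; Surname, Given'."""
    return "; ".join(f"{surname.strip()}, {given.strip()}" for surname, given in people)


def malformed_entries(contributors_value):
    """Return the entries that do not look like 'Surname(s), GivenName(s)'."""
    entries = [entry.strip() for entry in contributors_value.split(";")]
    return [entry for entry in entries if not ENTRY_RE.match(entry)]


# The split into surname and given name below is an assumption.
people = [("Freitas", "Cláudia"), ("de Souza", "Elvis")]
print("Contributors: " + format_contributors(people))

# An entry written as "GivenName Surname" is reported as malformed.
print(malformed_entries("Freitas, Cláudia; Elvis de Souza"))
```

The check only catches entries without the "Surname, GivenName" comma; it cannot tell whether the two parts are in the right order.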

arademaker commented 4 years ago

No, @dan-zeman! I take care of this. We work on a branch, workbench, and I move the data to dev before the release. Thank you, @alvelvis!

dan-zeman commented 4 years ago

Dividing the work is totally up to you guys :-) I just said what is important for me and my tools, regardless of what steps you take to achieve it.

arademaker commented 4 years ago

OK @alvelvis, I will accept this PR and try to work on the remaining errors in the next days, until the release. We do have fewer errors:

$ grep -i FAILED report-bosque-master.log | awk 'BEGIN {sum=0} {sum=sum+$5} END{print sum}'
3381
$ grep -i FAILED report-bosque-alvelvis-2.log | awk 'BEGIN {sum=0} {sum=sum+$5} END{print sum}'
3048

But more sentences with errors:

$ grep -i FAILED report-bosque-master.log | wc -l
    1034
$ grep -i FAILED report-bosque-alvelvis-2.log | wc -l
    1053
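The two measurements above (total errors and number of failing entries) can be taken in one pass. This is only a sketch: it assumes, as the awk command implies, that each matching log line carries the error count in its fifth whitespace-separated field. The sample lines are made up for illustration and are not taken from the actual reports.

```python
def summarize_failures(lines):
    """Return (number of FAILED lines, sum of their fifth field)."""
    # Case-insensitive match, like `grep -i FAILED`.
    failed = [line for line in lines if "failed" in line.lower()]
    total = 0
    for line in failed:
        fields = line.split()
        # Fields that are missing or not plain integers are skipped here;
        # awk would coerce them to a number instead.
        if len(fields) >= 5:
            try:
                total += int(fields[4])
            except ValueError:
                pass
    return len(failed), total


# Made-up sample lines; the real log format is an assumption.
sample = [
    "*** FAILED *** with 3 errors",
    "*** PASSED ***",
    "*** FAILED *** with 2 errors",
]
count, errors = summarize_failures(sample)
print(count, errors)
```

Reporting both numbers together makes it easier to see a case like this one, where the total drops while the number of failing entries rises.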

Maybe you have reintroduced "Punctuation must not be attached non-projectively" errors?
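For reference, a simplified sketch of what a non-projective punctuation attachment looks like. It uses the textbook definition (some token between the punctuation and its head is not dominated by that head) and is not the UD validator's own implementation; the function name and the tuple layout are hypothetical.

```python
def nonprojective_punct(tokens):
    """tokens: list of (id, head, deprel) with integer ids; head 0 is the root.

    Returns the ids of 'punct' tokens whose arc to the head crosses a token
    that the head does not dominate.
    """
    head_of = {tid: head for tid, head, _ in tokens}

    def dominated_by(node, ancestor):
        # Walk up the head chain until we reach the ancestor or the root.
        seen = set()
        while node != 0 and node not in seen:
            if node == ancestor:
                return True
            seen.add(node)
            node = head_of.get(node, 0)
        return False

    offenders = []
    for tid, head, deprel in tokens:
        if deprel != "punct" or head == 0:
            continue
        lo, hi = sorted((tid, head))
        # Every token strictly between the two ends must hang under the head.
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            offenders.append(tid)
    return offenders


# Token 4 is punctuation attached to token 1, across the root (token 2).
crossing = [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj"), (4, 1, "punct")]
print(nonprojective_punct(crossing))
```

Reattaching token 4 to the root (token 2) makes the arc projective, which is the kind of change a rule-based fix has to get right for every sentence.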

arademaker commented 4 years ago

Please, @alvelvis, for the next release, try not to accumulate so many changes in a single PR. That makes collaborative work harder.

arademaker commented 4 years ago

@alvelvis, why add the raw directory to the repo? The sentences are easily obtained from the conllu files. This can only increase the repository size. Any reason?
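To illustrate the point about the conllu files: in CoNLL-U, each sentence carries its raw text in a `# text = ...` comment line, so the sentences can be recovered with a few lines of code. The function name and the sample data below are hypothetical.

```python
def raw_sentences(conllu_text):
    """Collect the raw sentence text from '# text = ...' comment lines."""
    prefix = "# text = "
    return [
        line[len(prefix):]
        for line in conllu_text.splitlines()
        if line.startswith(prefix)
    ]


# Made-up two-sentence sample; token columns are abbreviated for brevity.
sample = (
    "# sent_id = s1\n"
    "# text = Olá mundo.\n"
    "1\tOlá\t_\n"
    "\n"
    "# sent_id = s2\n"
    "# text = Tudo bem?\n"
    "1\tTudo\t_\n"
)
for sentence in raw_sentences(sample):
    print(sentence)
```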

alvelvis commented 4 years ago

@arademaker

> Maybe you have reintroduced "Punctuation must not be attached non-projectively" errors?

I fixed about 300 punctuation attachments by rule, but I stopped because for roughly every 10 punctuation marks I corrected, I created 4 different errors... It's difficult to correct automatically.

> Please, @alvelvis, for the next release, try not to accumulate so many changes in a single PR. That makes collaborative work harder.

Ok

> @alvelvis, why add the raw directory to the repo? The sentences are easily obtained from the conllu files. This can only increase the repository size. Any reason?

I didn't add it; those files have always been there... Maybe I accidentally changed the permissions of the files and git thought I had changed them. I'm not sure.