UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

Is conllu-stats.py Python 3 compatible? #36

Closed lauma closed 1 year ago

lauma commented 5 years ago

Because of validate.py I updated my system to Python 3. After that conllu-stats.py started to fail with this error message:

  File "conllu-stats.py", line 132
    print json.dumps(d)
             ^
SyntaxError: invalid syntax

Does this mean conllu-stats.py still needs Python 2.7? Is there a way I can avoid needing two different Python versions to make a UD release?

dan-zeman commented 5 years ago

> Does this mean conllu-stats.py still needs Python 2.7?

I suspect that it might be the case. Do you need the program to prepare your data for the release? I do not use it.
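The traceback is consistent with that: `print json.dumps(d)` uses the Python 2 print statement, which Python 3 rejects as a syntax error because `print` is now a function. A minimal sketch of the change follows; the dictionary `d` here is a hypothetical stand-in, not the data the script actually builds, and other parts of the script may need their own adjustments.

```python
import json

# Hypothetical stand-in for whatever dictionary the script serializes.
d = {"sentences": 2, "words": 11}

# Python 2 only (SyntaxError under Python 3):
#     print json.dumps(d)

# Python 3 form; with a single argument this also runs under Python 2.7,
# where the parentheses are treated as plain grouping.
line = json.dumps(d, sort_keys=True)
print(line)
```

With `sort_keys=True` the key order is deterministic, which is convenient when the output is compared across runs; the original call does not pass it.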

lauma commented 5 years ago

Statistics for readme. Is there some other way to get them?

dan-zeman commented 5 years ago

As a part of the release process, I generate (or update) the file stats.xml in every treebank, so I do not think it is necessary to also have statistics in the README file. One shortcoming is that stats.xml currently does not break up the numbers for train, dev and test (which I think conllu-stats.py does) but I am planning to add it. The statistics in stats.xml are generated using conllu-stats.pl (a Perl program, also in the tools repository).
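For a quick per-file count while waiting for that breakdown, something like the following Python 3 sketch could be run separately on the train, dev and test files. It is an illustration only: the function name `count_conllu` and the sample data are made up here, and it does not reproduce the output of either conllu-stats.py or conllu-stats.pl. It assumes standard CoNLL-U layout, where sentences are separated by blank lines, comment lines start with `#`, and the first tab-separated column is the ID.

```python
def count_conllu(text):
    """Return (sentences, words) for a CoNLL-U string.

    Only lines with a plain integer ID count as words; multiword token
    ranges (e.g. "1-2") and empty nodes (e.g. "1.1") are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# text = ...".
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
    return sentences, words


def row(token_id, form):
    # Build a 10-column CoNLL-U line with the remaining fields left as "_".
    return "\t".join([token_id, form] + ["_"] * 8)


# Made-up sample: the first sentence contains a multiword token (range 1-2).
sample = "\n".join([
    "# text = vámonos ya",
    row("1-2", "vámonos"),
    row("1", "vamos"),
    row("2", "nos"),
    row("3", "ya"),
    "",
    "# text = adiós",
    row("1", "adiós"),
    "",
])

print(count_conllu(sample))  # (2, 4)
```

To apply it to a real file, read the file as UTF-8 and pass its contents to the function.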

lauma commented 5 years ago

I seem to remember that the release checklist once suggested using it, but it looks like that is no longer the case. Okay, if I don't need it for the readme, then I just won't use it and thus won't need Python 2. Thanks!

dan-zeman commented 5 years ago

You are right, it used to be on the checklist. I removed it some time ago because I realized it was no longer needed.

fginter commented 5 years ago

Not sure if it's used, but I'll upgrade it to Py3 one of these days. Should be simple.

leoalenc commented 1 year ago

> Not sure if it's used, but I'll upgrade it to Py3 one of these days. Should be simple.

@fginter, that'll be nice. I would like to use the script to get the statistics of the dev version of my treebank.

dan-zeman commented 1 year ago

> > Not sure if it's used, but I'll upgrade it to Py3 one of these days. Should be simple.
>
> @fginter, that'll be nice. I would like to use the script to get the statistics of the dev version of my treebank.

You can also use conllu-stats.pl, which produces much more information:

conllu-stats.pl *.conllu > stats.xml ; git diff stats.xml

In case you meant UD_Nheengatu-CompLin, I just ran the script, updated the statistics there (in the dev branch) and pushed the change to GitHub.