kosarko closed this 3 years ago
It is on this one: http://hdl.handle.net/11234/1-3422. Maybe that record was previously used as the example and the two got mixed up?
The output shown in the README is indeed incorrect.
With the current version, the output is as follows. @MichalGawor, can you correct the README accordingly?
```python
{'ref_files': [
    {'filename': '', 'pid': 'https://wiki.korpus.cz/doku.php/en:cnk:etalon'},
    {'filename': '', 'pid': 'https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3698/Etalon.tgz?sequence=1'}
 ],
 'description': 'Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech National Corpus. The corpus includes fiction (ca 24%), professional and scientific literature (ca 40%) and newspapers (ca 36%). \r\n\r\nThe corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: syntactic word, lemma, sublemma, tag and verbtag. The texts are shuffled in random chunks of 100 words at maximum (respecting sentence boundaries).',
 'license': 'http://creativecommons.org/licenses/by-nc-sa/4.0/'}
```
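For whoever updates the README, here is a minimal sketch of how a caller might consume a result with this shape. It assumes only the dictionary layout printed above (keys `ref_files`, `description`, `license`, and `filename`/`pid` per referenced file); the helpers `reference_pids` and `unnamed_refs` are hypothetical and not part of DOGlib, and the description is shortened.

```python
from typing import Any

# Dictionary shape copied from the output above (description shortened).
# The helper functions below are hypothetical illustrations, not DOGlib API.
result: dict[str, Any] = {
    "ref_files": [
        {"filename": "", "pid": "https://wiki.korpus.cz/doku.php/en:cnk:etalon"},
        {"filename": "", "pid": "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3698/Etalon.tgz?sequence=1"},
    ],
    "description": "Etalon is a manually annotated corpus of contemporary Czech. ...",
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
}


def reference_pids(fetched: dict[str, Any]) -> list[str]:
    """Return the PID of every referenced file, in the order given."""
    return [ref["pid"] for ref in fetched.get("ref_files", [])]


def unnamed_refs(fetched: dict[str, Any]) -> list[str]:
    """Return the PIDs of referenced files whose 'filename' is empty."""
    return [ref["pid"] for ref in fetched.get("ref_files", []) if not ref.get("filename")]


for pid in reference_pids(result):
    print(pid)
print("license:", result["license"])
```

In this output both `filename` values are empty strings, so `unnamed_refs(result)` returns the same two PIDs as `reference_pids(result)`.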
https://github.com/clarin-eric/DOGlib/blame/4c5062dce1354c4abb405fa3de43d4f713471eed/README.md#L46-L49
This doesn't seem right: