Closed matyaskopp closed 1 year ago
End-of-lines fixed: I have added a patch to the Makefile target that loads the updated Ukrainian data: https://github.com/clarin-eric/ParlaMint/blob/9d8ef3805162765fd20282275a65c1a3742a0fcb/Corpora/Orientations/Makefile#L104-L108
@matyaskopp, I've now:
- deleted orientations-tsv2tei.xsl (this was an obsolete script)
- fixed wiki-tsv2tei.xsl and enco-tsv2tei.xsl so that they take # into account (actually, they just throw it away, and match for ID)
- modified the scripts so they work with DOS end-of-lines (but it would be better to convert the TSVs to Unix first, as we otherwise always use Unix EOLs, and other countries' TSVs are Unix as well)
With this, the party names are recognised; however, some are missing from the TSVs, both the Wiki and enco ones. Maybe @AnnaParla could add these?
ERROR: For ParlaMint-UA-listOrg cant find party НСНУ (pp.nsnu) in Wiki TSV
ERROR: For ParlaMint-UA-listOrg cant find party УРДП (pp.urdp) in Wiki TSV
ERROR: For ParlaMint-UA-listOrg cant find party НДП (pp.ndp) in Wiki TSV
ERROR: For ParlaMint-UA-listOrg cant find party Позиція (pp.cp) in Wiki TSV
ERROR: For ParlaMint-UA-listOrg cant find party Справедливість (pp.justice) in Wiki TSV
ERROR: For ParlaMint-UA-listOrg cant find party фСДПУ(о) (fr.sdpuo) in Wiki TSV
and
ERROR: For ParlaMint-UA-listOrg cant find party НСНУ (pp.nsnu) in encoder TSV
ERROR: For ParlaMint-UA-listOrg cant find party УРДП (pp.urdp) in encoder TSV
ERROR: For ParlaMint-UA-listOrg cant find party НДП (pp.ndp) in encoder TSV
ERROR: For ParlaMint-UA-listOrg cant find party Позиція (pp.cp) in encoder TSV
ERROR: For ParlaMint-UA-listOrg cant find party Справедливість (pp.justice) in encoder TSV
Note that all parties / parl. groups should be in the TSVs even if e.g. you can't find their Wiki page; all the values that can't be determined should have the hyphen as their content.
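As a rough illustration of the end-of-line point above, converting a TSV to Unix EOLs before it reaches the XSLT scripts could look like the sketch below. This is not part of the ParlaMint tooling; the function name `to_unix_eol` and the sample content are made up for the example.

```python
def to_unix_eol(data: bytes) -> bytes:
    """Convert DOS (CRLF) and stray CR line endings to Unix LF."""
    # Replace CRLF first so that a CRLF pair does not turn into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical TSV content with DOS end-of-lines.
sample = b"country\tparty\r\nUA\tpp.nsnu\r\n"
print(to_unix_eol(sample))
```

Working on bytes avoids any guess about the file encoding, since only the line terminators are touched.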
These parties and their Wiki URLs were added to the org page of our metadata Google spreadsheet as part of the Ukrainian parliamentary proceedings extension project covering 2002-2012 (Terms 4-6), but they are not relevant to the 2012-2023 (Terms 7-9) timespan of the ParlaMint-UA corpus.
Shall I add them to the Wiki and enco TSVs for the ParlaMint-UA corpus (2012-2023) anyway?
> Shall I add them to the Wiki and enco TSVs for the ParlaMint-UA corpus (2012-2023) anyway?

Yes please. As I wrote:

> Note that all parties / parl. groups should be in the TSVs ...; all the values that can't be determined should have the hyphen as their content.

In other words, you should put - in all cells except the country and party ID. This is just to get rid of the error messages, and so that you have a complete list of parties in the TSV.
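A placeholder row of this kind could be generated along the following lines. This is only a sketch: the column names (`country`, `party_id`, `wiki_url`, `orientation`) are assumptions for illustration and do not reflect the actual headers of the Wiki or enco TSVs.

```python
def placeholder_row(columns, country, party_id,
                    country_col="country", id_col="party_id"):
    """Build a TSV row with '-' in every cell except country and party ID.

    The column names are hypothetical; adapt them to the real TSV header.
    """
    cells = []
    for col in columns:
        if col == country_col:
            cells.append(country)
        elif col == id_col:
            cells.append(party_id)
        else:
            # Value that cannot be determined: use a hyphen.
            cells.append("-")
    return "\t".join(cells)


header = ["country", "party_id", "wiki_url", "orientation"]  # assumed header
for pid in ["pp.nsnu", "pp.urdp", "pp.ndp", "pp.cp", "pp.justice"]:
    print(placeholder_row(header, "UA", pid))
```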
This all seems to work now, no errors. So, closing.
The UA corpus preserved its IDs but changed some abbreviated names.
The idea is to add the prefix # and use ID matching when prefixes are in the data.
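The matching idea can be sketched as follows. This is not the logic of the actual XSLT scripts; the function `matches` and its parameters are hypothetical, and it assumes that a TSV cell holds either a #-prefixed ID or an abbreviated party name.

```python
def matches(cell: str, party_id: str, party_abbr: str) -> bool:
    """Decide whether a TSV cell refers to the given party.

    Assumption: a cell starting with '#' holds an ID reference;
    any other cell holds the abbreviated party name.
    """
    cell = cell.strip()
    if cell.startswith("#"):
        # Prefix present: drop it and compare IDs.
        return cell[1:] == party_id
    # No prefix: fall back to comparing abbreviated names.
    return cell == party_abbr


print(matches("#pp.nsnu", "pp.nsnu", "НСНУ"))
print(matches("НСНУ", "pp.nsnu", "НСНУ"))
```

Matching on the ID keeps the lookup stable when an abbreviated name changes, which is the situation described for the UA corpus.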