clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

IS: missing lang attribute in language #708

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

https://github.com/clarin-eric/ParlaMint/blob/535dae3f802d20ea053e76899ddcf6ab805049c0/Data/ParlaMint-IS/ParlaMint-IS.xml#L124-L127

should be:

<langUsage>
   <language xml:lang="en" ident="en">English</language>
   <language xml:lang="en" ident="is">Icelandic</language>
</langUsage>

Or, even better, add an Icelandic translation as well.

See other corpora, e.g. AT: https://github.com/clarin-eric/ParlaMint/blob/535dae3f802d20ea053e76899ddcf6ab805049c0/Data/ParlaMint-AT/ParlaMint-AT.xml#L157-L162
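A quick way to spot this kind of omission is to list every `<language>` element that has no `xml:lang` attribute. The following is only a minimal sketch using Python's standard library: the function name `languages_missing_lang` and the inline sample are made up for illustration, and the sample is a simplified stand-in, not the actual content of ParlaMint-IS.xml.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_missing_lang(xml_text):
    """Return the ident values of <language> elements that lack xml:lang.

    Hypothetical helper, not part of the ParlaMint tooling.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for el in root.iter(f"{{{TEI_NS}}}language"):
        if XML_LANG not in el.attrib:
            missing.append(el.get("ident"))
    return missing


# Invented, simplified sample: the first <language> has no xml:lang.
sample = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <langUsage>
    <language ident="en">English</language>
    <language xml:lang="en" ident="is">Icelandic</language>
  </langUsage>
</teiCorpus>"""

print(languages_missing_lang(sample))
```

To run the same check on a real root file, read the file's contents and pass them to the function. An empty list means every `<language>` element carries `xml:lang`.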

TomazErjavec commented 1 year ago

Yes, this should be fixed for 3.1, as it leaves all segments in IS without a language label in the concordancers. That is not much of a problem for the monolingual corpora, but it is a problem for the multilingual one.

starkadur commented 1 year ago

I'm sorry for reacting to this so late. I somehow thought I had received an email telling me that I did not have to do anything except bear this in mind the next time I compile the data, but I can't find any such email. If I am not wrong there are only two places that I would need to change this, in the root file of TEI and TEI.ana. Should I just change those files and send them to you?

TomazErjavec commented 1 year ago

> If I am not wrong there are only two places that I would need to change this, in the root file of TEI and TEI.ana.

Yes, exactly.

> Should I just change those files and send them to you?

Yes please!

TomazErjavec commented 1 year ago

Files sent, closing.