GlobalDataverseCommunityConsortium / dataverse-language-packs

Repository for language files associated with Dataverse

Encoding issues in translation files #65

Open juancorr opened 4 years ago

juancorr commented 4 years ago

Dataverse v4.19 has some translation files that contain raw ISO-8859-1 or UTF-8 characters instead of Unicode escape sequences. These characters are not displayed correctly in Dataverse. The files are:

en_US/astrophysics.properties: text/plain; charset=utf-8
en_US/citation.properties: text/plain; charset=iso-8859-1
en_US/astrophysics.properties: text/plain; charset=utf-8
en_US/customDigaai.properties: text/plain; charset=iso-8859-1
en_US/geospatial.properties: text/plain; charset=iso-8859-1
fr_CA/journal_fr.properties: text/plain; charset=iso-8859-1
fr_CA/BuiltInRoles_fr.properties: text/plain; charset=iso-8859-1
fr_CA/astrophysics_fr.properties: text/plain; charset=iso-8859-1
fr_CA/Bundle_fr.properties: text/plain; charset=unknown-8bit
fr_CA/socialscience_fr.properties: text/plain; charset=iso-8859-1
fr_CA/ValidationMessages_fr.properties: text/plain; charset=iso-8859-1
fr_CA/MimeTypeDisplay_fr.properties: text/plain; charset=iso-8859-1
fr_CA/geospatial_fr.properties: text/plain; charset=iso-8859-1
fr_CA/citation_fr.properties: text/plain; charset=iso-8859-1
fr_CA/biomedical_fr.properties: text/plain; charset=iso-8859-1
fr_CA/Bundle_fr.properties: text/plain; charset=unknown-8bit
fr_CA/MimeTypeFacets_fr.properties: text/plain; charset=iso-8859-1
pt_BR/citation_br.properties: text/plain; charset=iso-8859-1
pt_BR/BuiltInRoles_br.properties: text/plain; charset=iso-8859-1
pt_BR/Bundle_br.properties: text/plain; charset=iso-8859-1

kaitlinnewson commented 4 years ago

Hi @juancorr, do you have any screenshots or more information about where you see these issues in the UI? I've been reviewing our local install of 4.19 in English and French and haven't seen any characters with display issues so far.

juancorr commented 4 years ago

Hi @kaitlinnewson, sorry, I was wrong: the files with ISO-8859 (and related) encodings are fine. Only two files have entries with encoding issues:

In the English locale: [image: imagen.png] The encoding issue is in the datasetfieldtype.resolution.Spectral.description entry in astrophysics.properties. This entry is correctly encoded in the French and Spanish files.

In the French locale I found only one entry with an encoding problem. It is in Bundle_fr.properties, but I think this entry is not used in recent Dataverse versions.

[image: imagen.png] The <96> is an invalid character in the dataset.widgets.editAdvanced.tip entry. I took this screenshot in vim on Linux.

You can check the file types from the Linux console with the command file * . This is the output for the fr_CA directory:

astrophysics_fr.properties: ISO-8859 text
biomedical_fr.properties: ISO-8859 text
BuiltInRoles_fr.properties: ISO-8859 text
Bundle_fr.properties: Non-ISO extended-ASCII text, with very long lines
citation_fr.properties: ISO-8859 text, with very long lines
geospatial_fr.properties: ISO-8859 text, with very long lines
journal_fr.properties: ISO-8859 text
MimeTypeDetectionByFileExtension_fr.properties: ASCII text
MimeTypeDisplay_fr.properties: ISO-8859 text
MimeTypeFacets_fr.properties: ISO-8859 text
socialscience_fr.properties: ISO-8859 text, with very long lines
ValidationMessages_fr.properties: ISO-8859 text

Only Bundle_fr.properties has encoding issues. In the en_US directory, astrophysics.properties is UTF-8 encoded.
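For anyone who wants to repeat this check without the file command, here is a minimal Python sketch that sorts a file's bytes into roughly the same categories. The function name classify_bytes and the category labels are my own, and the heuristic only loosely mirrors what file does; it assumes that bytes in the 0x80-0x9F range (C1 control codes in ISO-8859-1) indicate some other 8-bit encoding.

```python
from pathlib import Path


def classify_bytes(data: bytes) -> str:
    """Return a rough encoding category for the raw bytes of a file.

    Hypothetical helper: the labels only approximate those printed by `file`.
    """
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # Bytes 0x80-0x9F are control codes in ISO-8859-1, so finding one
    # suggests a different 8-bit encoding (for example Windows-1252).
    if any(0x80 <= b <= 0x9F for b in data):
        return "Non-ISO extended-ASCII"
    return "ISO-8859"


def classify_directory(directory: str) -> dict:
    """Classify every .properties file found directly in `directory`."""
    return {
        path.name: classify_bytes(path.read_bytes())
        for path in sorted(Path(directory).glob("*.properties"))
    }
```

Calling classify_directory("fr_CA") from a checkout of the language packs should return one category per file; only files reported as "Non-ISO extended-ASCII" would need a closer look.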

I found the invalid characters in vim on Linux with the search string /[^a-zA-Z0-9.?!=' )({}#éèô\/à<>"_-,\:;|ê«»ÉÈÊçÇîâû[]+ùÀ^*&%\$Ûï^I\tÂÁüÎëÅ]
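An alternative to the vim search, as a rough Python sketch: instead of listing the allowed characters, it reports every byte in the 0x80-0x9F range together with its line number and property key. The helper name find_c1_bytes is hypothetical, and the approach assumes the file uses an 8-bit encoding; it is not meaningful for UTF-8 files, where such bytes occur legitimately inside multi-byte sequences.

```python
def find_c1_bytes(data: bytes) -> list:
    """List (line_number, key, byte) for each byte in 0x80-0x9F.

    Hypothetical helper; assumes an 8-bit encoded .properties file.
    """
    hits = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        # The property key is the text before the first '='.
        key = line.split(b"=", 1)[0].strip().decode("ascii", errors="replace")
        for b in line:
            if 0x80 <= b <= 0x9F:
                hits.append((lineno, key, f"0x{b:02X}"))
    return hits


if __name__ == "__main__":
    # Made-up sample content, not taken from the real file.
    sample = b"some.key=ok\ndataset.widgets.editAdvanced.tip=a \x96 b\n"
    for hit in find_c1_bytes(sample):
        print(hit)
```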

Juan Corrales

On Tue, 5 May 2020 at 19:12, Kaitlin Newson (notifications@github.com) wrote:

Hi @juancorr, do you have any screenshots or more information about where you see these issues in the UI? I've been reviewing our local install of 4.19 in English and French and haven't seen any characters with display issues so far.

juancorr commented 4 years ago

@kaitlinnewson, I replied by e-mail and the screenshots were not shown. This is the screenshot from the Bundle_fr.properties file: [image: imagen]

and this is the screenshot of the Astronomy Spectral Resolution field in English: [image: imagen]

mhvezina commented 4 years ago

Hi @juancorr, I replaced the en dash in the dataset.widgets.editAdvanced.tip entry of the fr_CA/Bundle_fr.properties file (version 4.20) with an explicit Unicode escape in \uXXXX notation (https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs/commit/eeb421cea318bf3fac225a4b79f4f32385774014). Is that better?
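To illustrate the kind of replacement described here, this is a small Python sketch that rewrites every non-ASCII character as a Java-style Unicode escape. It is not the method used in the commit; the function name escape_non_ascii is hypothetical, and the usage example assumes the stray 0x96 byte was meant as Windows-1252, where it maps to the en dash.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX escape.

    Hypothetical helper, similar in spirit to Java's native2ascii tool.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04X}")
        else:
            # Characters beyond the BMP need a UTF-16 surrogate pair.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


if __name__ == "__main__":
    # Assumption: the raw byte 0x96 is a Windows-1252 en dash.
    raw = b"a \x96 b"
    print(escape_non_ascii(raw.decode("cp1252")))
```

Decoding with the right source encoding first is the important step: the escape is only correct if the byte is interpreted as the character the translator intended.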

juancorr commented 4 years ago

Hi @mhvezina,

thank you, it is perfect.

Juan

On Sat, 9 May 2020 at 0:30, Marie-Hélène Vézina (notifications@github.com) wrote:

Hi @juancorr, I replaced the en dash in the dataset.widgets.editAdvanced.tip entry of the fr_CA/Bundle_fr.properties file (version 4.20) with an explicit Unicode escape in \uXXXX notation (eeb421c https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs/commit/eeb421cea318bf3fac225a4b79f4f32385774014). Is that better?

JayanthyChengan commented 4 years ago

Thanks @juancorr and @mhvezina .

I submitted PR https://github.com/IQSS/dataverse/pull/6904 after correcting the datasetfieldtype.resolution.Spectral.description entry in astrophysics.properties.