NCEAS / morpho


Use UTF-8 for file reading and writing #914

Closed by mbjones 6 years ago

mbjones commented 6 years ago

Author Name: ben leinfelder (ben leinfelder)
Original Redmine Issue: 5238, https://projects.ecoinformatics.org/ecoinfo/issues/5238
Original Date: 2010-11-11
Original Assignee: ben leinfelder


Rather than rely on the "default" character encoding of each platform, Morpho should explicitly read and write text files using UTF-8 character encoding. When the encoding in use varies from system to system, special characters (accents, tildes, umlauts, Chinese characters, etc.) can become garbled and misinterpreted. Using the same encoding for all Morpho reading and writing will mitigate these encoding issues. Note: this does not address character encoding issues that arise from copying and pasting text from other applications that use a non-UTF-8 encoding (e.g. Word).
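
As a minimal sketch of what explicit UTF-8 reading and writing can look like in Java: this is not Morpho's actual code. The class name `Utf8TextFiles` and its methods are hypothetical, and the use of `java.nio.file` assumes Java 7 or later.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: every read and write names UTF-8 explicitly,
// so the platform default encoding never comes into play.
final class Utf8TextFiles {
    private Utf8TextFiles() {}

    /** Writes text to the file, always encoding it as UTF-8. */
    static void write(Path file, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    /** Reads the whole file, always decoding it as UTF-8. */
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-sketch-", ".txt");
        try {
            // Unicode escapes keep this source file itself encoding-neutral:
            // e-acute, n-tilde, u-umlaut, and two Chinese characters.
            String original = "caf\u00e9 se\u00f1or \u00fcber \u4e2d\u6587";
            write(file, original);
            System.out.println("round trip intact: " + original.equals(read(file)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The point of the design is that no call relies on an implicit charset. Constructors such as `new FileReader(file)` or `new FileWriter(file)` without a charset argument are the kind of call that picks up the platform default.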

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2010-11-11T21:30:04Z


Created a tag of the 1.8.1 code before committing the update to use UTF-8 across the board: https://code.ecoinformatics.org/code/morpho/tags/BEFORE_UTF-8/

mbjones commented 6 years ago

Original Redmine Comment Author Name: Jim Regetz (Jim Regetz) Original Date: 2010-11-11T22:26:43Z


I'll put in a vote for including an explicit encoding declaration in the EML docs that Morpho creates:

<?xml version="1.0" encoding="utf-8" ?>
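
One way such a declaration could be produced, sketched with the JDK's StAX API: this is an illustration only, not how Morpho serializes EML. The class name `XmlDeclarationSketch` and the `title` element are placeholders, not taken from Morpho or the EML schema.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: emit an XML document whose declaration states the
// same encoding that is actually used for the bytes.
final class XmlDeclarationSketch {
    private XmlDeclarationSketch() {}

    static byte[] writeDocument(String title) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The encoding passed here controls how characters become bytes.
        XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        // The encoding passed here is what appears in the XML declaration.
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("title"); // placeholder element name
        xml.writeCharacters(title);
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.flush();
        xml.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] bytes = writeDocument("Espa\u00f1a");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Declaring the encoding matters most when it matches the bytes on disk; a declaration that disagrees with the actual encoding is worse than none, which is why both values are set together here.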

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2010-11-15T22:10:41Z


This works: Morpho now uses UTF-8 for all reading and writing. Additionally, special characters are no longer escaped, because we can encode them directly with UTF-8.

This does not: Saving to Metacat and then reading the document back results in ????? for characters that should be, say, Chinese. This means (as I suspected) that Metacat uses the default character encoding rather than explicitly using UTF-8.
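
The question marks are consistent with text being encoded in a charset that cannot represent it. The sketch below reproduces that effect in isolation. It assumes, purely for illustration, a default such as ISO-8859-1; Metacat's actual default encoding is not stated in this thread, and the class name `LossyEncodingDemo` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode text to bytes and decode it again with the
// same charset, to see which charsets preserve the characters.
final class LossyEncodingDemo {
    private LossyEncodingDemo() {}

    static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) substitutes the charset's replacement
        // byte ('?' for ISO-8859-1) for characters it cannot represent.
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.UTF_8)));

        // ISO-8859-1 cannot represent these characters; each becomes '?'.
        System.out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1));
    }
}
```

Once a character has been replaced by `?` the original cannot be recovered, so the fix has to happen on the writing side rather than when the document is read back.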

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-01-17T18:50:06Z


This should be closed. UTF-8 is used exclusively in Morpho, which is especially important now that we have so much internationalization support.

mbjones commented 6 years ago

Original Redmine Comment Author Name: Redmine Admin (Redmine Admin) Original Date: 2013-03-27T21:29:43Z


Original Bugzilla ID was 5238