NCEAS / z-test-issues

Test issue imports from redmine

odd characters cause html display of eml to fail #491

Closed mbjones closed 7 years ago

mbjones commented 7 years ago

Author Name: gastil gastil (gastil gastil) Original Redmine Issue: 5618, https://projects.ecoinformatics.org/ecoinfo/issues/5618 Original Date: 2012-06-05 Original Assignee: Margaret O'Brien


While examining the HTML skins for EML, we found this example of a test doc caused the display to have an error (blank page):

https://demo2.test.dataone.org/knb/metacat/knb-lter-knz.2.4/default

and once the odd characters are removed, it displays normally: https://demo2.test.dataone.org/knb/metacat/knb-lter-knz.2.5/default

These odd characters were found by:

cat knb-lter-knz.2.4_mgb.xml | tr -d '\000-\011\013-\177' > foo
sort foo | uniq

â ° µ

They are found on lines 39, 48, 577, 680, and 682 of knb-lter-knz.2.4 in the attached eml file. (This is not likely to match the doc of the same package Id in the LTER Metacat, as that one has probably been cleaned up.)
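To get those line numbers directly instead of only the set of odd characters, the same tr filter can be applied one line at a time. This is only a rough sketch: the function name odd_char_lines is made up here, and it assumes "odd" means any byte outside 7-bit ASCII.

```shell
# Hypothetical helper (not part of the original report): for file $1, print
# the line number and the non-ASCII bytes of every line that contains any.
odd_char_lines() {
    n=0
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # Delete every 7-bit ASCII byte; whatever is left counts as "odd".
        odd=$(printf '%s' "$line" | LC_ALL=C tr -d '\000-\177')
        if [ -n "$odd" ]; then
            printf '%d\t%s\n' "$n" "$odd"
        fi
    done < "$1"
}
```

It would be called as odd_char_lines knb-lter-knz.2.4_mgb.xml. It is slow on large files because it starts one tr per line, but it uses nothing beyond what the command above already relies on.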

Note that showDataset can display that eml w/o removing the odd characters. They just appear as weird characters.

i.e., http://mcr-dev.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-knz.2.3 where mcr-dev currently points to lava, but that is likely to change.

mbjones commented 7 years ago

Original Redmine Comment Author Name: gastil gastil (gastil gastil) Original Date: 2012-06-05T22:43:09Z


To determine whether this bug is important, I used the pathQuery results from 30 April 2012 of all LTER eml abstracts. Of the 6826 eml docs, 240 of them contain some odd characters.

cat resultSet | tr -d '\000-\011\013-\177' | grep -v "^$" > odd_chars
wc -l odd_chars
240

So this affects less than 4 percent of the LTER eml docs in the LTER Metacat.
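For anyone who wants to repeat the count on a newer result set, here is a rough sketch of the same idea as two small functions. The names count_odd_lines and percent_affected are invented for this example, and it assumes one record per line in the input file, as in the count above.

```shell
# Hypothetical helpers, not taken from the original comment.

# Count the lines of file $1 that contain at least one non-ASCII byte.
count_odd_lines() {
    # Same tr filter as above: keep only newlines and bytes >= \200,
    # then count the lines that are not empty.
    # Note: grep -c exits with status 1 when the count is 0.
    LC_ALL=C tr -d '\000-\011\013-\177' < "$1" | LC_ALL=C grep -a -c -v '^$'
}

# Print "affected out of total" as a percentage with one decimal place.
percent_affected() {
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}
```

With the numbers from this comment, percent_affected 240 6826 prints 3.5, which matches the "less than 4 percent" figure.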

mbjones commented 7 years ago

Original Redmine Comment Author Name: gastil gastil (gastil gastil) Original Date: 2012-06-06T18:38:53Z


To clarify this bug, see knb-lter-bug.4103.2 versus knb-lter-bug.4103.3 in demo2.

The only difference is the presence of higher-order (non-ASCII) characters in the abstract. In revision 3 they are commented out.

mbjones commented 7 years ago

Original Redmine Comment Author Name: gastil gastil (gastil gastil) Original Date: 2012-06-07T02:50:32Z


Another example of this is the difference between https://demo2.test.dataone.org/knb/metacat/knb-test-nrs.569.3/default which does display versus https://demo2.test.dataone.org/knb/metacat/knb-test-nrs.569.2/default which does not display, but instead fails with a "white-page".

The ONLY diff between revisions 2 and 3 is the packageId and "Bancroft's office" vs "Bancroft ’s office" in the abstract.

(revision 1 had denyFirst.)

mbjones commented 7 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2012-06-07T07:24:15Z


These are unfortunate errors, but are most likely due to copy-and-paste from a [MS Word] document into the metadata file. I've added better error reporting during the transform so there's now an indication that rendering was not possible when such a character is encountered.

mbjones commented 7 years ago

Original Redmine Comment Author Name: Redmine Admin (Redmine Admin) Original Date: 2013-03-27T21:31:06Z


Original Bugzilla ID was 5618