Open techgique opened 6 years ago
I have identified the character causing the issue in oscys.caseid.0105: —
https://cdrhdev1.unl.edu/earlywashingtondc/cases/oscys.caseid.0105
That doesn't really help, since we can't remove all em dashes.
So, idea #1: is there a way to change the HTML creation script to explicitly set the encoding to UTF-8? (Maybe this would be useful, though I'm not quite sure how to implement it: https://gist.github.com/arpith20/4fcf7682a9154bc777dfcd2199edecf4)
If that does not work, idea #2 would be to re-encode special characters as HTML entities, but I am hoping not to have to do that.
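For reference, a minimal Ruby sketch of what idea #2 could look like. The helper name `escape_non_ascii` is hypothetical and is not part of the project's scripts; it replaces every non-ASCII character with a decimal numeric character reference.

```ruby
# Hypothetical helper (not from the repository): replace each non-ASCII
# character with its decimal numeric character reference.
def escape_non_ascii(text)
  text.gsub(/[^[:ascii:]]/) { |ch| format("&#%d;", ch.ord) }
end

escaped = escape_non_ascii("summons\u2014")
puts escaped
```

The output is pure ASCII, so it should be safe to combine with strings in any ASCII-compatible encoding, at the cost of making the generated HTML harder to read.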
Update: I checked the XSLT file creating the HTML for oscys (scripts/tei.p5) and I think it is setting encoding correctly:
<xsl:output method="xml" indent="no" encoding="UTF-8" omit-xml-declaration="yes"/>
The XML files also correctly declare their encoding as UTF-8, though it is possible that the original file is using a non-UTF-8 encoding of the em dash.
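One way to test that suspicion is to look for bytes that are not valid UTF-8. The sketch below is only an illustration: the helper name `invalid_utf8_offsets` is hypothetical, and the Windows-1252 byte `0x97` is just an assumed example of a non-UTF-8 em dash.

```ruby
# Hypothetical helper: return the byte offsets at which the data is not valid UTF-8.
def invalid_utf8_offsets(bytes)
  offsets = []
  pos = 0
  bytes.dup.force_encoding(Encoding::UTF_8).each_char do |ch|
    offsets << pos unless ch.valid_encoding?
    pos += ch.bytesize
  end
  offsets
end

utf8_dash   = "summons\u2014".b  # em dash as the UTF-8 bytes E2 80 94
cp1252_dash = "summons\x97".b    # em dash as a single Windows-1252 byte (assumed example)

p invalid_utf8_offsets(utf8_dash)    # no invalid bytes
p invalid_utf8_offsets(cp1252_dash)  # offset of the stray 0x97 byte
```

To check a real file, the same helper could be given the result of `File.binread(path)`.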
A little more investigation:
I opened the file the HTML is transformed from, and the encoding of the em dash looks like this:
summons&#8212;
I believe 8212 is the HTML encoding of the em dash, but it doesn't work if I change it to &#2014;
or —
either. So, we're back to having to try one of the ideas above.
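A side note on the numbers involved, as a quick Ruby check: 8212 is the decimal code point of the em dash, and 2014 is the same value in hexadecimal. A decimal reference that uses 2014 therefore selects a different character (U+07DE, ߞ), so a hexadecimal reference needs the `x` prefix.

```ruby
em_dash = "\u2014"

decimal = em_dash.ord           # code point as a decimal integer
hex     = decimal.to_s(16)      # the same code point in hexadecimal
wrong   = [2014].pack("U")      # character at *decimal* code point 2014

hex_ref = format("&#x%X;", decimal)  # hexadecimal numeric character reference

puts decimal
puts hex
puts wrong
puts hex_ref
```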
Should we just change all of them to two hyphens (--)? Any idea why it only seems to be a problem on the case files and not the documents?
I'm not sure if this is still an issue, but it would be good to find out.
@kacinash do you know if this is resolved or you have a workaround?
I don't know. I'm not sure how to replicate the process Greg did that got him the error.
I think we'd have to revert the change I added in https://github.com/CDRH/earlywashingtondc/commit/292e018c1a4ef549d3256c0f14399adee9b737af and review pages with the suspect characters
Rails started throwing the error:
incompatible character encodings: ASCII-8BIT and UTF-8
for some case file pages. E.g.
Karin removed some text from the generated HTML file and the page worked again, but we haven't identified the offending character yet.
Used a dirty fix from Stack Overflow (https://stackoverflow.com/a/9278713) in https://github.com/CDRH/earlywashingtondc/commit/292e018c1a4ef549d3256c0f14399adee9b737af
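The linked commit is not reproduced here. As a rough sketch of how this class of error can arise in plain Ruby, and one common way around it, assume the generated HTML is read as binary (ASCII-8BIT) and then combined with a UTF-8 string that also contains non-ASCII characters. The variable names and sample strings are illustrative only.

```ruby
binary_html = "summons\xE2\x80\x94".b  # em dash bytes tagged as ASCII-8BIT (binary)
utf8_text   = "caf\u00E9 "             # UTF-8 string with a non-ASCII character

error = nil
begin
  utf8_text + binary_html
rescue Encoding::CompatibilityError => e
  error = e  # "incompatible character encodings ..." (exact wording varies by Ruby version)
end

# Relabel the bytes as UTF-8. This changes only the encoding tag, not the bytes,
# so it is only safe when the bytes really are valid UTF-8.
relabeled = binary_html.dup.force_encoding(Encoding::UTF_8)
combined  = utf8_text + relabeled if relabeled.valid_encoding?

puts error.class
puts combined
```

Because `force_encoding` does not convert anything, checking `valid_encoding?` afterwards helps catch files that were never UTF-8 in the first place.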