CDLUC3 / dash

General repository for documents and communication for UC Dash project.
http://cdluc3.github.io/dash
MIT License

Apostrophes not encoded correctly in Abstract field #57

Open ghost opened 10 years ago

ghost commented 10 years ago

Apostrophes appear fine in the metadata submission screen, but render as HTML entity codes in the user interface. Example: https://dash.lib.uci.edu/xtf/view?docId=uci/ark%2B%3Db7280%3Dd1mw2m/mrt-datacite.xml;query=

marisastrong commented 10 years ago

Can you provide a screenshot of this issue? I'm not seeing a problem on the page at the link provided.

marisastrong commented 10 years ago

https://www.pivotaltracker.com/story/show/81880656

ghost commented 10 years ago

Sure thing! I've attached a screenshot (dashencodingbugscreenshot) with a red circle highlighting the issue. This is just for one object, but HTML encoding appears in place of both apostrophes and open/close quote marks (and possibly other punctuation) within other UCI Dash objects' abstract fields.

marisastrong commented 10 years ago

This is helpful. I'll look at how we're storing it in the database.

cpwillett commented 10 years ago

Matthew, did you type this in from scratch, or did you cut-and-paste from another document?

ghost commented 10 years ago

Perry-

I copied and pasted from a Word doc. I didn't create this doc but, judging from its appearance, it looks like a student copied and pasted HTML from the data's DataVerse page: http://thedata.harvard.edu/dvn/dv/ssda_uci/faces/study/StudyPage.xhtml?studyId=46092&versionNumber=3

I'd attach the actual Word doc, but GitHub only allows images. If you'd like, I can email it.

Now that you've pointed it out, it's very likely that this process (Webpage->C/P->Word->C/P->Dash Interface) is where the encoding translation issue is happening.
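To make that suspicion concrete, here is a minimal Python sketch of one plausible mechanism, not a confirmed diagnosis of Dash: if the pasted text already contains an HTML entity such as `&#39;`, and the display layer escapes it again on output, the browser shows the entity code literally. The `render` and `browser_display` helpers are hypothetical stand-ins, not Dash or XTF functions.

```python
import html

def render(stored: str) -> str:
    """Hypothetical display layer: escape stored text for HTML output."""
    return html.escape(stored, quote=False)

def browser_display(markup: str) -> str:
    """Approximate what a browser shows by decoding one level of entities."""
    return html.unescape(markup)

# Text typed directly into the form contains a real apostrophe.
typed = "the study's results"

# Text pasted from copied HTML may already carry an entity code.
pasted = "the study&#39;s results"

print(browser_display(render(typed)))   # apostrophe survives
print(browser_display(render(pasted)))  # entity code is shown literally
```

In this sketch the ampersand of the pasted entity is escaped to `&amp;`, so the reader sees `&#39;` rather than an apostrophe. The real cause in Dash could differ and would need to be checked against what is actually stored.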

That said, I anticipate most researchers will be copying and pasting text into Dash, so if it can't be addressed in the Dash code, it'd be nice to offer some guidance on text preparation and/or recommended file formats to copy and paste from.

marisastrong commented 10 years ago

We've seen similar issues in other applications when cutting and pasting from Word Documents.

We will need to add functionality to handle text pasted from Word. We can apply a utility to clean up the encoding.
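As a rough illustration of what such a cleanup utility might look like, here is a Python sketch. It is an assumption, not the Dash implementation: the function name `clean_pasted_text`, the character table, and the choice to flatten typographic quotes to ASCII are all hypothetical. It decodes any HTML entities in the submitted text (repeating a few times in case they were escaped more than once) and then maps common Word punctuation to plain equivalents.

```python
import html

# Hypothetical mapping of common Word "smart" punctuation to plain ASCII.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}
_TABLE = str.maketrans(SMART_PUNCTUATION)

def clean_pasted_text(text: str, max_passes: int = 3) -> str:
    """Decode HTML entities, then normalize typographic punctuation."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text.translate(_TABLE)

print(clean_pasted_text("the study&#39;s &ldquo;results&rdquo;"))
```

Two caveats on this design: the cleaned text still has to be escaped once when it is rendered, and decoding entities would also change an abstract that intentionally contains a literal entity code, so the normalization may be better offered as an option than applied silently.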

cpwillett commented 10 years ago

I don't know if we want to support this, but we have another case where someone cut and pasted from LaTeX: https://dash.ucop.edu/xtf/view?docId=ucop/ark%2B%3Db5060%3Dd8rp4v/mrt-datacite.xml