american-art / npg

National Portrait Gallery
Creative Commons Zero v1.0 Universal

Problem of double quotes in model #58

Closed GreatYYX closed 7 years ago

GreatYYX commented 7 years ago

Double quotes are not escaped or removed in the URIs generated for NPGConstituents2, NPGConAltNames2, and NPGObjTitles2.

Example: in NPGObjTitles2, the title James T. Powers in "Two Little Brides" is generated as the triple

<http://data.americanartcollaborative.org/npg/thesauri/title/james_t._powers_in_"two_little_brides"> <http://www.w3.org/2000/01/rdf-schema#label> "James T. Powers in \"Two Little Brides\""

The unescaped double quotes in the subject IRI cause a parse error:

Error 500: Parse error: [line: 199, col: 82] Illegal character in IRI (codepoint 0x22, '"'): <http://data.americanartcollaborative.org/npg/thesauri/title/james_t._powers_in_["]...>.

workergnome commented 7 years ago

While you're thinking this through, we should probably also handle other characters that are not URI-friendly, like "é" or "?".

See https://perishablepress.com/stop-using-unsafe-characters-in-urls/ for more information.
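One way to cover all of these characters at once is percent-encoding. The snippet below is a minimal sketch using only the Python standard library; `encode_segment` is a hypothetical helper name, and lowercasing and replacing spaces with underscores is an assumption based on the URI in the example above, not the model's actual transform.

```python
from urllib.parse import quote


def encode_segment(label: str) -> str:
    """Percent-encode one URI path segment (hypothetical helper)."""
    # Assumed convention, taken from the example URI: lowercase, spaces -> "_"
    normalized = label.lower().replace(" ", "_")
    # safe="" also encodes "/", so a label cannot add extra path levels
    return quote(normalized, safe="")


print(encode_segment('James T. Powers in "Two Little Brides"'))
```

With this approach `"` becomes `%22`, `?` becomes `%3F`, and `é` becomes its UTF-8 bytes `%C3%A9`, so the original label can still be recovered from the URI.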

GreatYYX commented 7 years ago

@workergnome Absolutely! It seems we need to implement a filter for URIs. I don't know whether Karma's built-in Python interpreter supports 'import'. If it does, a regular expression could be used here:

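import re
raw_str = 'james_t._powers_in_"two_little_brides"'
# Keep only URL-safe characters. The hyphen goes last in the class so that
# "$-_" is not read as a range, which would let "<", ">", "/", "?" through.
uri_str = re.sub(r'[^0-9A-Za-z$_.+!*(),-]', '', raw_str)

workergnome commented 7 years ago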

Have you looked at something like https://pypi.python.org/pypi/slugger/?
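For comparison, here is a rough sketch of what slugging does, written with the standard library only. It is not slugger's API; `slugify` is a hypothetical name, and using underscores as the separator is an assumption made to match the existing URIs.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Reduce a label to lowercase ASCII letters, digits and underscores."""
    # Decompose accented letters, then drop the combining marks (é -> e)
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single underscore
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


print(slugify('James T. Powers in "Two Little Brides"'))
```

Unlike percent-encoding, slugging is lossy: two different labels can end up with the same slug, so it only suits fields where that cannot produce conflicting URIs.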

szeke commented 7 years ago

The model should use SM.uri_from_fields("some prefix", getValue("something"))

I just committed the Python transform libraries from dig to aac-alignment.

rhao commented 7 years ago

Note: it should be UM.uri_from_fields rather than SM.uri_from_fields, but otherwise this works.
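For readers without the aac-alignment libraries at hand, the call shape can be illustrated with a stand-in. This is only a guess at the behaviour: `uri_from_fields_sketch` is a hypothetical function, not the committed implementation, and the cleaning rule (lowercase, non-alphanumerics to underscores, fields joined with underscores) is an assumption.

```python
import re


def uri_from_fields_sketch(prefix: str, *fields: str) -> str:
    """Hypothetical stand-in, NOT the aac-alignment implementation."""
    parts = []
    for field in fields:
        # Assumed rule: lowercase, replace runs of other characters with "_"
        cleaned = re.sub(r"[^a-z0-9]+", "_", field.strip().lower()).strip("_")
        if cleaned:  # skip fields that are empty after cleaning
            parts.append(cleaned)
    return prefix + "_".join(parts)


print(uri_from_fields_sketch("thesauri/title/", 'James T. Powers in "Two Little Brides"'))
```

In a Karma transform the second argument would come from getValue("something"), as in the call above, instead of a literal string.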