earthcubearchitecture-project418 / p418Docs

Documentation on Project 418. Includes publishing guideline, example JSON-LD, and presentations about the project.

Sites to Index #10

Closed ashepherd closed 6 years ago

ashepherd commented 6 years ago
fils commented 6 years ago

Unicode issue:

So the RDF generates fine but a quick check with rapper results in

rapper: Error - URI file:///Users/dfils/Desktop/IEDArun/nquads.rdf:3067 column 116 - Non-printable ASCII character 195 (0xC3) found

due to

< DOI:10.1594/IEDA/100631> <http://schema.org/citation> "Tyrrell, James P., 
Bidgoli, Tandis S, Walker, J Douglas, Möller, Andreas (2016), Conodont (U-Th)/He 
thermochronology of the Mormon Mountains, Tule Spring Hills, and Beaver 
Dam Mountains, southeastern Nevada and southwestern Utah: Appendix B: 
All conodont LA-ICPMS depth profiles. Integrated Earth Data Applications 
(IEDA). doi:10.1594/IEDA/100631" .

So obviously "Möller" has a non-ASCII character in it.
It does suck, to be honest, for people with names like this or for terms of non-ASCII origin.

Have you had to deal with this? I have, but I did it with the classic ö -> o translation and so on. There is a routine in core Go for doing this Unicode conversion, so it's real slick, but I do feel bad converting Unicode names to ASCII.
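
For illustration, here is a minimal Python sketch of the ö -> o style of translation described above. It is not the Go routine mentioned in this comment; the function name `fold_to_ascii` is made up for the example, and the approach (decompose, then drop combining marks) is only one common way to do it.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Decompose accented characters and drop the combining marks.

    Hypothetical helper for illustration only. Characters that have no
    decomposition (for example "ø" or "ß") pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii("Möller, Andreas"))  # Moller, Andreas
```

The conversion is lossy: once "Möller" has become "Moller" the original spelling cannot be recovered, which is the concern raised here.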

I'm trying to work out whether we need to make this translation a "policy / practice" or whether we can deal with it on our end.

The RDF concepts document [1] indicates that both plain and typed literals should be in Unicode Normal Form C (NFC) [2]. If I read that correctly, the characters should be fine and rapper is at fault for flagging them.
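
As a rough sketch of what "in NFC" means for a literal like the citation above, the following Python snippet normalizes a string to NFC and shows that the composed and decomposed spellings of "Möller" end up identical. The helper name `to_nfc` is an assumption made for this example; it is not part of any tool discussed in this thread.

```python
import unicodedata


def to_nfc(literal: str) -> str:
    """Return the literal in Unicode Normal Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", literal)


composed = "M\u00f6ller"     # ö as a single code point (U+00F6)
decomposed = "Mo\u0308ller"  # o followed by a combining diaeresis (U+0308)

# The two spellings differ as raw strings but agree once normalized.
print(composed == decomposed)          # False
print(to_nfc(decomposed) == composed)  # True
```

Unlike the ö -> o translation, NFC keeps the character and only fixes how it is represented.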

comments/interpretations ?

[1] https://www.w3.org/TR/rdf-concepts/
[2] http://www.unicode.org/reports/tr15/
[3] https://en.wikipedia.org/wiki/Unicode_equivalence

@ashepherd @smrgeoinfo

fils commented 6 years ago

@smrgeoinfo

Steve, I have a couple of mods to the JSON-LD that I need to bring up with you. First, note this triple:

< DOI:10.1594/IEDA/323582> <http://schema.org/keywords> "Navigation:Primary" .

Note the space before the "DOI..." string in the URI. I assume you didn't want those spaces there and they got in via the code that generates these? I suspect you want them gone.
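
A small Python sketch of one way to catch and remove that leading space on the consuming side, assuming the input is one N-Quads line at a time. The function name `strip_subject_padding` is hypothetical, and the cleaner fix is probably to trim the identifier where the JSON-LD is generated.

```python
import re


def strip_subject_padding(line: str) -> str:
    """Drop whitespace right after the opening '<' of the subject term.

    Hypothetical helper: it only touches the first term on the line, so
    '<' characters inside literals are left alone.
    """
    return re.sub(r"^<\s+", "<", line)


line = '< DOI:10.1594/IEDA/323582> <http://schema.org/keywords> "Navigation:Primary" .'
print(strip_subject_padding(line))
# <DOI:10.1594/IEDA/323582> <http://schema.org/keywords> "Navigation:Primary" .
```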

Also..

The resources in the snippet at the end have errors. I think you have returns in the keywords line that need to be removed. Check http://tinyurl.com/y9zxpzqv to see the error, and simply remove the returns that wrap the line to get back to valid JSON (or escape the newlines).


```
2018/01/24 14:59:44 Error when transforming JSON-LD document to interface: invalid character '\n' in string literal
2018/01/24 14:59:44 ERROR: At http://get.iedadata.org/doi/100428 JSON-LD is NOT valid: invalid character '\n' in string literal
2018/01/24 14:59:44 URL http://get.iedadata.org/doi/100428 has error: invalid character '\n' in string literal
2018/01/24 15:25:43 Error when transforming JSON-LD document to interface: invalid character '\n' in string literal
2018/01/24 15:25:43 ERROR: At http://get.iedadata.org/doi/500067 JSON-LD is NOT valid: invalid character '\n' in string literal
2018/01/24 15:25:43 URL http://get.iedadata.org/doi/500067 has error: invalid character '\n' in string literal
2018/01/24 15:37:39 Error when transforming JSON-LD document to interface: invalid character '\n' in string literal
2018/01/24 15:37:39 ERROR: At http://get.iedadata.org/doi/500113 JSON-LD is NOT valid: invalid character '\n' in string literal
2018/01/24 15:37:39 URL http://get.iedadata.org/doi/500113 has error: invalid character '\n' in string literal
2018/01/24 17:57:54 Error when transforming JSON-LD document to interface: invalid character '\n' in string literal
2018/01/24 17:57:54 ERROR: At http://get.iedadata.org/doi/321838 JSON-LD is NOT valid: invalid character '\n' in string literal
2018/01/24 17:57:54 URL http://get.iedadata.org/doi/321838 has error: invalid character '\n' in string literal
```
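
The failure in these log lines can be reproduced in a few lines of Python. This is only a sketch: the keyword values are invented for the example (the real JSON-LD is not shown in this thread), and `is_valid_json` is a hypothetical helper. It shows that a raw newline inside a JSON string is rejected, while the same newline written as the two-character escape `\n` parses fine.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Invented keyword values. The first string holds a real line break inside
# the quoted value; the second holds a backslash followed by the letter n.
broken = '{"keywords": "Navigation:Primary,\nBathymetry"}'
escaped = '{"keywords": "Navigation:Primary,\\nBathymetry"}'

print(is_valid_json(broken))   # False
print(is_valid_json(escaped))  # True
```

Either option suggested above works: removing the line wrap or escaping it both give the parser a string literal with no raw control character in it.
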
smrgeoinfo commented 6 years ago

RE the non-printing character, something is odd in rapper's message. 195 (0xC3) is the Ã character; ö is 246 (0xF6), according to http://www.idevelopment.info/data/Programming/programming_resources/PROGRAMMING_ascii_table.shtml

But that's kind of beside the point, because it sounds like the tools should be OK with Unicode characters. Let's see if it causes any other problems.
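
One possible explanation for the 195 (0xC3) in rapper's message, offered as an assumption rather than a confirmed diagnosis: if the file is UTF-8 and is being read byte by byte, the first byte of "ö" is 0xC3. The Python sketch below just prints the relevant values.

```python
# Code point of "ö" versus its UTF-8 byte sequence.
code_point = ord("ö")
utf8_bytes = "ö".encode("utf-8")

print(code_point, hex(code_point))    # 246 0xf6
print([hex(b) for b in utf8_bytes])   # ['0xc3', '0xb6']

# Read as a single Latin-1 byte, 0xC3 displays as "Ã", which matches the
# character an extended ASCII table lists for 195.
print(bytes([0xC3]).decode("latin-1"))  # Ã
```

So 246 (0xF6) is the code point, while 0xC3 0xB6 is how that code point is stored in UTF-8, and a byte-oriented check would stop at the 0xC3.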

fils commented 6 years ago

Correct. I am not really worried about the Unicode. There might just be some methods we need to put in place to ensure all the tools know about it.

The space in the URI and the newline in the JSON are the more breaking issues. The Unicode is just a point to resolve/document.

ashepherd commented 6 years ago

Tracking sitemaps elsewhere