digitalbazaar / jsonld.js

A JSON-LD Processor and API implementation in JavaScript
https://json-ld.org/
Other
1.65k stars 195 forks source link

toRDF ignores terms with spaces #470

Closed tpluscode closed 2 years ago

tpluscode commented 2 years ago

I found that jsonld does not parse URIs with spaces. For example, this will return an empty result

import jsonld from "jsonld"

await jsonld.promises.toRDF({
  "@id": "http://example.com/foo bar",
  "@type": "http://example.com/Resource"
})

I worked fine until v2. I tracked it to this place in code. The isAbsolute function is too strict and will reject identifiers which are not fully escaped. https://github.com/digitalbazaar/jsonld.js/blob/20e2286d198ce7d376320306f9df3667f48a4544/lib/toRdf.js#L108-L111

I find this was explicitly changed in #354 to "Do minimal checking to see if IRIs are valid by making sure they contain no whitespace." Which means that the function no longer only checks whether a URI is absolute or not...

This is problematic because since JavaScript does not have a full-blown URI representation, it is common to come across unescaped URIs, and they have to be taken at face value.

Otherwise, if jsonld wishes to normalise the URIs, the maybe encodeURI would have to be called to ensure 100% correct URIs in the parsed output?

dmitrizagidulin commented 2 years ago

This is problematic because since JavaScript does not have a full-blown URI representation

@tpluscode - can you say more about that? Most versions of javascript (Node.js, all current browsers, etc) implement the WHATWG URL standard, no?

tpluscode commented 2 years ago

My mistake, sorry. What I should have said is that the RDF/JS spec does not use the URL class. URIs are represented merely as strings. Finally, same goes for anyone trying to write JSON-LD by hand.

{
  "@context": { "@vocab": "http://schema.org" },
  "@id": "https://example.com/badly formed URL",
  "@type": "schema:WebPage"
}

The above will work in JSON-LD Playground but not using the library.

davidlehn commented 2 years ago

The above will work in JSON-LD Playground but not using the library.

The playground uses 3.2.0, so a bit behind the latest release. Probably doesn't matter for this use case. It's getting more difficult to keep the playground up-to-date, but I can't recall if there was a specific reason I didn't update it recently.

What do you mean the above "will work"? Looks like for the tabs that go to triples it just drops everything. Unfortunately, that's not the best feedback. What was the expectation?

This is problematic because since JavaScript does not have a full-blown URI representation, it is common to come across unescaped URIs, and they have to be taken at face value.

URIs can't contain spaces, so the handling of "unescaped URIs" is probably best considered undefined here. I'm not sure what heuristics would be needed to guess if inputs needed to be escaped. That's likely only possible at the application level. And for performance reasons it makes more sense to have fast paths that assume valid input. In this case, it's not incorrect to have isAbsoluteURI check for spaces. If it has spaces it's not a URI, so also not an absolute URI. I imagine the logic and comments could be clearer. URIs are not checked for validity everywhere because that would be slow. It does lead to inconsistent behavior where some places that expect URIs will just pass through strings with spaces. In this case I guess it made sense to ensure spaces didn't end up in output triples, and there were some tests for that.

Also, jsonld.promises.* is for backwards compatibility now. You can just use the regular top-level functions.

tpluscode commented 2 years ago

Right, the RDF tab will come up empty. The JSON-LD algorithms on the other hand preserve the space when going between compacted, expanded, etc

Uh, ok, I'm not so sure what to think about this anymore. Let me close for now and I will reopen if this ever hits me again and I will feel like it can't be fixed outside of jsonld