cayleygraph / cayley

An open-source graph database
https://cayley.io
Apache License 2.0

Potential escaping problem #140

Closed barakmich closed 6 years ago

barakmich commented 10 years ago

It's possible that this is WAI, and I'm AFK so I can't really check, but I got this error loading a chunk of Freebase:

00:14:51.647739 03199 cayley.go:190] failed to parse "ns:m.0y_chx\tns:music.recording.lyrics_website..common.webpage.uri\t<http://www.metrolyrics.com/?\"-lyrics-stephen-sondheim.html>.": invalid N-Quad: unexpected rune '"' at 95

There were some breakages in the RDF dump a few months ago, so I'm not sure this line is guaranteed valid (if it isn't, this is not Cayley's fault), but it's worth mentioning.

kortschak commented 10 years ago

"ns:m.0y_chx\tns:music.recording.lyrics_website..common.webpage.uri\t<http://www.metrolyrics.com/?\"-lyrics-stephen-sondheim.html>."

That looks broken to me - there should not be a " literal in a URL (http://www.w3.org/TR/n-quads/#grammar-production-IRIREF). Can you try parsing it with nquads? That would be a definitive answer.
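As a rough illustration of the grammar rule linked above, here is a minimal Go sketch that scans the text between the angle brackets for characters the N-Quads IRIREF production excludes in raw form. The function name `firstDisallowed` is hypothetical and is not part of Cayley or the nquads package; the sketch also ignores `\uXXXX` escapes and simply reports any backslash.

```go
package main

import (
	"fmt"
	"strings"
)

// disallowedInIRIREF lists the characters that the N-Quads IRIREF production
// excludes in raw form, in addition to the range U+0000..U+0020.
const disallowedInIRIREF = "<>\"{}|^`\\"

// firstDisallowed is a hypothetical helper: it returns the byte offset of the
// first rune that may not appear unescaped inside an IRIREF, or -1 if there
// is none. It does not interpret \uXXXX escapes, so any backslash is reported.
func firstDisallowed(iri string) int {
	for i, r := range iri {
		if r <= 0x20 || strings.ContainsRune(disallowedInIRIREF, r) {
			return i
		}
	}
	return -1
}

func main() {
	// The IRI body from the failing Freebase line, without the angle brackets.
	iri := `http://www.metrolyrics.com/?"-lyrics-stephen-sondheim.html`
	if i := firstDisallowed(iri); i >= 0 {
		fmt.Printf("disallowed rune %q at offset %d\n", iri[i], i)
	}
}
```

A check like this only answers whether the IRI is well-formed under the grammar; it says nothing about which parser is at fault.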

However, I think this is working as intended.

The fix would be to \uxxxx or %xx encode the double quote in the data if it's real - which itself seems doubtful.
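The two encodings mentioned here could be applied to the data before loading, roughly as follows. This is a sketch under assumptions: the helper names `percentEncodeQuotes` and `uEscapeQuotes` are made up for illustration, and they expect only the IRI body (the text between `<` and `>`), not a whole N-Quads line, since a double quote is legitimate elsewhere in a statement (for example around literals).

```go
package main

import (
	"fmt"
	"strings"
)

// percentEncodeQuotes replaces each raw double quote with %22.
// Hypothetical helper; note that this changes the IRI string itself.
func percentEncodeQuotes(iri string) string {
	return strings.ReplaceAll(iri, `"`, "%22")
}

// uEscapeQuotes replaces each raw double quote with the N-Quads
// UCHAR escape \u0022. Hypothetical helper.
func uEscapeQuotes(iri string) string {
	return strings.ReplaceAll(iri, `"`, `\u0022`)
}

func main() {
	// IRI body from the failing line, without the angle brackets.
	iri := `http://www.metrolyrics.com/?"-lyrics-stephen-sondheim.html`
	fmt.Println("<" + percentEncodeQuotes(iri) + ">")
	fmt.Println("<" + uEscapeQuotes(iri) + ">")
}
```

Which form is preferable depends on whether the consumer should see the quote as part of the IRI after unescaping; percent-encoding keeps the stored IRI free of the raw character.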

kortschak commented 10 years ago

Interestingly, nquads does parse it. I will send a fix: I left " in the definition of IRIREF where I should not have.