RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

turtle serialization breaks on import #336

Closed (satra closed 10 years ago)

satra commented 11 years ago

I'm serializing a graph with about 16,000 statements. The graph serializes to Turtle without error, but RDFLib can't parse its own output back in:

In [3]: rdfgraph.serialize('outfile.ttl', format='turtle')

In [5]: g = rdflib.Graph().parse('outfile.ttl', format='turtle')
  File "<string>", line unknown
BadSyntax

No error occurs if I serialize to rdf+xml or nt instead.
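A quick way to confirm which formats survive the round trip is to serialize and re-parse the graph in each one. The sketch below is only an illustration: `roundtrip_report` and its `new_graph` parameter are hypothetical names, not part of RDFLib. It relies solely on the `serialize(format=...)` and `parse(data=..., format=...)` calls already used in this thread, and it treats a matching triple count as "ok", which is a weak check.

```python
def roundtrip_report(graph, new_graph, formats=("turtle", "xml", "nt")):
    """Serialize `graph` in each format, parse it back, and record the outcome.

    `graph` is the graph under test. `new_graph` is a zero-argument callable
    returning an empty graph (with RDFLib, pass rdflib.Graph). Both names are
    illustrative, not RDFLib API.
    """
    report = {}
    for fmt in formats:
        try:
            data = graph.serialize(format=fmt)
            # Some RDFLib releases return bytes here, others return str.
            if isinstance(data, bytes):
                data = data.decode("utf-8")
            reparsed = new_graph()
            reparsed.parse(data=data, format=fmt)
            if len(reparsed) == len(graph):
                report[fmt] = "ok"
            else:
                report[fmt] = "triple count differs"
        except Exception as exc:
            # Keep going so that one broken format does not hide the others.
            report[fmt] = "failed: %s" % type(exc).__name__
    return report
```

With RDFLib this would be called as `roundtrip_report(rdfgraph, rdflib.Graph)`. For the graph described here, the expectation is that only the "turtle" entry reports a failure.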

joernhees commented 11 years ago

Sorry for the late reply, but it's hard to debug this without more info. Please provide the graph serialized as both Turtle and XML, along with your rdflib version.

theoryno3 commented 11 years ago

What are the contents of your graph? Could you provide more verbose information about its construction? Cheers.

satra commented 10 years ago

The graph is generated automatically by a workflow program, so it's really large. Here is an example:

https://dl.dropbox.com/s/an6qyhf9yfvn634/outfile.ttl

But the key point here is the round trip: I use rdflib to create this graph, yet rdflib won't read it back in.

ghost commented 10 years ago

Thanks for that - it was absolutely vital.

Confirmed as a bug in the RDFLib notation3 parser. As a workaround, switch temporarily to the RDF/XML format for transport.

Despite the misleading error message, it's an issue with the notation3 parser's handling of the namespace localpart, to wit:

import rdflib

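# Excerpt of the failing Turtle from the reported file, kept for reference;
# the code further down rebuilds an equivalent triple programmatically.
problemlocalname = '''\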
@prefix fs: <http://freesurfer.net/fswiki/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

<http://nidm.nidash.org/iri/82b79326488911e3b2fb14109fcf6ae7> a fs:stat_header,
        prov:Entity ;
    fs:mrisurf.c-cvs_version "$Id: mrisurf.c,v .. abridged .. Exp $" .
'''

FS = rdflib.Namespace('http://freesurfer.net/fswiki/terms/')
g = rdflib.Graph()
g.bind('fs', str(FS))
g.add((
    rdflib.URIRef('http://example.org'),
    FS['mrisurf.c-cvs_version'],
    rdflib.Literal("irrelevant")))
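turtledump = g.serialize(format="turtle")
# serialize() returns bytes on older RDFLib releases and str on newer ones
if isinstance(turtledump, bytes):
    turtledump = turtledump.decode('utf-8')
g1 = rdflib.Graph()
# On affected versions this parse raises BadSyntax
g1.parse(data=turtledump, format="turtle")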
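In a file with thousands of statements it can help to list the prefixed names that resemble the one above. The following is a rough, standard-library-only sketch. It assumes that a dot inside the local part (as in `fs:mrisurf.c-cvs_version`) is what trips the parser, which the example suggests but this thread does not confirm. `dotted_local_names` is a hypothetical helper, and the regular expressions are heuristics rather than a Turtle grammar (multi-line string literals, for instance, are not handled).

```python
import re

# Heuristic patterns only; this is a scan, not a Turtle parser.
_STRING = re.compile(r'"(?:[^"\\\n]|\\.)*"')        # single-line string literals
_IRI = re.compile(r'<[^<>\s]*>')                    # <...> IRI references
_PNAME = re.compile(r'([A-Za-z][\w-]*):([\w.\-]*\w)')  # prefix:local


def dotted_local_names(turtle_text):
    """Return (prefix, local) pairs whose local part contains a dot."""
    # Blank out literals and IRIs so their contents are not mistaken
    # for prefixed names.
    text = _STRING.sub('""', turtle_text)
    text = _IRI.sub('<>', text)
    found = []
    for prefix, local in _PNAME.findall(text):
        if '.' in local and (prefix, local) not in found:
            found.append((prefix, local))
    return found
```

Calling `dotted_local_names(open('outfile.ttl').read())` would then give a short list of candidate names to test one at a time with the reproduction above.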

Looks like the problem could be in the code for notation3.SinkParser.path

https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/notation3.py#L719

Now, if I only knew what a "path production" was ....