RDFLib / rdflib-jsonld

JSON-LD parser and serializer plugins for RDFLib

Graph formatted as json-ld contains all default namespaces as @context #103

Open marrog opened 3 years ago

marrog commented 3 years ago

Hello there!

I'm trying to serialize an RDF graph generated from a CSV file with the csvwlib package (https://pypi.org/project/csvwlib/). Passing None as format to the CSVWConverter.to_rdf() method returns a plain instance of the rdflib.graph.Graph class.

CSV file content is:

John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123

Here is exactly what I'm doing:

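from csvwlib import CSVWConverter

# format=None makes to_rdf() return an rdflib Graph instead of serialized text
graph = CSVWConverter.to_rdf("http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv", format=None)
for s, p, o in graph:
    print(s, p, o)
# rdflib 5.x serialize() returns bytes, hence the decode()
jsonld_data = graph.serialize(format='json-ld', auto_compact=True).decode()
print(jsonld_data)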

Printed data is:

Ne3e83b17e4114c78832adf7bd75a9b36 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson Terrace
N6ef1e15238d24e8fb88602c57f8b8ca3 http://www.w3.org/ns/csvw#url http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=6
Nd2ccc146fd974872a7bc693dae87fd0e http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson hobo
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/ns/csvw#row N1eafebf69a1047179d173bf58bf23c05
N501be14e342844709b3a83b54954b9e7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120 Joan "the
Nc36f727d837c4719a7367f352bebcd01 http://www.w3.org/ns/csvw#rownum 1
N3b32ad9ff0f947769a0655414355504e http://www.w3.org/ns/csvw#rownum 3
N1eafebf69a1047179d173bf58bf23c05 http://www.w3.org/ns/csvw#url http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=3
N7d3b00fcaf54447389515e28327302a7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C St.,Riverside,
Ne3e83b17e4114c78832adf7bd75a9b36 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120 Stephen,Tyler,"7452
N7d3b00fcaf54447389515e28327302a7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C Jefferson
N6e47682f376b4f2dab747f7af937e2f6 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson SD,
Ne3e83b17e4114c78832adf7bd75a9b36 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C At
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/ns/csvw#row N63097e6edb6149a5ae87519a89d3f446
N765a85d57b124451a3868d27d85a36c5 http://www.w3.org/ns/csvw#table N34dafb323d3f4456b96930cf381237e8
Nd2ccc146fd974872a7bc693dae87fd0e http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120 Jack,McGinnis,220
Nd2ccc146fd974872a7bc693dae87fd0e http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C Av.,Phila,
N6ef1e15238d24e8fb88602c57f8b8ca3 http://www.w3.org/ns/csvw#describes N501be14e342844709b3a83b54954b9e7
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/ns/csvw#row N3b32ad9ff0f947769a0655414355504e
Nd2ccc146fd974872a7bc693dae87fd0e http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C PA,09119
N3b32ad9ff0f947769a0655414355504e http://www.w3.org/ns/csvw#describes Ne3e83b17e4114c78832adf7bd75a9b36
Ne3e83b17e4114c78832adf7bd75a9b36 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C the
Ne3e83b17e4114c78832adf7bd75a9b36 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#08075 Plaza""
Nc36f727d837c4719a7367f352bebcd01 http://www.w3.org/ns/csvw#describes Nd2ccc146fd974872a7bc693dae87fd0e
N3b32ad9ff0f947769a0655414355504e http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#Row
N6ef1e15238d24e8fb88602c57f8b8ca3 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#Row
N501be14e342844709b3a83b54954b9e7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C at
N765a85d57b124451a3868d27d85a36c5 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#TableGroup
N501be14e342844709b3a83b54954b9e7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson bone"",
N1eafebf69a1047179d173bf58bf23c05 http://www.w3.org/ns/csvw#rownum 2
Nc36f727d837c4719a7367f352bebcd01 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#Row
N3b32ad9ff0f947769a0655414355504e http://www.w3.org/ns/csvw#url http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=4
N1eafebf69a1047179d173bf58bf23c05 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#Row
Nc36f727d837c4719a7367f352bebcd01 http://www.w3.org/ns/csvw#url http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=2
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#Table
N7d3b00fcaf54447389515e28327302a7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson Man""",Repici,120
N501be14e342844709b3a83b54954b9e7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C Anne",Jet,"9th,
N63097e6edb6149a5ae87519a89d3f446 http://www.w3.org/ns/csvw#rownum 4
N1eafebf69a1047179d173bf58bf23c05 http://www.w3.org/ns/csvw#describes N7d3b00fcaf54447389515e28327302a7
N6e47682f376b4f2dab747f7af937e2f6 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120 ,Blankman,,SomeTown,
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/ns/csvw#url http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv
N7d3b00fcaf54447389515e28327302a7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#08075 NJ,08075
N63097e6edb6149a5ae87519a89d3f446 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/csvw#Row
N63097e6edb6149a5ae87519a89d3f446 http://www.w3.org/ns/csvw#describes N6e47682f376b4f2dab747f7af937e2f6
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/ns/csvw#row N6ef1e15238d24e8fb88602c57f8b8ca3
N63097e6edb6149a5ae87519a89d3f446 http://www.w3.org/ns/csvw#url http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=5
N34dafb323d3f4456b96930cf381237e8 http://www.w3.org/ns/csvw#row Nc36f727d837c4719a7367f352bebcd01
N7d3b00fcaf54447389515e28327302a7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120 John "Da
N501be14e342844709b3a83b54954b9e7 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#08075 Terrace
N6ef1e15238d24e8fb88602c57f8b8ca3 http://www.w3.org/ns/csvw#rownum 5
N6e47682f376b4f2dab747f7af937e2f6 http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C 00298
{
  "@context": {
    "as": "https://www.w3.org/ns/activitystreams#",
    "cc": "http://creativecommons.org/ns#",
    "csvw": "http://www.w3.org/ns/csvw#",
    "ctag": "http://commontag.org/ns#",
    "dc": "http://purl.org/dc/terms/",
    "dc11": "http://purl.org/dc/elements/1.1/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "dqv": "http://www.w3.org/ns/dqv#",
    "duv": "https://www.w3.org/TR/vocab-duv#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gr": "http://purl.org/goodrelations/v1#",
    "grddl": "http://www.w3.org/2003/g/data-view#",
    "ical": "http://www.w3.org/2002/12/cal/icaltzd#",
    "ldp": "http://www.w3.org/ns/ldp#",
    "ma": "http://www.w3.org/ns/ma-ont#",
    "oa": "http://www.w3.org/ns/oa#",
    "og": "http://ogp.me/ns#",
    "org": "http://www.w3.org/ns/org#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prov": "http://www.w3.org/ns/prov#",
    "qb": "http://purl.org/linked-data/cube#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfa": "http://www.w3.org/ns/rdfa#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rev": "http://purl.org/stuff/rev#",
    "rif": "http://www.w3.org/2007/rif#",
    "rr": "http://www.w3.org/ns/r2rml#",
    "schema": "http://schema.org/",
    "sd": "http://www.w3.org/ns/sparql-service-description#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "sosa": "http://www.w3.org/ns/sosa/",
    "ssn": "http://www.w3.org/ns/ssn/",
    "time": "http://www.w3.org/2006/time#",
    "v": "http://rdf.data-vocabulary.org/#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "void": "http://rdfs.org/ns/void#",
    "wdr": "http://www.w3.org/2007/05/powder#",
    "wdrs": "http://www.w3.org/2007/05/powder-s#",
    "xhv": "http://www.w3.org/1999/xhtml/vocab#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "_:N765a85d57b124451a3868d27d85a36c5",
      "@type": "csvw:TableGroup",
      "csvw:table": {
        "@id": "_:N34dafb323d3f4456b96930cf381237e8"
      }
    },
    {
      "@id": "_:N34dafb323d3f4456b96930cf381237e8",
      "@type": "csvw:Table",
      "csvw:row": [
        {
          "@id": "_:N63097e6edb6149a5ae87519a89d3f446"
        },
        {
          "@id": "_:N1eafebf69a1047179d173bf58bf23c05"
        },
        {
          "@id": "_:Nc36f727d837c4719a7367f352bebcd01"
        },
        {
          "@id": "_:N3b32ad9ff0f947769a0655414355504e"
        },
        {
          "@id": "_:N6ef1e15238d24e8fb88602c57f8b8ca3"
        }
      ],
      "csvw:url": {
        "@id": "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv"
      }
    },
    {
      "@id": "_:N63097e6edb6149a5ae87519a89d3f446",
      "@type": "csvw:Row",
      "csvw:describes": {
        "@id": "_:N6e47682f376b4f2dab747f7af937e2f6"
      },
      "csvw:rownum": 4,
      "csvw:url": {
        "@id": "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=5"
      }
    },
    {
      "@id": "_:N6e47682f376b4f2dab747f7af937e2f6",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120": ",Blankman,,SomeTown,",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson": "SD,",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C": "00298"
    },
    {
      "@id": "_:N1eafebf69a1047179d173bf58bf23c05",
      "@type": "csvw:Row",
      "csvw:describes": {
        "@id": "_:N7d3b00fcaf54447389515e28327302a7"
      },
      "csvw:rownum": 2,
      "csvw:url": {
        "@id": "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=3"
      }
    },
    {
      "@id": "_:N7d3b00fcaf54447389515e28327302a7",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#08075": "NJ,08075",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120": "John \"Da",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C": "St.,Riverside,",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson": "Man\"\"\",Repici,120",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C": "Jefferson"
    },
    {
      "@id": "_:Nc36f727d837c4719a7367f352bebcd01",
      "@type": "csvw:Row",
      "csvw:describes": {
        "@id": "_:Nd2ccc146fd974872a7bc693dae87fd0e"
      },
      "csvw:rownum": 1,
      "csvw:url": {
        "@id": "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=2"
      }
    },
    {
      "@id": "_:Nd2ccc146fd974872a7bc693dae87fd0e",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120": "Jack,McGinnis,220",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C": "PA,09119",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson": "hobo",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C": "Av.,Phila,"
    },
    {
      "@id": "_:N3b32ad9ff0f947769a0655414355504e",
      "@type": "csvw:Row",
      "csvw:describes": {
        "@id": "_:Ne3e83b17e4114c78832adf7bd75a9b36"
      },
      "csvw:rownum": 3,
      "csvw:url": {
        "@id": "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=4"
      }
    },
    {
      "@id": "_:Ne3e83b17e4114c78832adf7bd75a9b36",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#08075": "Plaza\"\"",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120": "Stephen,Tyler,\"7452",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C": "the",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson": "Terrace",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C": "At"
    },
    {
      "@id": "_:N6ef1e15238d24e8fb88602c57f8b8ca3",
      "@type": "csvw:Row",
      "csvw:describes": {
        "@id": "_:N501be14e342844709b3a83b54954b9e7"
      },
      "csvw:rownum": 5,
      "csvw:url": {
        "@id": "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#row=6"
      }
    },
    {
      "@id": "_:N501be14e342844709b3a83b54954b9e7",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#08075": "Terrace",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#John%2CDoe%2C120": "Joan \"the",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#NJ%2C": "at",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#jefferson": "bone\"\",",
      "http://localhost:8000/media/resources/20210630/tmpsn0l02m9.utf8_encoded.csv#st.%2CRiverside%2C": "Anne\",Jet,\"9th,"
    }
  ]
}

My question is:

Why does jsonld_data contain all the default namespaces in @context even though only one of them (csvw) occurs in the @graph section? Serializing to one of the other "default" formats, such as n3 or ttl, produces a completely different set of namespaces:

data = graph.serialize(format='n3').decode()
print(data)

Result:

@prefix : <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix ns1: <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#st.%2CRiverside%> .
@prefix ns2: <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#NJ%> .
@prefix ns3: <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#John%2CDoe%> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a csvw:TableGroup ;
    csvw:table [ a csvw:Table ;
            csvw:row [ a csvw:Row ;
                    csvw:describes [ :08075 "NJ,08075" ;
                            ns3:2C120 "John \"Da" ;
                            ns2:2C "St.,Riverside," ;
                            :jefferson "Man\"\"\",Repici,120" ;
                            ns1:2C "Jefferson" ] ;
                    csvw:rownum 2 ;
                    csvw:url <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#row=3> ],
                [ a csvw:Row ;
                    csvw:describes [ ns3:2C120 "Jack,McGinnis,220" ;
                            ns2:2C "PA,09119" ;
                            :jefferson "hobo" ;
                            ns1:2C "Av.,Phila," ] ;
                    csvw:rownum 1 ;
                    csvw:url <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#row=2> ],
                [ a csvw:Row ;
                    csvw:describes [ :08075 "Plaza\"\"" ;
                            ns3:2C120 "Stephen,Tyler,\"7452" ;
                            ns2:2C "the" ;
                            :jefferson "Terrace" ;
                            ns1:2C "At" ] ;
                    csvw:rownum 3 ;
                    csvw:url <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#row=4> ],
                [ a csvw:Row ;
                    csvw:describes [ :08075 "Terrace" ;
                            ns3:2C120 "Joan \"the" ;
                            ns2:2C "at" ;
                            :jefferson "bone\"\"," ;
                            ns1:2C "Anne\",Jet,\"9th," ] ;
                    csvw:rownum 5 ;
                    csvw:url <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#row=6> ],
                [ a csvw:Row ;
                    csvw:describes [ ns3:2C120 ",Blankman,,SomeTown," ;
                            :jefferson "SD," ;
                            ns1:2C "00298" ] ;
                    csvw:rownum 4 ;
                    csvw:url <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv#row=5> ] ;
            csvw:url <http://localhost:8000/media/resources/20210630/tmpks_36979.utf8_encoded.csv> ] .

Versions of packages:

csvwlib==0.3.2 rdflib==5.0.0 rdflib-jsonld==0.5.0

Please let me know whether the described behavior is a bug. Is there any way to get the same namespaces in the JSON-LD context as in the n3 output?

nicholascar commented 3 years ago

Hi @marrog, I can confirm this is a real issue.

It occurs because csvwlib binds a whole pile of prefixes & namespaces to the rdflib Graph() (see https://github.com/DerwenAI/csvwlib/blob/master/csvwlib/utils/rdf/Namespaces.py#L13) and then the rdflib-jsonld serializer puts everything in the graph's namespaces list into the context (see https://github.com/RDFLib/rdflib-jsonld/blob/master/rdflib_jsonld/serializer.py#L116). This only occurs for JSON-LD and not any other "default" RDF format because the serializers are completely independent.

A real solution here would be to add a check to the rdflib-jsonld code so that only used namespaces are placed into the context. Another solution would be to do a similar thing in csvwlib, i.e. only bind used prefixes to the graph.
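Until either fix exists, one possible workaround is to prune the context after serialization. The following is only a minimal sketch, not rdflib-jsonld code: `used_context_keys` and `prune_context` are hypothetical helper names, and the sketch assumes the @context is a flat prefix-to-IRI mapping like the one in the output above. It walks the parsed JSON-LD document, records which context entries are referenced in keys and in @id/@type values, and drops the rest.

```python
import json


def used_context_keys(doc, context):
    """Return the context keys referenced as terms or `prefix:suffix` IRIs in doc."""
    found = set()

    def check(term):
        if term in context:              # plain term defined in the context
            found.add(term)
        prefix, sep, _ = term.partition(":")
        if sep and prefix in context:    # compact IRI such as "csvw:Row"
            found.add(prefix)

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                if key == "@context":
                    continue             # never count the context itself as usage
                check(key)
                if key in ("@id", "@type"):
                    items = child if isinstance(child, list) else [child]
                    for item in items:
                        if isinstance(item, str):
                            check(item)
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(doc)
    return found


def prune_context(doc):
    """Return a shallow copy of doc whose @context keeps only referenced entries."""
    context = doc.get("@context")
    if not isinstance(context, dict):    # assumption: only flat dict contexts are handled
        return doc
    keep = used_context_keys(doc, context)
    pruned = dict(doc)
    pruned["@context"] = {k: v for k, v in context.items() if k in keep}
    return pruned


# Small hand-written document standing in for the serializer output
sample = {
    "@context": {
        "csvw": "http://www.w3.org/ns/csvw#",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
    },
    "@graph": [{"@id": "_:b0", "@type": "csvw:Row", "csvw:rownum": 4}],
}
print(json.dumps(prune_context(sample)["@context"], indent=2))
```

With the code from the original report this would be applied as `prune_context(json.loads(jsonld_data))`. Keys that are full IRIs (such as the `http://localhost:8000/...` properties) and blank node identifiers match no context entry, so they do not keep any prefix alive.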

While I think either or both could be done, I would have to check whether this is expected JSON-LD behaviour: perhaps there is no requirement that the context be limited to used elements only, and there may even be reasons why limiting it is not desired, but I suspect that the current behaviour is not desired.

Sorry I don't already know; I'm pretty new to JSON-LD, as I'm actually one of the rdflib maintainers and have sort of inherited rdflib-jsonld.

I'm not going to be able to fix this any time soon, as I'm preoccupied with merging rdflib-jsonld into the main rdflib library and adding JSON-LD 1.1 support, so I'd love any external contributions for this.