AtomGraph / LinkedDataHub

The low-code Knowledge Graph application platform. Apache license.
https://atomgraph.github.io/LinkedDataHub/
Apache License 2.0

Could not import large RDF file. #150

Closed. FNakano closed this issue 1 year ago.

FNakano commented 1 year ago

Hello! I converted a CSV file to RDF using AtomGraph CSV2RDF, which generated a 193 MB Turtle (.ttl) file. I then tried to import it following steps 1-4 of https://atomgraph.github.io/LinkedDataHub/linkeddatahub/docs/user-guide/import-data/import-rdf-data/. Clicking the Save button (step 5) popped up an alert containing just the text "null".

[Screenshot from 2023-01-29 20-28-24]

The terminal running LinkedDataHub showed the following messages:

linkeddatahub_1     | 00:52:57,489 [http-nio-7070-exec-1] DEBUG ModelXSLTWriterBase:252 - RDF/XML bytes written: 1124
nginx_1             | 172.18.0.1 - - [29/Jan/2023:23:52:57 +0000] "GET /files/?forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject&createGraph=true HTTP/1.1" 200 47126 "https://localhost:4443/files/?mode=https%3A%2F%2Fw3id.org%2Fatomgraph%2Flinkeddatahub%23ContentMode" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
nginx_1             | 2023/01/29 23:53:28 [error] 10#10: *423 client intended to send too large body: 193566581 bytes, client: 172.18.0.1, server: localhost, request: "POST /service?mode=https%3A%2F%2Fw3id.org%2Fatomgraph%2Fclient%23EditMode&forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject HTTP/1.1", host: "localhost:4443", referrer: "https://localhost:4443/files/?forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject&createGraph=true"
nginx_1             | 172.18.0.1 - - [29/Jan/2023:23:53:28 +0000] "POST /service?mode=https%3A%2F%2Fw3id.org%2Fatomgraph%2Fclient%23EditMode&forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject HTTP/1.1" 413 183 "https://localhost:4443/files/?forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject&createGraph=true" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
namedgraph commented 1 year ago

Hi. There are configured limits for the request body size: https://atomgraph.github.io/LinkedDataHub/linkeddatahub/docs/reference/configuration/

193M triples is a lot of data. I would try to import it directly into the triplestore instead.
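The 413 responses in the logs above come from nginx rejecting request bodies larger than its configured limit; "client intended to send too large body" is the message nginx logs when a request exceeds its client_max_body_size. As a minimal sketch, assuming the limit is written in nginx size syntax (plain bytes or a k/m/g suffix, with 0 meaning no limit), a file can be checked before uploading it through the UI. The function names are hypothetical, and "10m" is only an example value, not the limit configured in this deployment.

```python
import os

# nginx size units: no suffix = bytes; k/K, m/M, g/G are binary multiples
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_nginx_size(value: str) -> int:
    """Convert an nginx-style size such as '10m' or '512K' to bytes."""
    v = value.strip().lower()
    unit = v[-1] if v and v[-1] in "kmg" else ""
    number = v[:-1] if unit else v
    return int(number) * _UNITS[unit]


def exceeds_limit(size_bytes: int, limit: str) -> bool:
    """True if a body of size_bytes would be rejected; a limit of 0 disables the check."""
    limit_bytes = parse_nginx_size(limit)
    return limit_bytes != 0 and size_bytes > limit_bytes


def file_exceeds_limit(path: str, limit: str) -> bool:
    """Hypothetical helper: check a file on disk before uploading it."""
    return exceeds_limit(os.path.getsize(path), limit)


if __name__ == "__main__":
    # Body sizes taken from the nginx error log lines in this thread; "10m" is an example limit.
    for size in (193566581, 10374282):
        print(size, exceeds_limit(size, "10m"))
```

Since nginx also rejected the 10374282-byte CSV upload, the limit in effect in this setup must have been below that size.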

FNakano commented 1 year ago

One more piece of information: I got the same message while trying to import a CSV file:

linkeddatahub_1     | 15:49:19,539 [http-nio-7070-exec-4] DEBUG BasedModelProvider:81 - RDF language used to read Model: Lang:RDF/XML
nginx_1             | 172.18.0.1 - - [30/Jan/2023:14:49:19 +0000] "GET /sparql?query=DESCRIBE%20%2A%20WHERE%20%7B%0A%20%20SELECT%20DISTINCT%20%3Fresource%20WHERE%20%7B%0A%20%20%20%20%7B%0A%20%20%20%20%20%20GRAPH%20%3Fgraph%20%7B%0A%20%20%20%20%20%20%20%20%3Fresource%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E%20%3FType%3B%0A%20%20%20%20%20%20%20%20%20%20%28%28%28%28%28%28%28%28%28%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label%3E%7C%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2Ftitle%3E%29%7C%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Ftitle%3E%29%7C%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%3E%29%7C%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FgivenName%3E%29%7C%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FfamilyName%3E%29%7C%3Chttp%3A%2F%2Frdfs.org%2Fsioc%2Fns%23name%3E%29%7C%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23prefLabel%3E%29%7C%3Chttp%3A%2F%2Fschema.org%2Fname%3E%29%7C%3Chttps%3A%2F%2Fschema.org%2Fname%3E%29%20%3Flabel.%0A%20%20%20%20%20%20%20%20FILTER%28ISURI%28%3Fresource%29%29%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%20%20UNION%0A%20%20%20%20%7B%0A%20%20%20%20%20%20%3Fresource%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E%20%3FType%3B%0A%20%20%20%20%20%20%20%20%28%28%28%28%28%28%28%28%28%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label%3E%7C%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2Ftitle%3E%29%7C%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Ftitle%3E%29%7C%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%3E%29%7C%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FgivenName%3E%29%7C%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FfamilyName%3E%29%7C%3Chttp%3A%2F%2Frdfs.org%2Fsioc%2Fns%23name%3E%29%7C%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23prefLabel%3E%29%7C%3Chttp%3A%2F%2Fschema.org%2Fname%3E%29%7C%3Chttps%3A%2F%2Fschema.org%2Fname%3E%29%20%3Flabel.%0A%20%20%20%20%20%20FILTER%28ISURI%28%3Fresource%29%29%0A%20%20%20%20%7D%0A%20%20%20%20FILTER%28REGEX%28%3Flabel%2C%20%22Im%22%2C%20%22iq%22%29%29%0A%20%20%20%20FILTER%28%3FType%20IN%28%3Chttps%3A%2F%2Fwww.w3.org%2Fns%2Fldt%2Fdocument-hierarchy%23Container%3E%2C%20%3Chttps%3A%2F%2Fwww.w3.org%2Fns%2Fldt%2Fdocument-hierarchy%23Container%3E%29%29%0A%20%20%7D%0A%7D HTTP/1.1" 200 2893 "https://localhost:4443/84f2dfb7-3c89-4405-abec-750e99c3e9c2/?forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject&createGraph=true" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"
nginx_1             | 2023/01/30 14:49:34 [error] 10#10: *124 client intended to send too large body: 10374282 bytes, client: 172.18.0.1, server: localhost, request: "POST /service?mode=https%3A%2F%2Fw3id.org%2Fatomgraph%2Fclient%23EditMode&forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject HTTP/1.1", host: "localhost:4443", referrer: "https://localhost:4443/84f2dfb7-3c89-4405-abec-750e99c3e9c2/?forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject&createGraph=true"
nginx_1             | 172.18.0.1 - - [30/Jan/2023:14:49:34 +0000] "POST /service?mode=https%3A%2F%2Fw3id.org%2Fatomgraph%2Fclient%23EditMode&forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject HTTP/1.1" 413 183 "https://localhost:4443/84f2dfb7-3c89-4405-abec-750e99c3e9c2/?forClass=http%3A%2F%2Fwww.semanticdesktop.org%2Fontologies%2F2007%2F03%2F22%2Fnfo%23FileDataObject&createGraph=true" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/109.0"

[Screenshot from 2023-01-30 11-53-53]

FNakano commented 1 year ago

Thanks for your reply.

> Hi. There are configured limits for the request body size: https://atomgraph.github.io/LinkedDataHub/linkeddatahub/docs/reference/configuration/

> 193M triples is a lot of data. I would try to import it directly into the triplestore instead.

How can I import it directly into the triplestore?

namedgraph commented 1 year ago

For example, using the SPARQL Graph Store Protocol: https://jena.apache.org/documentation/fuseki2/soh.html#soh-sparql-http

Map fuseki-end-user/fuseki-admin service ports to the host as shown here: https://github.com/AtomGraph/LinkedDataHub/blob/master/docker-compose.debug.yml

Then end-user and admin Fuseki endpoints will be available on http://localhost:3031/ds and http://localhost:3030/ds, respectively.
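A Graph Store Protocol upload is a plain HTTP PUT of the RDF file to the endpoint, with the target named graph passed as a percent-encoded graph query parameter. Below is a minimal standard-library sketch, assuming the end-user dataset URL http://localhost:3031/ds accepts Graph Store Protocol requests (as it did for s-put in the steps in this thread) and that the file is Turtle; gsp_url and put_turtle are hypothetical helper names. PUT replaces any existing content of the graph, whereas POST would merge into it.

```python
import os
import urllib.request
from urllib.parse import quote


def gsp_url(endpoint: str, graph: str) -> str:
    """Indirect graph identification: <endpoint>?graph=<percent-encoded graph URI>."""
    return f"{endpoint}?graph={quote(graph, safe='')}"


def put_turtle(endpoint: str, graph: str, path: str) -> int:
    """Replace the named graph with the contents of a Turtle file; returns the HTTP status."""
    size = os.path.getsize(path)
    with open(path, "rb") as body:
        request = urllib.request.Request(
            gsp_url(endpoint, graph),
            data=body,  # a file object is sent in blocks rather than read into memory at once
            method="PUT",
            headers={"Content-Type": "text/turtle", "Content-Length": str(size)},
        )
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
        with urllib.request.urlopen(request) as response:
            return response.status

# Usage (assumed port and hypothetical file name):
# put_turtle("http://localhost:3031/ds", "http://example/lab8", "lab8.ttl")
```

Streaming the file keeps memory use flat for a large dump; setting Content-Length explicitly avoids a chunked upload.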

FNakano commented 1 year ago

It worked, I think: a SELECT COUNT SPARQL query on the imported graph reported almost 1.5M triples.

SELECT (COUNT(?s) AS ?count)
WHERE
{
    GRAPH <http://example/lab8>
    { ?s ?p ?o }
}
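The same count can also be obtained outside the LinkedDataHub UI by sending the query to the end-user Fuseki endpoint over the SPARQL protocol and reading the JSON results. This is a sketch assuming the dataset URL http://localhost:3031/ds accepts queries through a query parameter (a dedicated query service path may be needed depending on the Fuseki configuration); count_triples and count_from_results are hypothetical names. The query uses the (COUNT(?s) AS ?count) form because standard SPARQL 1.1 requires an aggregate in SELECT to be bound to a variable.

```python
import json
import urllib.request
from urllib.parse import urlencode

COUNT_QUERY = """SELECT (COUNT(?s) AS ?count)
WHERE { GRAPH <http://example/lab8> { ?s ?p ?o } }"""


def count_from_results(results: dict) -> int:
    """Extract ?count from a SPARQL 1.1 JSON results document."""
    binding = results["results"]["bindings"][0]
    return int(binding["count"]["value"])


def count_triples(endpoint: str, query: str = COUNT_QUERY) -> int:
    """Run the query with an HTTP GET and return the ?count value."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return count_from_results(json.load(response))

# Usage (assumed endpoint): count_triples("http://localhost:3031/ds")
```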

Steps and Success Indicators

> Map fuseki-end-user/fuseki-admin service ports to the host as shown here: https://github.com/AtomGraph/LinkedDataHub/blob/master/docker-compose.debug.yml

Added the following port mappings to the fuseki-admin and fuseki-end-user services in LinkedDataHub's docker-compose.yml:

  fuseki-admin:
    ports:
      - 3030:3030
  fuseki-end-user:
    ports:
      - 3031:3030

Then restarted LinkedDataHub with docker-compose up --build.
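Each ports entry above uses Docker Compose's short HOST:CONTAINER syntax, so 3031:3030 publishes the end-user Fuseki's container port 3030 on host port 3031, which is why the two endpoints end up on different host ports. A small hypothetical helper that derives the host-side endpoint URL from such an entry, assuming the dataset is named ds as in this thread (IPv6 host addresses and port ranges are not handled):

```python
def parse_port_mapping(spec: str) -> tuple[int, int]:
    """Split a short-syntax mapping like '3031:3030' or '127.0.0.1:3031:3030/tcp'."""
    spec = spec.split("/", 1)[0]  # drop an optional protocol suffix such as /tcp
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"no host port published in {spec!r}")
    # the last two fields are always HOST_PORT:CONTAINER_PORT
    return int(parts[-2]), int(parts[-1])


def endpoint_url(spec: str, dataset: str = "ds", host: str = "localhost") -> str:
    """URL of the Fuseki dataset as seen from the host machine."""
    host_port, _container_port = parse_port_mapping(spec)
    return f"http://{host}:{host_port}/{dataset}"


if __name__ == "__main__":
    for service, spec in (("fuseki-admin", "3030:3030"), ("fuseki-end-user", "3031:3030")):
        print(service, endpoint_url(spec))
```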

Indicator: opening http://localhost:3031/ds and http://localhost:3030/ds in Firefox downloaded two ds.trig files, one containing the end-user data and the other containing the admin data.
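The same check can be scripted: a GET on the dataset URL with an Accept: application/trig header asks Fuseki for the whole dataset, named graphs included, similar to the ds.trig files Firefox saved. This is a sketch under that assumption; trig_request and download_trig are hypothetical names, and because it fetches every graph it is only practical for modestly sized datasets.

```python
import shutil
import urllib.request


def trig_request(endpoint: str) -> urllib.request.Request:
    """GET request for the whole dataset, asking for TriG."""
    return urllib.request.Request(endpoint, headers={"Accept": "application/trig"})


def download_trig(endpoint: str, dest: str) -> str:
    """Stream the dataset to dest; returns the Content-Type the server reported."""
    with urllib.request.urlopen(trig_request(endpoint)) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
        return response.headers.get("Content-Type", "")

# Usage (assumed port, hypothetical output file): download_trig("http://localhost:3031/ds", "end-user.trig")
```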

> For example, using the SPARQL Graph Store Protocol: https://jena.apache.org/documentation/fuseki2/soh.html#soh-sparql-http

Downloaded the then-latest Fuseki binary distribution, apache-jena-fuseki-4.7.0.zip, and unzipped it. The SOH executables are in the apache-jena-fuseki-4.7.0/bin folder.

> Then end-user and admin Fuseki endpoints will be available on http://localhost:3031/ds and http://localhost:3030/ds, respectively.

Inserted the data by running ./s-put http://localhost:3031/ds http://example/lab8 ~/MeuGithub/CSV2RDF/example/lab8.ttl (arguments: dataset URL, target graph URI, file to upload).

Indicator: use the LinkedDataHub SPARQL editor to run the SPARQL query at the top of this post, and check whether the number of inserted triples looks right.