codecentric / web-clip

A Chrome extension to extract structured data from any web page and store it to a Solid Pod.
MIT License

400 on PATCH when clipping https://www.forkknifeswoon.com/simple-homemade-chicken-ramen/ #7

Closed (angelo-v closed this 3 years ago)

angelo-v commented 3 years ago

Trying to clip from https://www.forkknifeswoon.com/simple-homemade-chicken-ramen/ results in:

Patch document syntax error: Line 1 of <https://angelo.veltens.org/webclip/2021/10/11/55a6d21e-2aa8-44ca-bff9-457dc2b00195>: Bad syntax:
   Unknown syntax at start of statememt: '[object Object]'
   at: "[object Object]"

angelo-v commented 3 years ago

The page in question contains JSON-LD literals with \r\n\r\n sequences, which I guess is fine in itself, but the rdflib updater somehow turns them into an invalid SPARQL Update command that includes real line breaks in literal values. This needs to be fixed in rdflib.

Besides that, the page data is really huge, so there might be other issues that need investigation after the rdflib fix.
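
To illustrate the line-break problem described above: a double-quoted SPARQL literal must not contain raw line breaks, so they have to be written as escape sequences. The following is only a minimal sketch of that escaping, not the actual rdflib code or its fix; `escapeLiteral` and `insertData` are hypothetical helper names, and the example IRIs are made up.

```typescript
// Hypothetical helper (not part of rdflib): escape a string so it can be
// placed inside a double-quoted SPARQL literal without raw line breaks.
function escapeLiteral(value: string): string {
  return value
    .replace(/\\/g, "\\\\") // backslashes first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
}

// Hypothetical helper: build a single-triple INSERT DATA command.
function insertData(subject: string, predicate: string, value: string): string {
  return `INSERT DATA { <${subject}> <${predicate}> "${escapeLiteral(value)}" . }`;
}

// Example with the kind of value found on the page (IRIs are placeholders).
const update = insertData(
  "https://example.org/recipe",
  "https://schema.org/description",
  "First paragraph.\r\n\r\nSecond paragraph."
);
console.log(update);
```

With this escaping the generated command stays on one line, and the server restores the original line breaks when it parses the literal.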

angelo-v commented 3 years ago

rdflib issue: https://github.com/linkeddata/rdflib.js/issues/517

angelo-v commented 3 years ago

After removing the carriage returns I could successfully clip the page data to a locally running instance of CSS. Clipping to NSS failed, but that might be because NSS is just not capable of receiving a SPARQL insert with more than 10000 statements.

angelo-v commented 3 years ago

I upgraded rdflib. On NSS it still does not work (as expected, see above). It still needs to be tested on CSS, e.g. https://solidweb.me, which is currently down for maintenance.

angelo-v commented 3 years ago

Successfully tested on CSS at https://solidweb.me

angelo-v commented 3 years ago

Possible follow-up: if NSS sent 413 (Payload Too Large), we could split large requests into multiple smaller ones as needed. See https://github.com/solid/node-solid-server/issues/1628
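
A rough sketch of how that follow-up could look, assuming the server really answers oversized requests with 413: send everything in one request, and on 413 halve the batch and retry each half. Nothing here is existing web-clip or rdflib code; `Send`, `halve` and `sendSplitting` are hypothetical names, and `send` stands in for whatever function actually issues the PATCH and resolves to the HTTP status.

```typescript
// Hypothetical transport: sends one batch of statements, resolves to the HTTP status.
type Send = (statements: string[]) => Promise<number>;

// Split a batch into two halves (the first half gets the extra item).
function halve<T>(items: T[]): [T[], T[]] {
  const middle = Math.ceil(items.length / 2);
  return [items.slice(0, middle), items.slice(middle)];
}

// Try the whole batch first; on 413 split it and retry both halves recursively.
// Resolves to the number of requests that were issued.
async function sendSplitting(statements: string[], send: Send): Promise<number> {
  if (statements.length === 0) return 0;
  const status = await send(statements);
  if (status !== 413) {
    if (status >= 400) throw new Error(`Request failed with status ${status}`);
    return 1;
  }
  if (statements.length === 1) {
    throw new Error("A single statement is already too large for the server");
  }
  const [first, second] = halve(statements);
  const firstCount = await sendSplitting(first, send);
  const secondCount = await sendSplitting(second, send);
  return 1 + firstCount + secondCount;
}
```

One caveat with this approach: the clip is no longer written atomically, so a failure in a later batch leaves a partial document on the Pod that would need cleanup or a retry.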