Open jnehring opened 8 years ago
I fixed a bug, and the problems I reported in this issue about examining the e-Sesame are now fixed. Examining the e-Sesame again, I get a lot of output like this:
<result>
<binding name='p'>
<uri>http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#isString</uri>
</binding>
<binding name='s'>
<uri>http://dkt.dfki.de/documents/#char=0,764</uri>
</binding>
<binding name='o'>
<literal>@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
<http://digitale-kuratierung.de/ns/100.txt#char=0,328>
a nif:RFC5147String , nif:Context , nif:String ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "328"^^xsd:nonNegativeInteger ;
nif:isString "[Kartenbrief Anschrift/Absender]\r\nAn\r\nFr��ulein\r\nLuise Maas\r\nin Rottach b/Tegernsee\r\nWohnung Adr. Maler A. Weilhammer\r\nAdresse des Absenders: M��nchen\r\nAgnesstr. 52.II.1.\r\n\r\nM��nchen 13.VIII.12\r\nBin 10.51 Uhr morgen\r\n- Mittwoch - Vormittag\r\nin Tegernsee. \r\nSeien die G��tter uns \r\ngn��diger als heute.\r\n \r\nVon ganzem Herzen.\r\nErich" .
</literal>
</binding>
</result>
There are two problems:
Problem 1 could be solved by changing the pipeline configuration. For problem 2 I raised https://github.com/dkt-projekt/e-Sesame/issues/11
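In the output above, the object literal of `nif:isString` holds a whole Turtle document rather than plain text. A minimal sketch for scanning a SPARQL XML result for such bindings could look like this. The function name `suspicious_literals` and the `@prefix` heuristic are my own assumptions for illustration, not part of e-Sesame.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Drop an XML namespace such as "{http://www.w3.org/2005/sparql-results#}".
    return tag.rsplit("}", 1)[-1]


def suspicious_literals(xml_text):
    """Return the 's' value of every result whose 'o' literal starts with '@prefix'.

    Heuristic only (an assumption): a literal that begins with a Turtle
    prefix declaration is probably a serialized document, not plain text.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for result in root.iter():
        if _local(result.tag) != "result":
            continue
        values = {}
        for binding in result:
            if _local(binding.tag) != "binding":
                continue
            for child in binding:
                # Store (kind, text), e.g. ("uri", "http://...") or ("literal", "...").
                values[binding.get("name")] = (_local(child.tag), child.text or "")
        kind, text = values.get("o", ("", ""))
        if kind == "literal" and text.lstrip().startswith("@prefix"):
            flagged.append(values.get("s", ("", ""))[1])
    return flagged
```

Feeding it the XML returned by the repository gives the subjects that need a closer look. It does not say anything about why the literal ended up that way.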
Now, after some updates, it processed the Mendelsohn collection in about 5 minutes.
Next step: Find out why 30 documents got stuck in CURRENTLY_PROCESSING
I created an example of the Mendelsohn collection. I extracted all letters from the "handschrift" table, resulting in 2800 files, put them in a ZIP of 3 MB and uploaded them to the DocumentStorage.
Error messages
Within seconds it processed all the files. This is very quick, so maybe there is a problem. 135 files failed with errors.
Examining the e-Sesame
Counting all triples in e-Sesame reveals 2681 triples. This is not enough; I would expect something around 10,000 even if there are no annotations:
Retrieving all NIF contexts reveals
Which is wrong for two reasons:
http://digitale-kuratierung.de/ns/
.
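The two checks in this comment, counting all triples and retrieving all NIF contexts, can be sketched as below. This is only an illustration: the queries are my reconstruction, not necessarily the ones I ran, and the endpoint URL in the usage comment is hypothetical. It assumes the repository answers standard SPARQL protocol GET requests and can return JSON results.

```python
import json
import urllib.parse
import urllib.request

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

# Assumed queries: count every triple, and list every subject typed nif:Context.
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"
CONTEXT_QUERY = "SELECT ?s WHERE { ?s a <" + NIF + "Context> }"


def run_query(endpoint, query):
    """Send a SPARQL query via GET and return the parsed JSON result."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def column(results, var):
    """Extract the values bound to one variable from a SPARQL JSON result."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]


# Usage (the endpoint below is a placeholder, not the real deployment):
# endpoint = "http://localhost:8080/openrdf-sesame/repositories/example"
# print(column(run_query(endpoint, COUNT_QUERY), "n"))
# print(column(run_query(endpoint, CONTEXT_QUERY), "s"))
```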