Closed by jonquet 2 years ago
The log file for the last AGROVOC submission can be found at this path: /srv/ontoportal/data/repository/AGROVOC/16/parsing.log
The error is a java.lang.OutOfMemoryError, as seen in the screenshot above (where we restarted the parsing of the last submission).
AGROVOC can't be parsed because of an OutOfMemoryError
It doesn't work with http://agrovoc.uniroma2.it/latestAgrovoc/agrovoc_lod.nq.zip but works with http://agrovoc.uniroma2.it/latestAgrovoc/agrovoc_lod.nt.zip
But we have a new problem.
I recall we had an issue parsing the AGROVOC files with the OWL-API (see the email discussion from September 2020) because of an import of the RDFS triples. This was certainly fixed in the nq version that we parsed until July 2021. I am not sure we ever parsed the nt file.
I am focusing on the OutOfMemory error for now.
Relevant post about the issue: https://stackoverflow.com/questions/52712321/outofmemoryerror-when-joining-a-list-of-strings-in-java
It seems the OWL-API tries to create a string that is too large.
Error (OutOfMemoryError) reproduced by @jvendetti when parsing the nq file "outside" of the AgroPortal stack. Note: the nt file does parse.
An update:
rapper: Serializing with serializer ntriples
rapper: Error - - XML parser error: Char 0xFFFF out of allowed range
rapper: Error - - XML parser error: PCDATA invalid Char value 65535
rapper: Failed to parse file /srv/ncbo/repository/AGROVOC/1/owlapi.xrdf rdfxml content
rapper: Parsing returned 8673139 triples
When generating an RDF/XML file with Protégé and re-opening this same file with Protégé, the error shows up again, but this time with a line number:
Which brings us to the URI: http://aims.fao.org/aos/agrovoc/xDef_8f48da66
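For anyone who needs to locate this kind of character without going through Protégé, here is a minimal sketch in Python. It assumes the offending character is present literally in the file (not written as a numeric reference such as `&#xFFFF;`) and that the file is UTF-8. The names `find_invalid_xml_chars` and `scan_file` are hypothetical helpers, not part of any AgroPortal or OWL-API tooling. The allowed ranges come from the `Char` production of XML 1.0, which excludes U+FFFE and U+FFFF.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHAR = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(lines):
    """Yield (line_number, column, code_point) for each disallowed character.

    Line numbers and columns are 1-based; `lines` is any iterable of strings.
    """
    for line_number, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHAR.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            yield line_number, match.start() + 1, code_point


def scan_file(path):
    """Print every disallowed character found in a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, column, code_point in find_invalid_xml_chars(handle):
            print(f"{path}:{line_number}:{column}: {code_point}")


if __name__ == "__main__":
    import sys

    # Example: python find_invalid_xml_chars.py owlapi.xrdf
    for file_path in sys.argv[1:]:
        scan_file(file_path)
```

Run against the generated RDF/XML file (for example the owlapi.xrdf mentioned in the rapper output), this should report the line and column of each character that an XML parser would reject, which can then be matched to the enclosing resource URI by hand.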
Fixing the character allows parsing. We then encounter another issue, described in the log:
Probably linked to the recent changes on indexing fields.
Indexing error fixed here: https://github.com/ontoportal-lirmm/goo/commit/ba27011fd2b093ff04d522477010d146602d0b62
AGROVOC is now parsed and indexed, but we hit this issue in the diff process: https://github.com/agroportal/documentation/issues/246
So I don't think the automatic pull will work. To be followed up in future releases here: https://github.com/agroportal/documentation/issues/251
Since the July 2021 version, AGROVOC no longer parses.
Plus, we have a pullLocation for AGROVOC: http://data.agroportal.lirmm.fr/ontologies/AGROVOC/submissions/16?display=pullLocation But the ontology never gets updated automatically (I have to do it manually each month).
The January or February release is expected soon, which will let us run new tests.