Open marc-karnovgroup opened 6 years ago
Note that there is grammar caching done on the server; You might want to check with `validation:show-grammar-cache() what is happening.
Well, the grammar cache looks OK
http://localhost:8080/exist/rest/db/karnov/101?_query=validation:show-grammar-cache()
shows expected DTD's <LiteralSystemId>alllov.dtd</LiteralSystemId>
and <LiteralSystemId>allftbook.dtd</LiteralSystemId>
But I don't see why a cached grammar should result in different upload behaviour dependent on upload order ??? The DTD's should be hit and cached on firs encounter, no matter what document upload order is given ??
it was just an idea; I gave up on DTDs years ago....
@marc-karnovgroup is there any extraneous error information present in exist.log
?
Also are you able to reduce this to smaller test with just 2 or 3 files for validation?
I'll attach the exist.log
here. The Catalog seems loaded fine (and otherwise I would have much more failures in XML load).
2018-02-28 11:59:37,817 [main] INFO (Configuration.java [configureValidation]:1493) - Add catalog uri file:///home/marc/karnov/git/xml-schemas/catalog-ilse-variant-utf-8.xml
2018-02-28 11:59:37,818 [main] INFO (GrammarPool.java [<init>]:54) - Initializing GrammarPool.
2018-02-28 11:59:37,820 [main] INFO (JettyStart.java [run]:164) - Configuring eXist from /home/marc/local/exist_400/conf.xml
There is one error under startup, which seem unrelated to the issue
2018-02-28 11:59:38,974 [main] INFO (MimeTable.java [load]:244) - Loading mime table from file /home/marc/local/exist_400/mime-types.xml
2018-02-28 11:59:40,331 [main] ERROR (DocumentTriggers.java [afterCreateDocument]:238) - Invalid URI: xmldb URI scheme does not start with xmldb:: http://exist-db.org/xquery/console
java.lang.IllegalArgumentException: Invalid URI: xmldb URI scheme does not start with xmldb:: http://exist-db.org/xquery/console
at org.exist.xmldb.XmldbURI.create(XmldbURI.java:219) ~[exist.jar:4.0.0]
at org.exist.xmldb.XmldbURI.append(XmldbURI.java:595) ~[exist.jar:4.0.0]
I have many of transactions not committed, though, which may explain why documents are not uploaded ??
2018-02-28 12:00:26,307 [qtp645323781-25] WARN (TransactionManager.java [close]:186) - Transaction was not committed or aborted, auto aborting!
I can probably reduce the test set somehow, but the issue seems to be connected to many following REST PUT uploads and their order. The first file in the first upload ordered by ls
failing dependent on order is oaaBKG2002928_R.7000255906.xml
, which is the 556th file loaded.
If I upload some of the failing files one by one manually, they do succeed (except for the 62 ones which are not DTD valid), but the the upload order is also different then.
I tries also to run the same upload order with a one second sleep in between every upload, which results in the same amount of documents failing, so I do not believe it is a race condition.
I have tried this with both 3.6.1 and 4.0.0 and have repeated the experiment three times with the 4.0.0 version, all numbers of accepted documents according to upload order are repeatable, so it seems to be a deterministic issue.
I am happy to provide my XML files and Calatog setup, DTD's and the like by personal communication, including a step-by-step description on how to reproduce the issue. Please notice that this info is regarded confidential by my organization, and should only be used for bug verification.
I did test this some more, and come to the conclusion that it is not the DTD validation causing this erratic behaviour, but the entity resolution. All test documents are in latin-1 and have plenty of entities defined, which are resolved during the exist-db insert.
I did try to convert all input docments to uft8 with all entities resolved, using
xmlllint --dtdattr --noent --nonet --encode UTF-8 in_latin1.xml > out_utf8.xml
Upload of all utf8 documents succeeds now, regardless of order (except for those violating the DTD, or those having unresolved entities).
My working hypothesis is now that it is the mass of entity resolutions which does trigger the issue.
thnx for the input. Interesting observation....
What is the problem
Uploading of many documents with DTD entity resolution enabled fails on different documents dependent on order of docs uploaded
Se attached
exist_jetty_2018_02_28.request.log
Examples of failing/working PUT uploads (with same running server instance, but different upload order) are
What did you expect
Out of 4637 XML docs 62 do not validate against the DTD's, so I would expect 4637 - 62 = 4575 documents uploaded without problems.
I expect the same document to either fail or succeed, independently of order of post requests on same document set.
I expect the number of uploaded documents to be equal for multiple uploads of the same documents.
Describe how to reproduce or add a test
Those 4 uploads of the same files have been done in different order using curl PUT:
replace the
ls
in the first line withls -r
,ls -S
, andls -rS
for each upload, thus enforcing different order. More docs are added to the same DB (no document remove between runs):I can provide the used XML Catalogs, DTD's and XML data files upon request per e-mail, but can not upload them in a public visible bug report.
Context information
Please always add the following information
Any custom changes in e.g. conf.xml: entity resolver enabled in
/home/marc/local/exist_361/conf.xml
and/home/marc/local/exist_361/webapp/WEB-INF/conf.xml
exist_jetty_2018_02_28.request.log
Contact info e-mail: marc-dot-cromme-at-karnovgroup-dot-com