Open ajs6f opened 6 years ago
@whikloj, can you send me a generous portion of log? I ask because I can reproduce this error up to the fact that the triple is dropped, but I cannot get any error message or exception. I'm just seeing silent dropping.
Well, this is embarrassing. I have to take back my immediately previous comment-- when I adjust my tests correctly, I can see now that I'm not actually seeing the triples dropped. I thought I was, but it was actually the case that I hadn't updated the test rubrics! 🤷♂️
So that leaves us with your example. Can you give me some FOXML that shows this happening, and also the logs section?
Additionally, can you please make sure that you are building from the latest source-- I just made a commit that updated the version of xerces
in use and that might be quite relevant to why you are seeing this and I am not.
Okay I take that back, I forgot to switch to master
. Which I did and rebuild (mvn clean package
) and ran again.
[whikloj@jujo]~% java -jar /opt/fcrepo3-rdf-extractor/target/fcrepo3-rdf-extractor-0.0.1-SNAPSHOT.jar -a /usr/local/fedora/server/config/spring/akubra-llstore.xml -g 'ca.umanitoba.dam.fedora#ri' -o /var/indexes/triples --logback rdf-extractor-logback.xml uofm:2652608
Exception in thread "main" java.lang.NoClassDefFoundError: com.google.common.base.Function
at org.akubraproject.map.IdMappingBlobStore.openConnection(IdMappingBlobStore.java:61)
at edu.si.fcrepo.Extract.lambda$init$0(Extract.java:210)
at edu.si.fcrepo.Extract$$Lambda$20.00000000DC9AC560.call(Unknown Source)
at edu.si.fcrepo.UnsafeIO.unsafeIO(UnsafeIO.java:44)
at edu.si.fcrepo.Extract.init(Extract.java:210)
at edu.si.fcrepo.Extract.main(Extract.java:166)
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Function
at java.net.URLClassLoader.findClass(URLClassLoader.java:609)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:924)
at java.lang.ClassLoader.loadClass(ClassLoader.java:869)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:329)
at java.lang.ClassLoader.loadClass(ClassLoader.java:852)
... 6 more
Do you know what am I doing wrong..
Yep, I know exactly what you did wrong-- nothing.
I goofed up with that last commit changing dependencies-- I did more than I should, or less than I should, depending on your POV. I just added a commit that fixes my mistake, so can you (SORRY!) please try again?
Ok that sent me back to the previous (now deleted) comment. Essentially the extractor can't find any (I tried 10) objects, it appears like all the hashes are different.
Error log is
INFO 2018-04-02 08:37:53.739 [main] (Extract) Using 4 threads for extraction and a queue size of 1048576.
INFO 2018-04-02 08:37:53.742 [main] (Extract) with Akubra configuration from /usr/local/fedora/server/config/spring/akubra-llstore.xml.
INFO 2018-04-02 08:37:53.916 [main] (Extract) Beginning extraction.
INFO 2018-04-02 08:37:53.918 [pool-2-thread-1] (ObjectProcessor) Operating on object URI: uofm:2652608
INFO 2018-04-02 08:37:53.918 [pool-2-thread-2] (ObjectProcessor) Operating on object URI: uofm:1653091
INFO 2018-04-02 08:37:53.918 [pool-2-thread-3] (ObjectProcessor) Operating on object URI: uofm:2108043
INFO 2018-04-02 08:37:53.919 [pool-2-thread-4] (ObjectProcessor) Operating on object URI: uofm:2457369
ERROR 2018-04-02 08:37:53.925 [pool-2-thread-1] (ObjectProcessor) Error reading from object: uofm:2652608
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:bc/uofm%3A2652608')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:135)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at edu.si.fcrepo.Extract$$Lambda$25.00000000289A9230.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:811)
ERROR 2018-04-02 08:37:53.925 [pool-2-thread-4] (ObjectProcessor) Error reading from object: uofm:2457369
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:d0/uofm%3A2457369')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:135)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at edu.si.fcrepo.Extract$$Lambda$25.00000000289A9230.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:811)
INFO 2018-04-02 08:37:53.925 [pool-2-thread-1] (ObjectProcessor) Operating on object URI: uofm:1801635
INFO 2018-04-02 08:37:53.925 [pool-2-thread-4] (ObjectProcessor) Operating on object URI: uofm:1360961
ERROR 2018-04-02 08:37:53.925 [pool-2-thread-2] (ObjectProcessor) Error reading from object: uofm:1653091
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:dc/uofm%3A1653091')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:135)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at edu.si.fcrepo.Extract$$Lambda$25.00000000289A9230.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:811)
INFO 2018-04-02 08:37:53.925 [pool-2-thread-2] (ObjectProcessor) Operating on object URI: uofm:2797288
ERROR 2018-04-02 08:37:53.926 [pool-2-thread-3] (ObjectProcessor) Error reading from object: uofm:2108043
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:e9/uofm%3A2108043')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:135)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at edu.si.fcrepo.Extract$$Lambda$25.00000000289A9230.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:811)
INFO 2018-04-02 08:37:53.926 [pool-2-thread-3] (ObjectProcessor) Operating on object URI: uofm:2736579
ERROR 2018-04-02 08:37:53.928 [pool-2-thread-4] (ObjectProcessor) Error reading from object: uofm:1360961
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:dd/uofm%3A1360961')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:135)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at edu.si.fcrepo.Extract$$Lambda$25.00000000289A9230.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:811)
INFO 2018-04-02 08:37:53.928 [pool-2-thread-4] (ObjectProcessor) Operating on object URI: uofm:2116284
ERROR 2018-04-02 08:37:53.928 [pool-2-thread-1] (ObjectProcessor) Error reading from object: uofm:1801635
org.akubraproject.MissingBlobException: (Missing blob with id = 'file:d8/uofm%3A1801635')
at org.akubraproject.fs.FSBlob.openInputStream(FSBlob.java:100)
at org.akubraproject.impl.BlobWrapper.openInputStream(BlobWrapper.java:93)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:135)
... repeat errors for all 10 PIDs
INFO 2018-04-02 08:37:53.935 [main] (Extract) Reached 10 objects at end of objects with 0 in-queue after 10 errors.
INFO 2018-04-02 08:37:53.935 [main] (Extract) Finished extraction.
4 quad files with 0 lines in each.
Ah, I have a suspicion here-- when you say "all the hashes are different" do you mean that the hashes that this tool are calculating for where things should be stored on the filesystem are different from the hashes that the repo is calculating for the same purpose?
Ugh I am a moron, I used the PIDs on the CLI as "uofm:2652608" instead of the correct "info:fedora/uofm:2652608". Now it runs and throws the javax.xml.stream.XMLStreamException: The namespace URI "http://islandora.ca/ontology/relsext#" has not been bound to a prefix.
error.
Here is the FOXML for object info:fedora/uofm:2457369
which failed above.
Let's establish one thing clearly: you are not a moron. You are simply a Fedora user. It's easy to look like a moron while using Fedora. I do it weekly. It says a lot more about Fedora than about any of its users.
But now we've got a real bite on the problem. It's clearly an XML processing problem. The weird thing that I have seen and read about this kind of message, but only while writing XML. I have never seen it turn up while reading it. In any event, time for me to dig in a bit on this. Further bulletins as events warrant.
@whikloj -- can you add here the log for when the tool hits this object? Or gist it somewhere, whatever is convenient. Don't bother excerpting, just paste the whole dang thing, if you please. Thanks!
The section of the log is quite simply.
ERROR 2018-04-02 09:11:37.195 [pool-2-thread-4] (ObjectProcessor) Error reading from object: info:fedora/uofm:2736579
java.io.IOException: Error parsing foxml:xmlContent
at com.github.cwilper.fcrepo.dto.foxml.FOXMLReader.readXMLContent(FOXMLReader.java:211)
at com.github.cwilper.fcrepo.dto.foxml.FOXMLReader.readDatastreamVersion(FOXMLReader.java:149)
at com.github.cwilper.fcrepo.dto.foxml.FOXMLReader.readDatastream(FOXMLReader.java:119)
at com.github.cwilper.fcrepo.dto.foxml.FOXMLReader.readObject(FOXMLReader.java:77)
at com.github.cwilper.fcrepo.dto.foxml.FOXMLReader.readObject(FOXMLReader.java:61)
at edu.si.fcrepo.ObjectProcessor.accept(ObjectProcessor.java:139)
at edu.si.fcrepo.Extract.lambda$null$3(Extract.java:240)
at edu.si.fcrepo.Extract$$Lambda$25.000000002C9A91A0.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:811)
Caused by: javax.xml.stream.XMLStreamException: The namespace URI "http://islandora.ca/ontology/relsext#" has not been bound to a prefix.
at com.ibm.xml.xlxp.api.stax.msg.StAXMessageProvider.throwXMLStreamException(StAXMessageProvider.java:59)
at com.ibm.xml.xlxp.api.stax.XMLStreamWriterBase.writeStartElement(XMLStreamWriterBase.java:357)
at com.ibm.xml.xlxp.api.stax.XMLOutputFactoryImpl$XMLStreamWriterProxy.writeStartElement(XMLOutputFactoryImpl.java:120)
at javanet.staxutils.io.XMLWriterUtils.writeStartElement(XMLWriterUtils.java:1254)
at javanet.staxutils.XMLStreamEventWriter.sendEvent(XMLStreamEventWriter.java:121)
at javanet.staxutils.BaseXMLEventWriter.add(BaseXMLEventWriter.java:519)
at javanet.staxutils.BaseXMLEventWriter.add(BaseXMLEventWriter.java:527)
at javanet.staxutils.XMLStreamUtils.copy(XMLStreamUtils.java:404)
at com.github.cwilper.fcrepo.dto.core.io.XMLUtil.copy(XMLUtil.java:233)
at com.github.cwilper.fcrepo.dto.core.io.XMLUtil.copy(XMLUtil.java:206)
at com.github.cwilper.fcrepo.dto.foxml.FOXMLReader.readXMLContent(FOXMLReader.java:208)
... 12 common frames omitted
I could turn up the logging level if you'd like?
No, that's good, that's the info I was looking for. The crucial lines are:
com.ibm.xml.xlxp.api.stax.msg.StAXMessageProvider.throwXMLStreamException(StAXMessageProvider.java:59)
and the lines below it. They tell me that this isn't my code, it's @cwilper's FOXML-reading code from a million years ago.
So that makes me feel better. And it's worse, because now I have to go into that (I hope totally unmaintained, because I'm sure Chris has better things to do at this point in his career) code and find the problem and then find a way of fixing it. Hopefully (crosses-fingers) I can set a system property or do something else to control the behavior of that library's XML-reading code. If not... I get out the mattock and the spade.
And, now I know why I've only ever seen this in writing XML but it's showing up here. Looks like @cwilper is copying all the XML around. See around:
at com.github.cwilper.fcrepo.dto.core.io.XMLUtil.copy(XMLUtil.java:233)
Ok, where is my pickaxe?
Bad news, @whikloj -- I can't reproduce. Using your FOXML and master
, I get a fine run and the quads:
<info:fedora/uofm:2457369/JP2> <http://islandora.ca/ontology/relsext#width> "4619" <#ri> .
<info:fedora/uofm:2457369/JP2> <http://islandora.ca/ontology/relsext#height> "7150" <#ri> .
appear with no problem...
Can you tell me a bit about your Java setup? What version, and did you put any extra libraries into play by installing them in your Java home itself? For example, did you install any XML processing libraries there?
Not that I know of, maybe too old of a version?
[whikloj@jujo]~% java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 8.0.5.5 - pxa6480sr5fp5-20171114_01(SR5 FP5))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64 Compressed References 20171102_369060 (JIT enabled, AOT enabled)
OpenJ9 - 7ade437
OMR - 1b656cb
IBM - 59c3d96)
JCL - 20171113_01 based on Oracle jdk8u151-b12
No, actually I suspect the fact that it is an IBM JVM. See here. Can you possibly try using another JVM? (E.g. OpenJDK or really, anything else, just to get the comparison?)
@ajs6f for the win.
[whikloj@jujo]~% java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
INFO 2018-04-02 11:23:28.452 [main] (Extract) Using 4 threads for extraction and a queue size of 1048576.
INFO 2018-04-02 11:23:28.454 [main] (Extract) with Akubra configuration from /usr/local/fedora/server/config/spring/akubra-llstore.xml.
INFO 2018-04-02 11:23:28.613 [main] (Extract) Beginning extraction.
INFO 2018-04-02 11:23:28.614 [pool-2-thread-2] (ObjectProcessor) Operating on object URI: info:fedora/uofm:1653091
INFO 2018-04-02 11:23:28.614 [pool-2-thread-1] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2652608
INFO 2018-04-02 11:23:28.629 [pool-2-thread-4] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2457369
INFO 2018-04-02 11:23:28.629 [pool-2-thread-3] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2108043
INFO 2018-04-02 11:23:29.029 [pool-2-thread-2] (ObjectProcessor) Operating on object URI: info:fedora/uofm:1801635
INFO 2018-04-02 11:23:29.032 [pool-2-thread-3] (ObjectProcessor) Operating on object URI: info:fedora/uofm:1360961
INFO 2018-04-02 11:23:29.035 [pool-2-thread-4] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2797288
INFO 2018-04-02 11:23:29.041 [pool-2-thread-1] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2736579
INFO 2018-04-02 11:23:29.101 [pool-2-thread-3] (ObjectProcessor) Operating on object URI: info:fedora/uofm:2116284
INFO 2018-04-02 11:23:29.113 [pool-2-thread-2] (ObjectProcessor) Operating on object URI: info:fedora/uofm:1374689
INFO 2018-04-02 11:23:29.187 [main] (Extract) Reached 10 objects at end of objects with 0 in-queue after 0 errors.
INFO 2018-04-02 11:23:29.187 [main] (Extract) Finished extraction.
That was it? If that was it I will try adding a commit that will do what was described in that Stackoverflow post. Then you can run with the JVM you were using.
That was all I changed, I even had this JVM installed just needed to select it with update-alternatives
k, I just added a dependency and setting for using Woodstox. If/when you have time, please try it out again with the IBM JVM. If it works, I reckon we can close this and move on the next whack-a-mole.
When RDF/XML includes a namespace without a prefix, a la:
the triples derived from elements like
width
are being dropped with a message similar to the title of this ticket.