Closed: ablwr closed this issue 5 years ago
In the FITS release notes for 1.0 (the one after 0.10.1), I saw this line "Change compiler compliance to Java 7. Was mistakenly set at Java 8."[1] It could be an issue related to no longer being explicit about these kinds of things that is causing our issue.
On Bionic, fits-nailgun is failing hard with a stack trace in the logs. sudo journalctl -u fits-nailgun
shows:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f5ba5220e33, pid=8159, tid=0x00007f5b86415700
...
@scollazo Does this seem related or unrelated to #222 and plans to fix things there?
Seems the same problem @mamedin fixed in https://github.com/artefactual-labs/am-packbuild/issues/172 for centos
I think the am-packbuild issues have resolved this, although I need to test a little bit more! But I'm not immediately seeing failures on files.
This is tentatively resolved in Bionic, but still failing for CentOS.
In https://github.com/artefactual-labs/am-packbuild/pull/190 it looks like the FITS package installed is 1.1.0, but we were on 0.8.4 according to the version check when logged in. Noting that this could be a cause, but I haven't investigated yet.
Hi @ablwr, you uncovered a nice rabbit hole.
We are shipping 3 different versions of FITS.
We should align them and use the same FITS version everywhere.
@scollazo It's weird because if you log into a current 18.04 deployment and run ng edu.harvard.hul.ois.fits.Fits -v
you get 0.8.4.
And it looks like there's a fits.sh command which is 1.1.0.
Whaaaa? Two FITS on 18.04?
The nailgun command is what is called in the FPR.
FITS has been updated to 1.1.0 on all supported environments (Ubuntu 18.04, Ubuntu 16.04 and CentOS/RHEL).
FITS still does not seem to be functional on Ubuntu 16.04; it still exits with what I think is a segfault error: Command FITS failed with exit status -11; stderr:
It seems to begin working, but then it may run into trouble, break, and not be able to get back up again. This could be related to #868
When I run this on the machine outside of AM, here is the error log:
artefactual@am18xenial:~$ ng edu.harvard.hul.ois.fits.Fits -i archivematica-sampledata/TestTransfers/badNames/objects/ampersand\&ersand.txt
log4j:WARN No appenders could be found for logger (org.apache.commons.configuration.ConfigurationUtils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
at org.apache.commons.configuration.XMLConfiguration.createDocumentBuilder(XMLConfiguration.java:579)
at org.apache.commons.configuration.XMLConfiguration.load(XMLConfiguration.java:687)
at org.apache.commons.configuration.XMLConfiguration.load(XMLConfiguration.java:654)
at org.apache.commons.configuration.XMLConfiguration$XMLFileConfigurationDelegate.load(XMLConfiguration.java:1283)
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:285)
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:217)
at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:195)
at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration.load(AbstractHierarchicalFileConfiguration.java:164)
at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration.<init>(AbstractHierarchicalFileConfiguration.java:91)
at org.apache.commons.configuration.XMLConfiguration.<init>(XMLConfiguration.java:214)
at edu.harvard.hul.ois.fits.Fits.<init>(Fits.java:133)
at edu.harvard.hul.ois.fits.Fits.<init>(Fits.java:95)
at edu.harvard.hul.ois.fits.Fits.main(Fits.java:224)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.martiansoftware.nailgun.NGSession.run(NGSession.java:280)
Caused by: java.lang.RuntimeException: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:308)
... 20 more
Caused by: java.util.ServiceConfigurationError: javax.xml.parsers.DocumentBuilderFactory: Error reading configuration file
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.parse(ServiceLoader.java:309)
at java.util.ServiceLoader.access$200(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
at java.util.ServiceLoader$LazyIterator.access$600(ServiceLoader.java:323)
at java.util.ServiceLoader$LazyIterator$1.run(ServiceLoader.java:396)
at java.util.ServiceLoader$LazyIterator$1.run(ServiceLoader.java:395)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:398)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
... 20 more
Caused by: java.io.FileNotFoundException: /usr/share/fits/lib/tika-app-1.3.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:225)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:103)
at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:93)
at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84)
at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:152)
at java.net.URL.openStream(URL.java:1045)
at java.util.ServiceLoader.parse(ServiceLoader.java:304)
... 31 more
I think that the fits-nailgun service needed a restart, can you try again?
Also, you need to run it as user archivematica (or root), as the process uses a logfile at /var/log/archivematica/fits.log
That's a good point! But I am running as superuser.
If I run sudo service gearman-job-server status
I get that it is running, but FITS still fails, and sudo service fits-nailgun restart
doesn't seem to pick it back up either.
The expectation, though, is that the user shouldn't have to do any of this; the service should be able to pick itself back up if it falls over due to a problem.
@ablwr, I created a new PR in the ansible archivematica role to address nailgun restarts: https://github.com/artefactual-labs/ansible-archivematica-src/pull/226
Testing with @scollazo's changes for 16.04 Xenial, but I'm still getting consistent failures within AM, although I am able to run it on the CLI successfully, both as a regular user and as superuser (ng edu.harvard.hul.ois.fits.Fits -i /home/artefactual/archivematica-sampledata/SampleTransfers/BagTransfer/bagit.txt).
Still testing, but that's my update right now.
@mamedin made a few changes -- it seems that the fixes might not have been actually integrated into our deployments -- and now Bionic is zipping through like a dream!! Xenial is not quite there yet, but this may be for the same reason.
FITS wasn't able to work on files when using relative paths. This was caused by https://github.com/artefactual-labs/am-packbuild/blob/qa/1.x/debs/xenial/fits/debian/patches/changehome.diff#L13, which was introduced to avoid a nasty warning from log4j.
Another way to avoid the log4j warning, without changing the working directory, is to add -Dlog4j.configuration=file:/usr/share/fits/log4j.properties to the /usr/bin/fits.sh script.
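To illustrate why a change of working directory breaks relative paths, here is a minimal Python sketch. It does not reproduce what fits.sh or the patch actually does; the file name and temporary directories are placeholders chosen for the demonstration. It only shows that a relative path stops resolving once the process has moved elsewhere, while an absolute path computed beforehand keeps working.

```python
import os
import tempfile
from typing import Tuple


def demo_relative_path_breaks() -> Tuple[bool, bool]:
    """Return (relative_found, absolute_found) after the working directory changes."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as data_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        os.chdir(data_dir)
        try:
            relative = "bagit.txt"  # placeholder file name
            with open(relative, "w") as handle:
                handle.write("sample\n")
            # Resolve against the current directory *before* it changes.
            absolute = os.path.abspath(relative)
            # Simulate a wrapper script that changes directory before doing its work.
            os.chdir(other_dir)
            return os.path.exists(relative), os.path.exists(absolute)
        finally:
            # Restore the original directory so the temporary ones can be removed.
            os.chdir(original_cwd)


print(demo_relative_path_breaks())  # (False, True)
```

A caller that cannot control the wrapper script could work around the problem by passing absolute paths, as in the successful CLI test with /home/artefactual/... earlier in this thread.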
I tested on the CentOS QA server (after wiping and rebuilding) and it's still failing on characterize with "Command FITS failed with exit status -11; stderr:"
When running the ng binary on CentOS and Xenial, we got these kernel errors:
Oct 29 13:53:27 mamedin-centos-fits kernel: d8b83245-4c31-4[28915]: segfault at 7ffed655ef90 ip 00007f574c003281 sp 00007ffed655ef90 error 6 in libc-2.17.so[7f574bf81000+1c3000]
Oct 29 13:53:40 mamedin-centos-fits kernel: dcb80c7f-46cc-4[28917]: segfault at 7ffc6c3e4fb8 ip 00007f1da4fba56a sp 00007ffc6c3e4fa0 error 6 in libc-2.17.so[7f1da4f92000+1c3000]
Oct 29 13:53:53 mamedin-centos-fits kernel: cc0cc412-e921-4[28919]: segfault at 7fffcd172ff0 ip 00007f9d99d0e281 sp 00007fffcd172ff0 error 6 in libc-2.17.so[7f9d99c8c000+1c3000]
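For anyone reading these lines, the following Python sketch pulls the fields out of a kernel segfault message and decodes the error code. It assumes the x86 page-fault error code layout (bit 0: protection violation rather than a missing page, bit 1: write access, bit 2: user mode) and that the kernel prints the code in hexadecimal. The regular expression and the helper name are made up for this example.

```python
import re

# Pattern for the "segfault at ... ip ... sp ... error ... in module[...]" part.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) in (?P<module>[^\[]+)\["
)


def parse_segfault(line: str) -> dict:
    """Extract addresses and decode the page-fault error code of one log line."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    code = int(match["error"], 16)
    return {
        "address": int(match["addr"], 16),
        "ip": int(match["ip"], 16),
        "sp": int(match["sp"], 16),
        "module": match["module"],
        "protection_violation": bool(code & 1),  # False: page was not present
        "write": bool(code & 2),                 # True: fault on a write access
        "user_mode": bool(code & 4),             # True: fault happened in user space
    }


line = (
    "Oct 29 13:53:27 mamedin-centos-fits kernel: d8b83245-4c31-4[28915]: "
    "segfault at 7ffed655ef90 ip 00007f574c003281 sp 00007ffed655ef90 "
    "error 6 in libc-2.17.so[7f574bf81000+1c3000]"
)
info = parse_segfault(line)
print(info["module"], info["write"], info["user_mode"], info["protection_violation"])
print(info["address"] == info["sp"])
```

For the first log line this reports a user-mode write to a page that was not present, inside libc-2.17.so, with the faulting address equal to the stack pointer. That says where the fault happened, not why, so it should be read as a starting point for debugging rather than a diagnosis.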
It was fixed on xenial using the bionic package. For CentOS it can be fixed using the bionic binary too (see https://github.com/artefactual-labs/am-packbuild/pull/198 )
New CentOS packages for nailgun and fits have been created and uploaded to the rpm repository (see https://github.com/artefactual-labs/am-packbuild/pull/199):
fits-1.1.0-3.el7.x86_64.rpm, nailgun-0.9.3-1.el7.x86_64.rpm
am18centos.qa and am18rpm.qa were updated. It is ready for review again.
Tested on am18rpm.qa with SampleTransfers/OfficeDocs and FITS did not fail. I did not test with @ablwr's exact example, though.
Expected behaviour
FITS will do its job to process all files that are set to the default configuration in the Characterize & Extract microservices.
Current behaviour
FITS works sometimes, but fails a lot of the time. It seems to fail all the time for Characterization of metadata files during Ingest, and some of the time on the files themselves during Transfer.
Steps to reproduce
Pick a file and attempt to process all the way through. The failure of the metadata on the Ingest stage will not highlight itself -- you have to click through to notice it.
For example, try running this file through Archivematica, or even on the CLI using FITS:
OPF\ format-corpus/office-examples/Old\ Word\ file/NEWSSLID.DOC
and it should fail. In my experience, just running FITS on the CLI on an empty .txt file will also fail.
Here is the full error on Bionic: https://gist.github.com/ablwr/2f7f25f62ddfed04a36fc466e7287910
Here is the error on CentOS from the CLI -- seems that it cannot find files.
Your environment (version of Archivematica, OS version, etc)
Relatively consistent across Ubuntu 16.04, 18.04 and CentOS.
Originally brought up in this Issue: https://github.com/archivematica/Issues/issues/223
The error messages are different between Ubuntu and CentOS.
CentOS and Ubuntu 16.04: Command FITS failed with exit status -11; stderr:
Ubuntu 18.04 is more verbose, but ends with: Command FITS failed with exit status 131; stderr:
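The two statuses above can be read with a small Python sketch. It assumes that Archivematica reports the status using the convention of Python's subprocess module, where a negative value -N means the child was killed by signal N; this is an assumption, not something confirmed in this issue. For values above 128 it applies the common shell convention of 128 + N, which is only a hint, since a program is free to exit with such a code on its own. The function name is made up for this example.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe an exit status using subprocess and POSIX shell conventions."""
    if status < 0:
        # subprocess convention: -N means the child was killed by signal N.
        try:
            return f"killed by {signal.Signals(-status).name}"
        except ValueError:
            return f"killed by unknown signal {-status}"
    if status > 128:
        # Shell convention: 128 + N often means "terminated by signal N".
        try:
            name = signal.Signals(status - 128).name
            return f"possibly killed by {name} (128 + {status - 128})"
        except ValueError:
            return f"exited with status {status}"
    return "success" if status == 0 else f"exited with status {status}"


print(describe_exit_status(-11))  # killed by SIGSEGV
print(describe_exit_status(131))  # possibly killed by SIGQUIT (128 + 3)
```

On Linux, -11 therefore maps to SIGSEGV, which matches the segfaults reported elsewhere in this thread. The reading of 131 is less certain and would need to be checked against the full Bionic log.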
CentOS uses FITS version 0.10.1. Both Ubuntu releases use 0.8.4. Both are notably behind the most recent version of FITS, which is 1.3.0.