archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: In qa/1.x, FITS is currently broken #247

Closed ablwr closed 5 years ago

ablwr commented 6 years ago

Expected behaviour

FITS will do its job to process all files that are set to the default configuration in the Characterize & Extract microservices.

Current behaviour

FITS works sometimes, but fails a lot of the time. It seems to fail all the time for Characterization of metadata files during Ingest, and some of the time on the files themselves during Transfer.

Steps to reproduce

Pick a file and attempt to process all the way through. The failure of the metadata on the Ingest stage will not highlight itself -- you have to click through to notice it.

For example, try running this file through Archivematica, or even on the CLI using FITS: OPF\ format-corpus/office-examples/Old\ Word\ file/NEWSSLID.DOC and it should fail.

In my experience, just running FITS on the CLI on an empty .txt file will also fail.

Here is the full error on Bionic: https://gist.github.com/ablwr/2f7f25f62ddfed04a36fc466e7287910

Here is the error on CentOS from the CLI -- seems that it cannot find files.

[artefactual@centos-am17 tmp]$ ls
empty.txt
[artefactual@centos-am17 tmp]$ ng edu.harvard.hul.ois.fits.Fits -i empty.txt -o output.txt > /dev/null
Oct 04, 2018 6:19:58 PM edu.harvard.hul.ois.jhove.JhoveBase init
SEVERE: Testing SEVERE level
edu.harvard.hul.ois.fits.exceptions.FitsConfigurationException: empty.txt does not exist or is not readable
    at edu.harvard.hul.ois.fits.Fits.examine(Fits.java:518)
    at edu.harvard.hul.ois.fits.Fits.doSingleFile(Fits.java:376)
    at edu.harvard.hul.ois.fits.Fits.main(Fits.java:262)
    at sun.reflect.GeneratedMethodAccessor354.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.martiansoftware.nailgun.NGSession.run(NGSession.java:331)

Your environment (version of Archivematica, OS version, etc)

Relatively consistent across Ubuntu 16.04, Ubuntu 18.04 and CentOS.

Originally brought up in this Issue: https://github.com/archivematica/Issues/issues/223

The error messages are different between Ubuntu and CentOS.

CentOS and Ubuntu 16.04: Command FITS failed with exit status -11; stderr:

Ubuntu 18.04 is more verbose, but ends with: Command FITS failed with exit status 131; stderr:
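The two exit statuses above can be read as signal numbers under two common conventions. The sketch below is only an illustration, and `decode_status` is a hypothetical helper, not part of Archivematica or FITS. It assumes the -11 is reported the way Python's subprocess module reports a child killed by a signal (negative signal number), and that 131 is a shell-style status (128 plus the signal number). Neither assumption is confirmed in this thread.

```shell
# Hypothetical helper: translate a reported exit status into a signal name.
# Negative values: Python subprocess convention (-N means "killed by signal N").
# Values above 128: shell convention (128 + N).
decode_status() {
    status=$1
    if [ "$status" -lt 0 ]; then
        sig=$(( 0 - status ))
    elif [ "$status" -gt 128 ]; then
        sig=$(( status - 128 ))
    else
        echo "exit code $status (not a signal)"
        return 0
    fi
    # kill -l N prints the name of signal N (without the SIG prefix).
    echo "signal $sig ($(kill -l "$sig"))"
}

decode_status -11
decode_status 131
```

Under those assumptions, -11 corresponds to SIGSEGV (a segfault) and 131 to SIGQUIT, so the two platforms may be reporting different failures rather than the same one in different words.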

CentOS uses FITS version 0.10.1; both Ubuntu releases use 0.8.4. All of them are well behind the most recent version of FITS, which is 1.3.0.


ablwr commented 6 years ago

In the FITS release notes for 1.0 (the one after 0.10.1), I saw this line "Change compiler compliance to Java 7. Was mistakenly set at Java 8."[1] It could be an issue related to no longer being explicit about these kinds of things that is causing our issue.

scollazo commented 6 years ago

On Bionic, fits-nailgun is failing hard with a stack trace in the logs. sudo journalctl -u fits-nailgun shows:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5ba5220e33, pid=8159, tid=0x00007f5b86415700
...

ablwr commented 6 years ago

@scollazo Does this seem related or unrelated to #222 and plans to fix things there?

scollazo commented 6 years ago

This seems to be the same problem @mamedin fixed in https://github.com/artefactual-labs/am-packbuild/issues/172 for CentOS.

ablwr commented 6 years ago

I think the am-packbuild issues have resolved this, although I need to test a little bit more! But I'm not immediately seeing failures on files.

ablwr commented 6 years ago

This is tentatively resolved in Bionic, but still failing for CentOS.

ablwr commented 6 years ago

In https://github.com/artefactual-labs/am-packbuild/pull/190 it looks like the FITS package installed is 1.1.0, but we were on 0.8.4 according to the version check when logged in. Noting that this could be a cause, but I haven't investigated yet.

scollazo commented 6 years ago

Hi @ablwr, you uncovered a nice rabbit hole.

We are shipping 3 different versions of fits.

We should align them and use the same fits version everywhere.

ablwr commented 6 years ago

@scollazo It's weird because if you log into a current 18.04 deployment and run ng edu.harvard.hul.ois.fits.Fits -v you get 0.8.4.

And it looks like there's a fits.sh command which is 1.1.0.

Whaaaa? Two FITS on 18.04?

The nailgun command is what is called in the FPR.
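To compare what each entry point reports on a given machine, something like the following could be run. It is a sketch only: `report_version` is a hypothetical helper, and the use of `-v` with fits.sh is an assumption based on the nailgun invocation above, not something shown in this thread.

```shell
# Hypothetical helper: print the version reported by a FITS entry point,
# or say so if the command is not installed.
report_version() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: ' "$label"
        "$@" 2>&1 || echo "(command failed)"
    else
        printf '%s: %s not found on PATH\n' "$label" "$1"
    fi
}

# The nailgun client, which is what the FPR command calls.
report_version "nailgun" ng edu.harvard.hul.ois.fits.Fits -v
# The standalone wrapper script (assumed to accept the same -v flag).
report_version "fits.sh" fits.sh -v
```

If the two lines disagree, the nailgun server is probably still loading classes from an older FITS install than the one fits.sh points at.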

scollazo commented 6 years ago

Fits has been updated to 1.1.0 on all supported environments (Ubuntu 18.04, Ubuntu 16.04 and CentOS/RHEL).

ablwr commented 6 years ago

FITS still does not seem to be functional on Ubuntu 16.04. It still exits with what I think is a segfault error: Command FITS failed with exit status -11; stderr:

It seems to start working, but then it may run into trouble, break, and not be able to get back up again. This could be related to #868

ablwr commented 6 years ago

When I run this on the machine outside of AM, here is the error log:

artefactual@am18xenial:~$ ng edu.harvard.hul.ois.fits.Fits -i archivematica-sampledata/TestTransfers/badNames/objects/ampersand\&ampersand.txt 
log4j:WARN No appenders could be found for logger (org.apache.commons.configuration.ConfigurationUtils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
    at org.apache.commons.configuration.XMLConfiguration.createDocumentBuilder(XMLConfiguration.java:579)
    at org.apache.commons.configuration.XMLConfiguration.load(XMLConfiguration.java:687)
    at org.apache.commons.configuration.XMLConfiguration.load(XMLConfiguration.java:654)
    at org.apache.commons.configuration.XMLConfiguration$XMLFileConfigurationDelegate.load(XMLConfiguration.java:1283)
    at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:285)
    at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:217)
    at org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:195)
    at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration.load(AbstractHierarchicalFileConfiguration.java:164)
    at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration.<init>(AbstractHierarchicalFileConfiguration.java:91)
    at org.apache.commons.configuration.XMLConfiguration.<init>(XMLConfiguration.java:214)
    at edu.harvard.hul.ois.fits.Fits.<init>(Fits.java:133)
    at edu.harvard.hul.ois.fits.Fits.<init>(Fits.java:95)
    at edu.harvard.hul.ois.fits.Fits.main(Fits.java:224)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.martiansoftware.nailgun.NGSession.run(NGSession.java:280)
Caused by: java.lang.RuntimeException: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:308)
    ... 20 more
Caused by: java.util.ServiceConfigurationError: javax.xml.parsers.DocumentBuilderFactory: Error reading configuration file
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.parse(ServiceLoader.java:309)
    at java.util.ServiceLoader.access$200(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
    at java.util.ServiceLoader$LazyIterator.access$600(ServiceLoader.java:323)
    at java.util.ServiceLoader$LazyIterator$1.run(ServiceLoader.java:396)
    at java.util.ServiceLoader$LazyIterator$1.run(ServiceLoader.java:395)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:398)
    at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
    at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
    ... 20 more
Caused by: java.io.FileNotFoundException: /usr/share/fits/lib/tika-app-1.3.jar (No such file or directory)
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:225)
    at java.util.zip.ZipFile.<init>(ZipFile.java:155)
    at java.util.jar.JarFile.<init>(JarFile.java:166)
    at java.util.jar.JarFile.<init>(JarFile.java:103)
    at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:93)
    at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
    at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84)
    at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
    at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:152)
    at java.net.URL.openStream(URL.java:1045)
    at java.util.ServiceLoader.parse(ServiceLoader.java:304)
    ... 31 more

scollazo commented 6 years ago

I think that the fits-nailgun service needed a restart, can you try again?

scollazo commented 6 years ago

Also, you need to run it as the archivematica user (or root), because the process writes to a log file at /var/log/archivematica/fits.log

ablwr commented 6 years ago

That's a good point! But I am running as superuser.

If I run sudo service gearman-job-server status I get that it is running, but FITS still fails, and sudo service fits-nailgun restart doesn't seem to pick it back up either.

That said, the expectation is that the user shouldn't have to do any of this. The service should be able to pick itself back up if it falls over due to a problem.

scollazo commented 6 years ago

@ablwr, I created a new PR in the Ansible Archivematica role to address nailgun restarts: https://github.com/artefactual-labs/ansible-archivematica-src/pull/226

ablwr commented 6 years ago

Testing with @scollazo's changes on 16.04 Xenial, I am still getting consistent failures within AM, although I am able to run it on the CLI successfully, both as a regular user and as superuser. (ng edu.harvard.hul.ois.fits.Fits -i /home/artefactual/archivematica-sampledata/SampleTransfers/BagTransfer/bagit.txt)

Still testing, but that's my update right now.

ablwr commented 6 years ago

@mamedin made a few changes -- it seems that the fixes might not have been actually integrated into our deployments -- and now Bionic is zipping through like a dream!! Xenial is not quite there yet, but this may be for the same reason.

scollazo commented 6 years ago

Fits wasn't able to work on files when given relative paths. This was caused by https://github.com/artefactual-labs/am-packbuild/blob/qa/1.x/debs/xenial/fits/debian/patches/changehome.diff#L13, which was introduced to avoid a nasty warning from log4j.

Another way to avoid the log4j warning, without changing the working directory, is to add -Dlog4j.configuration=file:/usr/share/fits/log4j.properties to the /usr/bin/fits.sh script.
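The effect of a changed working directory on relative paths can be reproduced without FITS at all. The sketch below is not the actual patch. It only assumes that a wrapper changes directory before the file is opened, and `check_readable` is a hypothetical helper written for this illustration.

```shell
# Create a file in one directory and a second, unrelated directory.
workdir=$(mktemp -d)
otherdir=$(mktemp -d)
: > "$workdir/empty.txt"
cd "$workdir" || exit 1

# Hypothetical helper: test readability from a different working directory,
# the way a wrapper script that calls `cd` first would see the path.
check_readable() {
    ( cd "$otherdir" && [ -r "$1" ] ) && echo "readable: $1" || echo "not readable: $1"
}

# The relative path no longer resolves once the directory has changed.
check_readable empty.txt
# The absolute path is unaffected by the directory change.
check_readable "$workdir/empty.txt"
```

This matches the "does not exist or is not readable" error seen for empty.txt earlier in the thread, and suggests that passing an absolute path (for example "$(pwd)/empty.txt") is a reasonable workaround on installs that still carry the patch.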

sromkey commented 6 years ago

I tested on the CentOS QA server (after wiping and rebuilding) and it's still failing on characterize with "Command FITS failed with exit status -11; stderr:"

mamedin commented 6 years ago

When running the ng binary on CentOS and Xenial, we got these kernel errors:

Oct 29 13:53:27 mamedin-centos-fits kernel: d8b83245-4c31-4[28915]: segfault at 7ffed655ef90 ip 00007f574c003281 sp 00007ffed655ef90 error 6 in libc-2.17.so[7f574bf81000+1c3000]
Oct 29 13:53:40 mamedin-centos-fits kernel: dcb80c7f-46cc-4[28917]: segfault at 7ffc6c3e4fb8 ip 00007f1da4fba56a sp 00007ffc6c3e4fa0 error 6 in libc-2.17.so[7f1da4f92000+1c3000]
Oct 29 13:53:53 mamedin-centos-fits kernel: cc0cc412-e921-4[28919]: segfault at 7fffcd172ff0 ip 00007f9d99d0e281 sp 00007fffcd172ff0 error 6 in libc-2.17.so[7f9d99c8c000+1c3000]

It was fixed on Xenial using the Bionic package. For CentOS it can be fixed using the Bionic binary too (see https://github.com/artefactual-labs/am-packbuild/pull/198).

mamedin commented 6 years ago

New CentOS packages for nailgun and fits have been created and uploaded to the rpm repository (see https://github.com/artefactual-labs/am-packbuild/pull/199):

fits-1.1.0-3.el7.x86_64.rpm
nailgun-0.9.3-1.el7.x86_64.rpm

am18centos.qa and am18rpm.qa were updated. It is ready for review again.

kellyannewithane commented 6 years ago

Tested on am18rpm.qa with SampleTransfers/OfficeDocs and FITS did not fail. I did not test with @ablwr's exact example, though.