ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

does not recognize hmmpress-generated aux files. #355

Closed Arkadiy-Garber closed 5 months ago

Arkadiy-Garber commented 5 months ago

I ran hmmpress within the superfamily/1.75/ directory (after removing duplicates from the hmmlib_1.75 file), but I am still getting the following error:

Error: Failed to open binary auxfiles for data/superfamily/1.75/hmmlib_1.75: use hmmpress first

among the many other errors that I can't understand. See attached log for full error output.

error log from interproscan.txt

matthiasblum commented 5 months ago

Hi,

How did you obtain/install InterProScan, and which version do you have?

With the latest release (5.67-99.0), we addressed the issue with duplicated fields. In previous releases, an older version of hmmpress was used, so this error should not occur with a standard install.

Arkadiy-Garber commented 5 months ago

Hi, thanks for your swift response. InterProScan was installed using conda, with version:

InterProScan version 5.59-91.0 InterProScan 64-Bit build (requires Java 11)

I don't remember how I obtained all the database files under data, but here is the current configuration:

(interproscan) MAB@Axceleron-WKS:~/databases/interproscan-5-44.0/data$ ls *
coils:
2.2

freemarker:
entry_colours.properties  entry_hierarchy.csv  entry_to_go.psv  resources  WEB-INF

gene3d:
3.5.0

hamap:
201302.26

pfam:
27.0

phobius:
1.01

pirsf:
2.84

prints:
42.0

prodom:
2006.1

prosite:
20.89

smart:
6.2

superfamily:
1.75

tigrfam:
13.0

tmhmm:
2.0

Arkadiy-Garber commented 5 months ago

I did fix the hmmlib_1.75 file to remove the duplicated fields (which resulted in an HMM file with just over 2000 models), and was able to successfully run hmmpress on the fixed hmmlib_1.75 file. However, I still get the same error, which doesn't make sense to me, unless it's expecting a different set of aux files:

Error: Failed to open binary auxfiles for data/superfamily/1.75/hmmlib_1.75: use hmmpress first
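For anyone checking a library for leftover duplicates before running hmmpress, here is a minimal sketch. It assumes the duplicates in question are repeated NAME values in the HMMER3 ASCII file, which may not be exactly what "duplicated fields" refers to in this thread. The function name list_duplicate_hmm_names is made up for illustration.

```shell
# List model names that occur more than once in an HMMER3 ASCII library.
# Assumption: the duplicates of interest are repeated NAME values.
list_duplicate_hmm_names() {
    # $1: path to the HMM library file
    # Print the second field of every NAME line, then keep only repeated values.
    awk '$1 == "NAME" { print $2 }' "$1" | sort | uniq -d
}

# Usage (path taken from the error message in this thread):
# list_duplicate_hmm_names data/superfamily/1.75/hmmlib_1.75
```

An empty result only means no NAME value is repeated; it says nothing about other repeated fields such as accessions.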

matthiasblum commented 5 months ago

Have you run hmmpress with the -f flag? By default hmmpress does not create new auxiliary files if they already exist; the -f flag makes hmmpress overwrite previously created auxiliary files.

Alternatively, you can delete existing auxiliary files with the following command:

find path/to/data/directory/ -type f -name "*.h3*" -delete

and let InterProScan rebuild missing auxiliary files next time it starts.
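As a quick sanity check after deleting or regenerating the files, the sketch below reports which of the four hmmpress outputs (.h3f, .h3i, .h3m, .h3p) are missing or empty next to a given library. The function name check_hmmpress_aux is hypothetical. It only tests that the files exist; it does not check whether they were written by a HMMER version compatible with the binaries InterProScan runs.

```shell
# Report hmmpress auxiliary files that are missing or empty for an HMM library.
# Returns 0 if all four are present and non-empty, 1 otherwise.
check_hmmpress_aux() {
    aux_hmm=$1      # path to the HMM library, without any .h3* suffix
    aux_status=0
    for aux_ext in h3f h3i h3m h3p; do
        if [ ! -s "$aux_hmm.$aux_ext" ]; then
            echo "missing or empty: $aux_hmm.$aux_ext"
            aux_status=1
        fi
    done
    return $aux_status
}

# Usage (path taken from the error message in this thread):
# check_hmmpress_aux data/superfamily/1.75/hmmlib_1.75 || echo "run hmmpress -f"
```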


On a side note, you installed InterProScan 5.59-91.0 through Conda, but it seems that you are using data from InterProScan 5.44-79.0 (directory named interproscan-5-44.0), which might cause issues when running InterProScan.

Arkadiy-Garber commented 5 months ago

Yes, I ran with that flag, and I also removed all other pre-existing auxfiles. Here is what I have so far:

(interproscan) MAB@Axceleron-WKS:~/databases/interproscan-5-44.0/data/superfamily/1.75$ ls -lht
total 416M
-rw-r--r-- 1 MAB users  35M Apr 11 16:19 hmmlib_1.75.h3f
-rw-r--r-- 1 MAB users  97K Apr 11 16:19 hmmlib_1.75.h3i
-rw-r--r-- 1 MAB users  60M Apr 11 16:19 hmmlib_1.75.h3m
-rw-r--r-- 1 MAB users  72M Apr 11 16:19 hmmlib_1.75.h3p
drwxr-xr-x 2 MAB users    3 Apr 11 16:19 safe
-rw-r--r-- 1 MAB users 145M Apr 10 02:57 hmmlib_1.75
-rw-r--r-- 1 MAB users 855K Sep 24  2013 model.tab
-rw-r--r-- 1 MAB users 4.2M Sep 24  2013 pdbj95d
-rw-r--r-- 1 MAB users  85M Sep 24  2013 self_hits.tab
-rw-r--r-- 1 MAB users  11M Sep 24  2013 dir.cla.scop.txt_1.75
-rw-r--r-- 1 MAB users 4.7K Sep 24  2013 LICENSE
-rw-r--r-- 1 MAB users 5.8M Sep 24  2013 dir.des.scop.txt_1.75

Still crashing with the same error after running a simple version:

interproscan.sh -i /data/MAB/4638/GCA_003546975.1.faa

Should I re-install an older version of InterProScan with conda, or is there a place where I can get a new version of these databases? The new links I find now are hundreds of GB and currently not possible for me to download.

Thanks, Arkadiy

matthiasblum commented 5 months ago

The recommended way to install InterProScan is to download and extract the archive from our FTP. The link to the latest version is available in the Download section of the InterPro website.

The archive is not hundreds of GB, but 5.4GB. Unarchived, InterProScan takes about 50GB.
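To check in advance whether a target directory has enough room, something like the following sketch could be used. The function name has_free_gb and the 56 GB figure in the usage line (archive plus extracted size, rounded up) are illustrative assumptions, not official requirements.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 GB available.
has_free_gb() {
    # df -Pk prints one POSIX-format line per filesystem; field 4 is available KB.
    free_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage (directory and threshold are examples only):
# has_free_gb "$HOME/databases" 56 && echo "enough room"
```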

If you do not have the bandwidth or the disk space, you can submit your sequences on the InterPro website to scan them against the latest version of InterProScan.

Arkadiy-Garber commented 5 months ago

Hi, I was able to download the software, and using the interproscan.sh script within that package, the program ran successfully. Thanks for your help!

matthiasblum commented 5 months ago

Hi @Arkadiy-Garber, glad it worked. :+1:

Arkadiy-Garber commented 5 months ago

Thanks for the assistance!

zx0223winner commented 1 month ago

Hi @matthiasblum

Following up on your suggestion: "With the latest release (5.67-99.0), we addressed the issue with duplicated fields. In previous releases, an older version of hmmpress was used, so this error should not occur with a standard install."

Could the Conda package be updated to the new version (5.67-99.0)? Only version 5.59 has been available on Anaconda since Dec 2022, and InterProScan 5.59 (Conda version) still has the issue:

"Error: Failed to open binary auxfiles for data/superfamily/1.75/hmmlib_1.75: use hmmpress first".

I would appreciate it if the Conda installation could be polished.