Putnam-Lab / Lab_Management

13 stars 7 forks source link

Error with hmm search while using KofamScan to map KEGG terms to genome #21

Closed daniellembecker closed 3 years ago

daniellembecker commented 3 years ago

Follow step 3: map KEGG terms to a genome in @echille's lab notebook 2020-10-08-M-capitata-functional-annotation-pipeline.md post for general steps and explanations on the beginning of this process.

Also, see @echille GitHub for further KEGG ontology steps in R.

I ran into an issue when downloading and using this part of @echille code to get the up-to-date Kofam database:

curl -O ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz | tar xf > profiles #download and inflate profiles

It seemed to not download/extract on of the HMM files for some reason which I realized after running the KofamScan script, adapted from @echille original found here.

In the slurm output, this error was written out:

hmmsearch was not run successfully

Error: File existence/permissions problem in trying to open HMM file 
/data/putnamlab/dbecks/Becker_E5/Becker_RNASeq/data/KEGG/profiles/K00637.hmm.
HMM file /data/putnamlab/dbecks/Becker_E5/Becker_RNASeq/data/KEGG/profiles/K00637.hmm not found (nor an .h3m binary of it)

Some troubleshooting was done to check this error, you can search the profiles folder to see if a .hmm file was extracted correctly:

how to check for file extraction in profiles folder:

$ ls profiles/K00637*

ls: cannot access 'profiles/K00637*': No such file or directory

how to extract .hmm file from profiles.tar.gz if they do not appear initially

$ tar tf profiles.tar.gz | grep K00637

profiles/K00637.hmm

We were able to figure out that another way to download and extract all of the HMM files was to use wget:

wget ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz

Make sure to unzip the profiles.tar.gz after:

tar -xf profiles.tar.gz > profiles

After I solved this issue, I got another error in the slurm output that read:

Error: Unknown KO: K00960

This error means that for some reason when extracting/downloading the Kofam database, some of the values (K00960) and a few others that had not been updated were still being found in the tmp > tabular output folder that is created during the download/extraction.

I used the code below to see the older versions of the values that are no longer used and then had to delete them before running the job again.

ls -ltr tmp/tabular/  | grep -v Aug.19

total 232797
drwxr-xr-x 3 danielle_becker putnamlab   4096 Jul 16 12:03 ..
-rw-r--r-- 1 danielle_becker putnamlab   2238 Jul 19 13:34 K00960
-rw-r--r-- 1 danielle_becker putnamlab  26732 Jul 19 13:36 K02471
-rw-r--r-- 1 danielle_becker putnamlab  18298 Jul 19 13:52 K09311
-rw-r--r-- 1 danielle_becker putnamlab  21632 Jul 19 13:53 K09376
-rw-r--r-- 1 danielle_becker putnamlab   1654 Jul 19 13:53 K09394
-rw-r--r-- 1 danielle_becker putnamlab   3422 Jul 19 14:12 K18122

These values did not appear to be in the new profiles.tar.gz file, so they had to have come from a previous version.

I removed them and re-ran the job.

daniellembecker commented 3 years ago

If you are still getting an error, make sure to re-download the ko_list.gz as well before re-running

curl -O ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz | gunzip > ko_list #download and unzip KO database

Sometimes the gunzip on the end does not work, make sure to check that the ko_list has contents in it. If not, use:

gunzip ko_list.gz in a separate line to properly unzip the gz file.

After all of these steps, I re-ran the script and it worked! You should have a Pver_KO_annot.tsv file that looks like this:

Screen Shot 2021-08-23 at 2 29 15 PM