Closed daniellembecker closed 3 years ago
If you are still getting an error, make sure to re-download ko_list.gz as well before re-running the job:
curl -O ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz | gunzip > ko_list #download and unzip KO database
The gunzip at the end of the pipe does not always work (curl -O saves the download to ko_list.gz instead of sending it down the pipe), so check that ko_list has contents in it. If it is empty, use:
gunzip ko_list.gz
on a separate line to properly unzip the gz file.
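The check described above can be scripted. This is a minimal sketch, not part of the original pipeline: `ensure_ko_list` is a hypothetical helper name, and it assumes ko_list.gz has already been downloaded into the given directory.

```shell
# Hypothetical helper: make sure ko_list exists and is non-empty,
# decompressing ko_list.gz in place if needed.
ensure_ko_list() {
    dir="${1:-.}"    # directory holding ko_list.gz (defaults to current)
    if [ ! -s "$dir/ko_list" ]; then
        # -f overwrites an empty ko_list left behind by the failed pipe
        gunzip -f "$dir/ko_list.gz" || return 1
    fi
    # Succeeds only if ko_list now has contents
    [ -s "$dir/ko_list" ]
}
```

For example, `ensure_ko_list . && head -n 3 ko_list` prints the first few entries only when the file was unzipped successfully.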
After all of these steps, I re-ran the script and it worked! You should have a Pver_KO_annot.tsv file that looks like this:
Follow "Step 3: Map KEGG terms to a genome" in @echille's lab notebook post 2020-10-08-M-capitata-functional-annotation-pipeline.md for general steps and explanations of the beginning of this process.
Also, see @echille's GitHub for further KEGG ontology steps in R.
I ran into an issue when downloading and using this part of @echille's code to get the up-to-date Kofam database:
curl -O ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz | tar xf > profiles #download and inflate profiles
It did not seem to download/extract one of the HMM files for some reason, which I realized after running the KofamScan script, adapted from @echille's original found here.
In the slurm output, this error was written out:
We did some troubleshooting to check this error. You can search the profiles folder to see whether a .hmm file was extracted correctly:
how to check for file extraction in profiles folder:
$ ls profiles/K00637*
ls: cannot access 'profiles/K00637*': No such file or directory
how to check whether a .hmm file is present in profiles.tar.gz if it does not appear in the profiles folder:
$ tar tf profiles.tar.gz | grep K00637
profiles/K00637.hmm
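The two checks above look at one file at a time. As a rough sketch, the same comparison can be run over every .hmm entry in the archive; `missing_profiles` is a hypothetical helper name, and it assumes the archive was extracted into the directory passed as the second argument.

```shell
# Hypothetical helper: print every .hmm path listed in the archive
# that is missing from the directory the archive was extracted into.
missing_profiles() {
    archive="$1"      # e.g. profiles.tar.gz
    root="${2:-.}"    # directory the archive was extracted into
    tar tf "$archive" | grep '\.hmm$' | while IFS= read -r path; do
        # Report the archive member if no matching file exists on disk
        [ -e "$root/$path" ] || printf '%s\n' "$path"
    done
}
```

Running `missing_profiles profiles.tar.gz .` prints nothing when the extraction is complete. A single missing member can then be pulled out by naming it, for example `tar -xf profiles.tar.gz profiles/K00637.hmm`.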
We were able to figure out that another way to download and extract all of the HMM files was to use wget:
wget ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
Make sure to extract profiles.tar.gz afterwards:
tar -xf profiles.tar.gz #extract the archive, creates the profiles folder
After I solved this issue, I got another error in the slurm output that read:
Error: Unknown KO: K00960
This error means that, for some reason, some outdated KO values (K00960 and a few others) that had not been updated when downloading/extracting the Kofam database were still being found in the tmp > tabular output folder that is created during the download/extraction.
I used the code below to see the older versions of the values that are no longer used and then had to delete them before running the job again.
These values did not appear to be in the new profiles.tar.gz file, so they had to have come from a previous version.
I removed them and re-ran the job.
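One way to spot such leftovers is to compare the file names in the tabular folder against the KO numbers in ko_list. The sketch below is not the original code from this post: `stale_ko_files` is a hypothetical helper, and it assumes that ko_list is tab-separated with the KO number in the first column and that each file in the tabular folder is named after its KO number (for example K00960).

```shell
# Hypothetical helper: list files in a tabular output folder whose KO
# number does not appear in the first column of ko_list.
# Assumes each file name starts with its KO number (e.g. K00960).
stale_ko_files() {
    ko_list="$1"        # path to the unzipped ko_list
    tabular_dir="$2"    # e.g. tmp/tabular
    for f in "$tabular_dir"/K*; do
        [ -e "$f" ] || continue    # skip if the glob matched nothing
        name=$(basename "$f")
        ko=${name%%.*}             # strip any file extension
        # -x matches the whole line, -F treats the KO as a fixed string
        cut -f1 "$ko_list" | grep -qxF "$ko" || printf '%s\n' "$f"
    done
}
```

`stale_ko_files ko_list tmp/tabular` only prints the candidates. Review the list before deleting anything and re-running the job.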