AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

IndexError: list index out of range #32

Closed ZongzhiWu closed 3 years ago

ZongzhiWu commented 3 years ago

$>cat job.6140.fat02.out

Traceback (most recent call last):
  File "/lustre/home/liutang/.conda/envs/vibrant/bin/VIBRANT_annotation.py", line 144, in <module>
    kegg_outfile.write(str(parse[0]) + '\t' + str(parse[2]) + '\t' + str(parse[4]) + '\t' + str(parse[5]) + '\n')
IndexError: list index out of range
ZongzhiWu commented 3 years ago

$>cat job.6140.fat02.out

Traceback (most recent call last):
  File "/lustre/home/liutang/.conda/envs/vibrant/bin/VIBRANT_annotation.py", line 144, in <module>
    kegg_outfile.write(str(parse[0]) + '\t' + str(parse[2]) + '\t' + str(parse[4]) + '\t' + str(parse[5]) + '\n')
IndexError: list index out of range
Traceback (most recent call last):
  File "/lustre/home/liutang/.conda/envs/vibrant/bin/VIBRANT_annotation.py", line 1708, in <module>
    AMG_dict.update({str(annotations[n]):str(annotations[n+1]).replace("$~&", " ").replace('^@%','"') + '\t' + str(annotations[n+2]) + '\t' + str(ko_name) + '\t' + str(annotations[n+6]) + '\t' + str(pfam_name)})
NameError: name 'pfam_name' is not defined
Traceback (most recent call last):
  File "/lustre/home/liutang/.conda/envs/vibrant/bin/VIBRANT_annotation.py", line 1708, in <module>
    AMG_dict.update({str(annotations[n]):str(annotations[n+1]).replace("$~&", " ").replace('^@%','"') + '\t' + str(annotations[n+2]) + '\t' + str(ko_name) + '\t' + str(annotations[n+6]) + '\t' + str(pfam_name)})
NameError: name 'pfam_name' is not defined
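
For context, the first traceback is what Python raises when a list is indexed past its end, for example when a split line has fewer fields than the code expects. The sketch below is only an illustration: `format_kegg_row` is a hypothetical helper, and the whitespace split is an assumption, since the actual parsing code in VIBRANT_annotation.py is not shown in this thread. Only the indices 0, 2, 4 and 5 come from the traceback.

```python
def format_kegg_row(line, needed=(0, 2, 4, 5)):
    """Return a tab-joined row, or None if the line has too few fields.

    Hypothetical helper: the field positions mirror the indices in the
    traceback, but splitting on whitespace is an assumption.
    """
    parse = line.split()
    if len(parse) <= max(needed):
        # Indexing parse[5] on such a line would raise IndexError.
        return None
    return "\t".join(str(parse[i]) for i in needed) + "\n"


print(repr(format_kegg_row("a b c d e f")))  # a line with six fields
print(repr(format_kegg_row("a b c")))        # a short line gives None
```

A guard like this only hides the symptom; the short line itself still needs explaining.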
KrisKieft commented 3 years ago

Hi,

This looks like an older issue that was noted previously. Please update to the newest version v1.2.1 and it should solve this.

Kris

liupfskygre commented 3 years ago

Hi, Kris, I got a similar error

cat VIBRANT_log_annotation_AMC_all_megahit_contigs_5k.log

list index out of range
Traceback (most recent call last):
  File "/home/dell/.conda/envs/vibrant/bin/VIBRANT_annotation.py", line 151, in <module>
    kegg_outfile.write(str(parse[0]) + '\t' + str(parse[2]) + '\t' + str(parse[4]) + '\t' + str(parse[5]) + '\n')
IndexError: list index out of range
list index out of range
Traceback (most recent call last):
  File "/home/dell/.conda/envs/vibrant/bin/VIBRANT_annotation.py", line 151, in <module>
    kegg_outfile.write(str(parse[0]) + '\t' + str(parse[2]) + '\t' + str(parse[4]) + '\t' + str(parse[5]) + '\n')
IndexError: list index out of range

I use the following command to run

python3 /home/dell/.conda/envs/vibrant/bin/VIBRANT_run.py -i AMC_all_megahit_contigs_5k.fasta -t 20 -l 5000

By checking the version, I found I have v1.2.1

conda list |grep 'vibrant'
# packages in environment at /home/dell/.conda/envs/vibrant:
vibrant                   1.2.1                         1    bioconda

I am not sure if this is going to kill my run since it runs for a while.

thanks.

Pengfei

KrisKieft commented 3 years ago

Hi,

First, this may not directly kill your run, but it will likely significantly affect the results. Since you are running 20 threads, I believe the number of these error messages equals the number of killed threads. So here, 1/10th of your sequences (2 of 20 threads) will not be run.
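
The estimate above can be sketched in a few lines. This assumes, as the reply does, that each logged `IndexError` corresponds to one killed thread and that sequences are split evenly across threads; the function name `fraction_not_run` is illustrative and not part of VIBRANT.

```python
def fraction_not_run(log_text, threads):
    """Estimate the share of input skipped.

    Assumes one killed thread per line starting with 'IndexError' and
    an even split of sequences across threads.
    """
    killed = sum(
        1 for line in log_text.splitlines() if line.startswith("IndexError")
    )
    return killed, killed / threads


# Shortened copy of the log posted above: two tracebacks.
log = (
    "list index out of range\n"
    "Traceback (most recent call last):\n"
    "IndexError: list index out of range\n"
    "list index out of range\n"
    "Traceback (most recent call last):\n"
    "IndexError: list index out of range\n"
)
killed, share = fraction_not_run(log, threads=20)
print(killed, share)  # 2 0.1
```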

In case you have multiple versions of VIBRANT available, you can also check the log_run file; the 6th line states the version that was used for the run. This is a strange error to get with v1.2.1. You can also run /home/dell/.conda/envs/vibrant/bin/VIBRANT_annotation.py --version, which checks the version of the annotation auxiliary script that is giving the error. This should also state v1.2.1. I assume this is the location of VIBRANT_annotation.py, but your Anaconda installation may differ.
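
Reading the 6th line of the run log can be scripted as below. This is a minimal sketch: `nth_line` is an illustrative name, and the demo uses an in-memory stand-in because the exact name and layout of the log file are not shown in this thread.

```python
import io
from itertools import islice


def nth_line(handle, n):
    """Return the n-th line (1-based) of an open text file, or None if it is shorter."""
    return next(islice(handle, n - 1, n), None)


# Stand-in for the run log; replace with open(<path to the log_run file>).
demo = io.StringIO("".join(f"line {i}\n" for i in range(1, 9)))
print(nth_line(demo, 6))
```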

If you are running v1.2.1, then maybe this is due to an error with the manual database setup. Did you run the python3 VIBRANT_setup.py -test command, and did it say everything was good to go? There is one quick check, since this seems to be with KEGG: does grep -c "NAME" KEGG_profiles_prokaryotes.HMM in the databases folder give the result 10033?
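
For anyone without grep at hand, the same count can be reproduced in Python. The sketch below counts lines that contain the substring, which is what `grep -c` reports; `count_matching_lines` is an illustrative name and the sample lines are made up for the demo.

```python
def count_matching_lines(lines, needle="NAME"):
    """Count lines containing `needle`, like `grep -c needle`."""
    return sum(1 for line in lines if needle in line)


# Made-up excerpt in the spirit of an HMM profile file.
sample = [
    "HMMER3/f",
    "NAME  K00001",
    "LENG  300",
    "NAME  K00002",
]
print(count_matching_lines(sample))  # 2
```

On the real database, pass the open file instead of the sample list, for example `count_matching_lines(open("KEGG_profiles_prokaryotes.HMM"))`, and compare the result with 10033.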

Finally, what version of hmmsearch do you have? You can type hmmsearch -h and look at the top of the help menu (2nd line).
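
The version can also be pulled out of the help text programmatically. This is a sketch under the assumption that the banner line has the form `# HMMER <version> (<date>); ...`; the sample text is a stand-in and `hmmer_version` is an illustrative name.

```python
import re


def hmmer_version(help_text):
    """Extract the version string from a HMMER banner, or None if absent."""
    match = re.search(r"HMMER\s+(\d+(?:\.\d+)*\w*)", help_text)
    return match.group(1) if match else None


# Stand-in for the top of the help output; the first line is a placeholder.
sample = (
    "# (first line of the help output)\n"
    "# HMMER 3.3.1 (Jul 2020); http://hmmer.org/\n"
)
print(hmmer_version(sample))  # 3.3.1
```

To run it against the installed tool, the help text can be captured with `subprocess.run(["hmmsearch", "-h"], capture_output=True, text=True).stdout`.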

Hopefully we can figure this out easily. These suggestions are just the easiest to check first but we can keep trying to figure this out if we don't find the solution quickly.

liupfskygre commented 3 years ago

Hi Kris, thanks for the always detailed explanation. I checked the things you mentioned and got:

/home/dell/.conda/envs/vibrant/share/vibrant-1.2.1/db/databases/VIBRANT_setup.py -test

Verifying correct dependency versions ...
Logger started. Check log file for messages and errors.

VIBRANT v1.2.1 is good to go!
See example_data/ for quick test files.
#

python3 /home/dell/.conda/envs/vibrant/bin/VIBRANT_run.py --version 
VIBRANT v1.2.1

hmmsearch -h
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# which hmmer is expected ??
#
grep -c "NAME" KEGG_profiles_prokaryotes.HMM 
# 10033

not sure what's going wrong. 

BTW, I changed the number of threads, so it is not 20; it matches the number of errors in the log.

thanks. 
Pengfei
KrisKieft commented 3 years ago

I hadn't tested VIBRANT on hmmsearch v3.3.1 (only up to v3.3) but based on a couple tests it all looks fine.

Can you try python3 /home/dell/.conda/envs/vibrant/bin/VIBRANT_annotation.py --version and see what that gives? It should be v1.2.1. This is the specific script that is causing the error.

Also, can you send me the two HMM outputs for KEGG? You will find these in the HMM_tables_parsed and HMM_tables_unformatted folders. If these are large files, then you may need to either email them (kieft@wisc.edu) or we can find another way to share the files.
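
Before sharing large tables, it may help to look for rows that are shorter than the failing line expects (it reads up to index 5, so at least six fields). The sketch below is hedged: it assumes whitespace-separated fields and `#` comment lines, neither of which is confirmed for these files in this thread, and `short_rows` is an illustrative name.

```python
def short_rows(lines, min_fields=6):
    """Yield (line_number, field_count) for data rows with too few fields.

    Assumptions: fields are whitespace-separated and lines starting
    with '#' are comments. Neither is confirmed for VIBRANT's tables.
    """
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        count = len(line.split())
        if count < min_fields:
            yield number, count


# Made-up rows for the demo.
sample = [
    "# comment line",
    "seq1 - K00001 - 1e-30 120.5",
    "seq2 - K00002",
]
print(list(short_rows(sample)))  # [(3, 3)]
```

Passing an open file handle instead of the sample list reports the line numbers worth inspecting.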

Thank you for your patience helping me solve this.

liupfskygre commented 3 years ago

Hi Kris, I did not try the large dataset, but I did try your example dataset, and everything seems fine. I also ran a subset (100) of my big file (863567) and got fine output.

Is this caused by the big file?

KrisKieft commented 3 years ago

The file size should not cause the issue, but with a subset of 100 you just weren't running into the issue. Please see my previous message about the version of VIBRANT_annotation.py.
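
A small subset may simply not contain whatever input triggers the error. One way to narrow it down, offered here as an assumption and not as a step recommended in this thread, is to split the FASTA into consecutive batches and run each batch separately until one reproduces the traceback. The helper names are illustrative.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, size):
    """Group records into consecutive lists of at most `size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Made-up three-record input, grouped two records per batch.
fasta = [">a", "ACGT", ">b", "GG", "CC", ">c", "TT"]
for group in batches(read_fasta(fasta), 2):
    print([name for name, _ in group])
```

Each batch can then be written to its own file and run on its own; a failing batch can be split again in the same way.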

liupfskygre commented 3 years ago

Hi Kris, I tested the version of VIBRANT with

python3 /home/dell/.conda/envs/vibrant/bin/VIBRANT_annotation.py --version 

and it is VIBRANT v1.2.1.

But I am confused, since you commented above that

This looks like an older issue that was noted previously. Please update to the newest version v1.2.1 and it should solve this.

and now you mention

It should be v1.2.1. This is the specific script that is causing the error.

So I am confused about which version has this specific issue: v1.2.1 or an earlier one. Is there a solution for this now?

I also tested a subset with 10k sequences, and there was no error either. Thanks.

KrisKieft commented 3 years ago

The previous version has the issue, not v1.2.1. That is why I am confused about why this is occurring. Maybe you could try removing VIBRANT and re-installing.