andrewgull / MGERT

Mobile Genetic Elements Retrieving Tool

IndexError: list index out of range #3

Closed GunzIvan28 closed 4 years ago

GunzIvan28 commented 4 years ago

Hi Andrew, I have run the pipeline, but I get the traceback below for all my samples:

Traceback (most recent call last):
  File "../../miniconda3/envs/rMAP-1.0/config-files/MGERT.py", line 1732, in <module>
    l=args.min_length, e=args.e_value, c=args.start_codon, strnd=args.strand, g=args.genetic_code, le=args.left_end, re=args.right_end, rm_tab=args.rm_table)
  File "../../miniconda3/envs/rMAP-1.0/config-files/MGERT.py", line 1398, in pipe
    rmodeler(genome_file, threads)
  File "../../miniconda3/envs/rMAP-1.0/config-files/MGERT.py", line 1239, in rmodeler
    repmod_outfile = glob.glob("RM*/consensi.fa.classified")[0]
IndexError: list index out of range

I am running it on .fna.gz files, for both strands, and would like to obtain the output files. How can I overcome this?

andrewgull commented 4 years ago

Hi! My first guess is that the RepeatModeler output file was not generated, so the glob in the last frame of the traceback matches nothing. Can you check whether it exists?
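
A minimal sketch of how that check could be scripted, assuming it is run from the directory where MGERT was started. Only the glob pattern comes from the traceback; the function name `find_repmod_output` is hypothetical and not part of MGERT.

```python
import glob
from typing import Optional


def find_repmod_output(pattern: str = "RM*/consensi.fa.classified") -> Optional[str]:
    """Return the first path matching the pattern, or None if nothing matches."""
    matches = sorted(glob.glob(pattern))
    # glob.glob returns an empty list when nothing matches, so indexing [0]
    # on it directly raises IndexError. Check the list before indexing.
    return matches[0] if matches else None


if __name__ == "__main__":
    outfile = find_repmod_output()
    if outfile is None:
        print("No RM*/consensi.fa.classified found; RepeatModeler produced no output here.")
    else:
        print(f"RepeatModeler output found: {outfile}")
```

If this reports that nothing was found, the IndexError is a symptom of a failed RepeatModeler run rather than the root problem.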

GunzIvan28 commented 4 years ago

Hey, yes, it fails after RepeatModeler round 3, in a folder labelled 'RM_26613....'. What can I do to get a successful run? The .fna file was generated by annotating my genome with Prokka.

andrewgull commented 4 years ago

What do you use as input? Is it a genome in FASTA format?

GunzIvan28 commented 4 years ago

Hi Andrew, ERR987781.zip

That is the link to the file I used. According to the instructions, the pipeline requires a .fna file, which I obtained from annotation. However, I also have an assembly file in FASTA format from de novo assembly. Could using the latter resolve the issue?

andrewgull commented 4 years ago

Yes, you should use the genomic assembly as input.
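
Before starting a long run, a quick sanity check of the input can help. The sketch below counts the records and total sequence length in a plain or gzip-compressed FASTA file. It only confirms that the file is readable and of plausible size; it cannot tell an assembly from any other nucleotide FASTA. The function name `fasta_summary` and the file name `assembly.fasta` are hypothetical.

```python
import gzip
from typing import Tuple


def fasta_summary(path: str) -> Tuple[int, int]:
    """Count sequences and total residues in a plain or .gz FASTA file."""
    # Choose the opener by file extension; both accept text mode "rt".
    opener = gzip.open if path.endswith(".gz") else open
    n_seqs = 0
    n_residues = 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_seqs += 1  # header line marks a new record
            else:
                n_residues += len(line)  # sequence line (blank lines add 0)
    return n_seqs, n_residues


if __name__ == "__main__":
    import sys

    # Usage: python fasta_summary.py assembly.fasta
    if len(sys.argv) > 1:
        seqs, residues = fasta_summary(sys.argv[1])
        print(f"{seqs} sequence(s), {residues} bp in total")
```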

GunzIvan28 commented 4 years ago

Hey, I have re-run it with the FASTA file from the assembly and it gives a new error. Below is the output, including the traceback:

run RepeatModeler on 8 CPUs. Command: /home/ivan/miniconda3/envs/rMAP-1.0/bin/RepeatModeler -engine ncbi -pa 8 -database ERR987781.fa.db > RepMod.out
Missing /home/ivan/miniconda3/envs/rMAP-1.0/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq!
Please rerun the configure program in the RepeatModeler directory before running this script.
RepeatModeler finished
Traceback (most recent call last):
  File "miniconda3/envs/rMAP-1.0/config-files/MGERT.py", line 1732, in <module>
    l=args.min_length, e=args.e_value, c=args.start_codon, strnd=args.strand, g=args.genetic_code, le=args.left_end, re=args.right_end, rm_tab=args.rm_table)
  File "miniconda3/envs/rMAP-1.0/config-files/MGERT.py", line 1398, in pipe
    rmodeler(genome_file, threads)
  File "miniconda3/envs/rMAP-1.0/config-files/MGERT.py", line 1239, in rmodeler
    repmod_outfile = glob.glob("RM*/consensi.fa.classified")[0]
IndexError: list index out of range

GunzIvan28 commented 4 years ago

I can share the assembly file too, so you can run it yourself and see what the issue could be.

Some specific things to look into: "Missing /home/ivan/miniconda3/envs/rMAP-1.0/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq!" is an error I have no idea how to overcome.

"Please rerun the configure program in the RepeatModeler directory before running this script." Why do I have to re-run the configure program when everything was set up successfully initially?

andrewgull commented 4 years ago

Apparently there are some problems with the RepeatMasker installation. BTW, have you tried running MGERT in test mode after installation and configuration?

GunzIvan28 commented 4 years ago

Below is the output from the test run:

Run MGERT on small dataset, it may take a while...
MGERT will create a directory for test run in /home/ivan
Correspondence table is found and added to the config...
A list of smp files has been compiled.
Database name - CD
Local Conserved Domain Database is made and added to the config.
1/5. Starting RepeatModeler pipeline on 8 CPUs
Building RepeatModeler database. Command: /home/ivan/miniconda3/envs/rMAP-1.0/bin/BuildDatabase -name test_scaffold.fasta.db -engine ncbi ref.fa
Building database test_scaffold.fasta.db:
Reading ref.fa...
Number of sequences (bp) added to database: 1 ( 4176476 bp )
run RepeatModeler on 8 CPUs. Command: /home/ivan/miniconda3/envs/rMAP-1.0/bin/RepeatModeler -engine ncbi -pa 8 -database test_scaffold.fasta.db > RepMod.out
Missing /home/ivan/miniconda3/envs/rMAP-1.0/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq!
Please rerun the configure program in the RepeatModeler directory before running this script.
RepeatModeler finished
Traceback (most recent call last):
  File "/usr/local/bin/MGERT.py", line 1668, in <module>
    pipe(genome_file="test_scaffold.fasta.gz", mge_type="CR1", threads=multiprocessing.cpu_count())
  File "/usr/local/bin/MGERT.py", line 1398, in pipe
    rmodeler(genome_file, threads)
  File "/usr/local/bin/MGERT.py", line 1239, in rmodeler
    repmod_outfile = glob.glob("RM*/consensi.fa.classified")[0]
IndexError: list index out of range

I believe this tool is very solid software; it just needs a lot of tweaking from installation up to this point. Kindly help me work it out, as I am not very good at fixing Python bugs. Also, the instructions should state clearly that a de novo assembly FASTA file is required; ".fna" is somewhat misleading.

andrewgull commented 4 years ago

Well, the test failed because of the RepeatMasker/RepeatModeler installation: RepeatModeler complains about a missing library. It's not a Python bug. Have you run the RepeatModeler configuration script? If not, you have to do that; if you have, try running it again.
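
To confirm whether the library is in place after reconfiguring, a check like the following could be run before starting MGERT again. This is a sketch only: the file name and directory are taken from the error message above, the function name `missing_library_files` is hypothetical, and RepeatModeler may need other files that this does not check.

```python
import os
from typing import List, Sequence


def missing_library_files(
    libraries_dir: str,
    names: Sequence[str] = ("RepeatMasker.lib.nsq",),
) -> List[str]:
    """Return the names from `names` that are not regular files in libraries_dir."""
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(libraries_dir, name))
    ]


if __name__ == "__main__":
    # Directory copied from the "Missing ..." message; adjust to your environment.
    lib_dir = "/home/ivan/miniconda3/envs/rMAP-1.0/share/RepeatMasker/Libraries"
    missing = missing_library_files(lib_dir)
    if missing:
        print("Missing in", lib_dir + ":", ", ".join(missing))
    else:
        print("The library file named in the error message is present.")
```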