MrOlm / drep

Rapid comparison and dereplication of genomes

FastANI file not found error #210

Closed kevinmyers closed 11 months ago

kevinmyers commented 11 months ago

I am running into an issue with dRep. I am getting a file not found error:

FileNotFoundError: [Errno 2] No such file or directory: 'path/dRep_output/data/fastANI_files/fastANI_out_grgbmygwix'

I am running the following command:

dRep dereplicate ./dRep_output/ -g ./*fasta -conW 0.5 -N50W 5

I updated to the newest version of dRep this morning. I was using 3.2.2 and it worked fine. All the dependencies pass.

dRep version 3.4.5
fastANI version 1.31
mash version 2.3
nucmer version 4.0.0rc1
checkM version 1.2.2
prodigal version 2.6.3
centrifuge version 1.0.4

I'm running this on CentOS7 and everything was installed via Anaconda in its own environment.

Is there anything I can do? I can downgrade to 3.2.2, but I thought I'd see if there's anything else I could try first.

MrOlm commented 11 months ago

Try running it again in a different directory. Sometimes fastANI fails in a random way that can lead to errors like that.

kevinmyers commented 11 months ago

Ok. I'll give it a try and let you know what happens. Thanks!

kevinmyers commented 11 months ago

@MrOlm I ran dRep in a different directory and got the same error.

kevinmyers commented 11 months ago

To update. I have made a new Anaconda environment with fresh installs of dRep and dependencies and run it in multiple directories, all ending with the same File Not Found error. I have also tried this in another Anaconda environment that has worked in the past for another researcher and gotten the same error. I have had another researcher use the same Anaconda environment to test his completely different FASTAs and he has gotten the same fastANI error.

The only folder within the data/fastANI_files directory is a tmp folder with a genomeList file with two genomes (I assume the first cluster to test) in it.

I'm at a loss as to what to try or why this isn't working. Any ideas?

MrOlm commented 11 months ago

Hi @kevinmyers - sorry this is happening. You could try: 1) Running with -d (debug mode), which will tell you the error that fastANI is throwing and help troubleshoot the problem. The error will be located in the log folder. 2) If you don't have too many genomes, you could always use an algorithm other than fastANI.

If you go with route 1 I'm happy to help continue troubleshooting.

Best, Matt
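For readers following route 1, here is a minimal Python sketch for pulling the last lines out of the debug logs once a run with -d has failed. It is not part of dRep. It assumes the work directory contains a `log` folder and that the captured error streams have `STDERR` in their file names (as the attachments later in this thread suggest); the function name `tail_stderr_logs` is hypothetical.

```python
from pathlib import Path


def tail_stderr_logs(work_dir, n_lines=20):
    """Return {log file name: last n_lines lines} for STDERR logs under work_dir/log."""
    log_dir = Path(work_dir) / "log"
    if not log_dir.is_dir():
        return {}
    tails = {}
    # Assumption: debug-mode error logs carry "STDERR" in their file name.
    for path in sorted(log_dir.rglob("*STDERR*")):
        if path.is_file():
            lines = path.read_text(errors="replace").splitlines()
            tails[path.name] = lines[-n_lines:]
    return tails
```

Calling `tail_stderr_logs("./dRep_output")` after a failed run would return the tail of each matching log, which is usually where the underlying tool's own error message ends up.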

kevinmyers commented 11 months ago

Thanks for the help!

I ran the command with the -d mode. Again it failed at the fastANI step. I'm attaching what I think are the relevant log files.

Any advice would be appreciated!

logger.log.txt
2023-10-20_10.23.41.000924.STDOUT.txt
2023-10-20_10.23.41.000924.STDERR.txt
2023-10-20_10.23.41.000924.CMD.txt

MrOlm commented 11 months ago

Hi Kevin,

Interesting.

1) Do you have what dRep reported as it crashed? If so could you paste it here?

2) Are there any files in /mnt/bigdata/linuxhome/kmyers/Kevin_W_Project/20231010_metagenomics/updated_all_fasta_files/dRep_2/dRep_output/data/fastANI_files/? If so what are they?

3) What happens when you run the command /home/GLBRCORG/kmyers/anaconda3/envs/dRep_env/bin/fastANI --ql /mnt/bigdata/linuxhome/kmyers/Kevin_W_Project/20231010_metagenomics/updated_all_fasta_files/dRep_2/dRep_output/data/fastANI_files/tmp/genomeList --rl /mnt/bigdata/linuxhome/kmyers/Kevin_W_Project/20231010_metagenomics/updated_all_fasta_files/dRep_2/dRep_output/data/fastANI_files/tmp/genomeList -o /mnt/bigdata/linuxhome/kmyers/Kevin_W_Project/20231010_metagenomics/updated_all_fasta_files/dRep_2/dRep_output/data/fastANI_files/fastANI_out_qfbuwwgzhk --matrix -t 6 --minFraction 0? Does it make the file /mnt/bigdata/linuxhome/kmyers/Kevin_W_Project/20231010_metagenomics/updated_all_fasta_files/dRep_2/dRep_output/data/fastANI_files/fastANI_out_qfbuwwgzhk? If so, what does that file look like?

-Matt
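The manual check described above (run the command, then see whether the output file appears) can be sketched in Python as follows. This is an illustrative helper, not something dRep or fastANI provides; `describe_returncode` and `run_and_check` are hypothetical names. It relies on the POSIX convention that `subprocess` reports a process killed by a signal as a negative return code, so a hard crash can be told apart from an ordinary non-zero exit.

```python
import signal
import subprocess
from pathlib import Path


def describe_returncode(returncode):
    """Translate a subprocess return code into a short human-readable message."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, death by signal N is reported as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_check(cmd, expected_output):
    """Run cmd (a list of arguments) and report whether expected_output was created."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "status": describe_returncode(result.returncode),
        "output_exists": Path(expected_output).is_file(),
        "stderr_tail": result.stderr.splitlines()[-5:],
    }
```

For example, `run_and_check(["fastANI", "--ql", "genomeList", "--rl", "genomeList", "-o", "fastANI_out", "--matrix"], "fastANI_out")` would run a shortened form of the command above (the paths here are placeholders) and report the exit status alongside whether the output file exists.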

kevinmyers commented 11 months ago

Hi Matt,

  1. I am attaching what was printed to the StdOut (StdOut_Crash.txt)
  2. The only file in that is a genomeList file (attached) in a tmp directory
  3. I ran the command; no file was made. I'm attaching the StdOut for this (fastANI_StdOut.txt).

fastANI_StdOut.txt genomeList.txt StdOut_Crash.txt

MrOlm commented 11 months ago

Ahhhh I see now. At the very end of fastANI_StdOut.txt you can see the line Segmentation fault, which means that the program is crashing in a way that doesn't provide any guidance on why.

It could be that when you upgraded dRep you also upgraded fastANI to another version that is more buggy? I'm not sure.

FastANI version 1.3 tends to work for me, so maybe try installing that version?

Or if that doesn't work, maybe there's something funky with the genome files you're using that fastANI doesn't like? (weird characters in the .fasta headers, using compressed files but they don't end in .gz, etc.).

You also might have luck asking the fastANI developer if they have any suggestions?

Best of luck troubleshooting and sorry this is happening to you!

Matt
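The two input-file problems mentioned above (odd characters in FASTA headers, and compressed files that don't end in .gz) can be screened for with a short script. This is only a sketch: the function name `check_fasta` is hypothetical, the set of "safe" header characters is an assumption chosen for illustration rather than a rule documented by fastANI, and gzip is detected by its two-byte magic number.

```python
import gzip
import re
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"
# Assumption: headers limited to these characters count as "safe"; adjust as needed.
SAFE_HEADER = re.compile(r"^>[A-Za-z0-9_.:|=/\- ]+$")


def check_fasta(path):
    """Return a list of human-readable warnings for one FASTA file."""
    path = Path(path)
    warnings = []
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    # Flag files whose content and extension disagree about compression.
    if is_gzip != (path.suffix == ".gz"):
        warnings.append("gzip content and .gz extension disagree")
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip()
            if line.startswith(">") and not SAFE_HEADER.match(line):
                warnings.append(f"line {number}: unusual header {line!r}")
    return warnings
```

Running `check_fasta` over each genome file and printing any non-empty result gives a quick list of files worth inspecting by hand before re-running fastANI.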

kevinmyers commented 11 months ago

Thanks Matt! I appreciate all your help! I will try to downgrade to fastANI 1.3 and see if that works.

I will also reach out to fastANI.

kevinmyers commented 11 months ago

I thought I would just post that I got it to work with the following versions in an Anaconda environment:

dRep v3.4.5
fastANI v1.31
checkm v1.2.2
prodigal v2.6.3
mash v2.3
mummer v4.0.0rc1

It ran on 248 genomes successfully! Thank you @MrOlm for all your help and advice!

MrOlm commented 11 months ago

Thanks for posting your solution @kevinmyers !