file exists but get os.path.isfile(genome), "{0} is not a file".format(genome) error

AprilJauhal commented 6 months ago

Hello,

I am trying to run the following command (dRep version 3.4.5):

dRep dereplicate --genomeInfo ${drep_path}/dRep.genomeInfo -g ${drep_path}/genome_list.txt –S_algorithm fastANI –multiround_primary_clustering –clusterAlg greedy -ms 10000 -pa 0.9 -sa 0.95 -nc 0.30 -cm larger -p 10 ${drep_path}/drep_out_temp

And I get the following error:

Traceback (most recent call last):
  File "/gpfs/home/ettina03/.conda/envs/drep-ettina03/bin/dRep", line 32, in <module>
    Controller().parseArguments(args)
  File "/gpfs/home/ettina03/.conda/envs/drep-ettina03/lib/python3.7/site-packages/drep/controller.py", line 100, in parseArguments
    self.dereplicate_operation(vars(args))
  File "/gpfs/home/ettina03/.conda/envs/drep-ettina03/lib/python3.7/site-packages/drep/controller.py", line 48, in dereplicate_operation
    drep.d_workflows.dereplicate_wrapper(kwargs['work_directory'],kwargs)
  File "/gpfs/home/ettina03/.conda/envs/drep-ettina03/lib/python3.7/site-packages/drep/d_workflows.py", line 29, in dereplicate_wrapper
    drep.d_filter.d_filter_wrapper(wd, genomes = genomes, Chdb = Chdb, **kwargs)
  File "/gpfs/home/ettina03/.conda/envs/drep-ettina03/lib/python3.7/site-packages/drep/d_filter.py", line 66, in d_filter_wrapper
    bdb = drep.d_cluster.utils.load_genomes(kwargs['genomes'])
  File "/gpfs/home/ettina03/.conda/envs/drep-ettina03/lib/python3.7/site-packages/drep/d_cluster/utils.py", line 603, in load_genomes
    assert os.path.isfile(genome), "{0} is not a file".format(genome)

I get the same error if I pass the genome paths directly instead of a file listing them, and even if I use just one genome. This is odd, because I have been able to get dRep to work before with the same version and conda environment.

I checked and my files all exist. I even tried copying some commands from the utils.py script and I don't get the same error:

>>> import os
>>> genome_list_file="genome_list.txt"
>>> with open(genome_list_file, "r") as file:
...     genome_list = [line.strip() for line in file]
... 
>>> for genome in genome_list:
...      assert os.path.isfile(genome), f"{genome} is not a file"
... 
>>>

In the above code I also checked that the list read into "genome_list" was the correct length. The files all have the ending ".fa", but this hasn't been an issue in the past. The FASTA headers contain ":" characters, but I don't see why this would keep the file from being read. Do you have any idea what might be happening here?
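A slightly more informative variant of the check above may help when the assertion message is not shown. This is only a sketch: `find_missing` is a hypothetical helper, not part of dRep, and the demo paths are made up. Printing each rejected entry with `repr()` exposes stray whitespace or non-ASCII characters that look fine in a terminal.

```python
import os
import tempfile


def find_missing(paths):
    """Return every entry that os.path.isfile() rejects (hypothetical helper)."""
    return [p for p in paths if not os.path.isfile(p)]


# Demo in a throwaway directory: one real file, one absent file,
# and one token that is not a path at all.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "genome_1.fa")
    with open(real, "w") as handle:
        handle.write(">contig:1\nACGT\n")

    candidates = [real, os.path.join(tmp, "genome_2.fa"), "\u2013S_algorithm"]
    missing = find_missing(candidates)

    # repr() makes unusual characters visible, e.g. '\u2013' or a trailing space.
    for path in missing:
        print("not a file: {0!r}".format(path))
```

Running the same helper on the list that the program actually received, rather than on the list file alone, is what distinguishes a missing genome from a misparsed argument.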

MrOlm commented 6 months ago

Hi @AprilJauhal - Interesting, I haven't seen this before. That assert statement should specify the genome that it thinks is not a file in the STDOUT; is it not doing that?

AprilJauhal commented 6 months ago

Correct, that is the end of the output and it doesn't state which file is the issue.

MrOlm commented 6 months ago

Could you attach the log.log file?

AprilJauhal commented 6 months ago

logger.log

MrOlm commented 6 months ago

Ah, I see. It thinks -S_algorithm is a genome that it's trying to load. Not sure why argparse is doing that (maybe you just need two dashes?), but if you shift -g ${drep_path}/genome_list.txt to the end of your command, it should fix the problem.
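The behavior can be reproduced with a small stand-in parser. This is a sketch under assumptions, not dRep's actual argument definitions: it assumes `-g` accepts any number of values (`nargs="*"`), and the option set is cut down to what is needed. Because argparse only treats tokens starting with the ASCII hyphen `-` as options, a token starting with an en dash (U+2013) is collected as one more value for `-g`.

```python
import argparse


def build_parser():
    # Stand-in parser for illustration; NOT dRep's real definition.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("work_directory")
    # Assumption: the genome option takes any number of values.
    parser.add_argument("-g", "--genomes", nargs="*")
    parser.add_argument("--S_algorithm", default=None)
    return parser


EN_DASH = "\u2013"  # looks like a hyphen, but is not an option prefix

# Flag written with an en dash: argparse sees an ordinary value.
pasted = ["out_dir", "-g", "genome_list.txt", EN_DASH + "S_algorithm", "fastANI"]
# Same flag written with two ASCII hyphens: argparse sees an option.
fixed = ["out_dir", "-g", "genome_list.txt", "--S_algorithm", "fastANI"]

pasted_args = build_parser().parse_args(pasted)
fixed_args = build_parser().parse_args(fixed)

print(pasted_args.genomes)      # three "genomes", two of them bogus
print(pasted_args.S_algorithm)  # option was never set
print(fixed_args.genomes)       # only the list file
print(fixed_args.S_algorithm)   # option set as intended
```

In the first case the list of "genomes" holds three entries, so the file check fails on the flag name rather than on any real genome.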

AprilJauhal commented 6 months ago

Thank you. Adding the double dashes to several of the flags seems to have worked.

I was trying to use this command from the InStrain tutorial: dRep dereplicate MergedGenomeSet -g FullListOfGenomes.txt –S_algorithm fastANI –multiround_primary_clustering –clusterAlg greedy -ms 10000 -pa 0.9 -sa 0.95 -nc 0.30 -cm larger -p 16

However, "greedy" doesn't seem to be one of the clusterAlg settings:

dRep dereplicate: error: argument --clusterAlg: invalid choice: 'greedy' (choose from 'complete', 'ward', 'weighted', 'single', 'centroid', 'median', 'average')

Do you have a recommendation for which setting to use instead?

MrOlm commented 6 months ago

Sorry about that - I'd recommend average. The greedy version didn't work out well and was deprecated.

AprilJauhal commented 6 months ago

Thank you so much! It seems to be working now.
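For anyone hitting the same error with a pasted command, a quick check for look-alike dashes can save some time. The sketch below is an illustration only: `find_lookalike_dashes` is a hypothetical helper, and the set of characters is a small hand-picked selection rather than an exhaustive list. The command string is the tutorial command from this thread, with the en dashes written as `\u2013` escapes.

```python
import unicodedata

# Dash-like characters that are not the ASCII hyphen-minus argparse expects.
# Hand-picked for illustration; not exhaustive.
LOOKALIKE_DASHES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_lookalike_dashes(tokens):
    """Return (token, unicode_name) for tokens starting with a look-alike dash."""
    hits = []
    for token in tokens:
        if token and token[0] in LOOKALIKE_DASHES:
            hits.append((token, unicodedata.name(token[0])))
    return hits


command = (
    "dRep dereplicate MergedGenomeSet -g FullListOfGenomes.txt "
    "\u2013S_algorithm fastANI \u2013multiround_primary_clustering "
    "\u2013clusterAlg greedy -ms 10000 -pa 0.9 -sa 0.95 -nc 0.30 -cm larger -p 16"
)

hits = find_lookalike_dashes(command.split())
for token, name in hits:
    print("{0!r} starts with {1}".format(token, name))
```

Each reported token should be retyped with two ASCII hyphens (for example `--S_algorithm`) before running the command.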

AprilJauhal commented 6 months ago

Do you know where I can find information on how to include an extra weight table in dRep?

MrOlm commented 6 months ago

Here's some quick info- lmk if you have questions: https://github.com/MrOlm/drep/issues/222

MrOlm / drep, issue #231