MrOlm / drep

Rapid comparison and dereplication of genomes

Failure to plot secondary clustering plots #202

Open etd530 opened 1 year ago

etd530 commented 1 year ago

Dear developers,

When I run dRep with the -nc parameter set to 0.6, the program generates most of the output correctly, but the PDF files for the secondary clustering dendrograms and the MDS plot cannot be opened. The log file says:

07-24 14:36 INFO     Plotting secondary dendrograms
07-24 14:36 INFO     Failed to make plot #2: invalid literal for int() with base 10: '19.21'
07-24 14:36 INFO     Plotting MDS plot
07-24 14:36 INFO     Failed to make plot #3: invalid literal for int() with base 10: '19.21'
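The logged message is the standard Python error raised when int() is given a string containing a decimal point. The sketch below reproduces it under an assumption that is not taken from dRep's source: that the plotting step splits a secondary-cluster label such as "2_1" on "_" and converts the second part with int(). The helper name cluster_index is hypothetical.

```python
# Hypothetical illustration only; this is NOT dRep's actual plotting code.
# Assumption: plotting converts the part of a secondary-cluster label
# after the first "_" into an integer.

def cluster_index(label: str) -> int:
    """Return the integer after the underscore in a label such as '2_1'."""
    _primary, secondary = label.split("_", 1)
    return int(secondary)


print(cluster_index("2_1"))        # ordinary secondary cluster; prints 1

try:
    cluster_index("2_19.21")       # label containing a point
except ValueError as err:
    print(err)                     # invalid literal for int() with base 10: '19.21'
```

If labels with points only appear when tertiary clustering is enabled, this would explain why other -nc values produce the plots without error.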

The exact command is:

dRep dereplicate dRep_comparison.s_ani_099.tertiary_clust.MAF_60/ -g *.fna --S_ani 0.99 --S_algorithm fastANI --run_tertiary_clustering -nc 0.6

Strangely, when I set the parameter to 0.5 (or leave it at the default value), all plots are correctly generated. Could this be a bug? Or am I missing something?

I am running dRep version 3.4.3 on Ubuntu 18.04.

Thanks in advance!

Eric

MrOlm commented 1 year ago

Hi Eric,

Interesting. I think this bug comes from an incompatibility between those plots and --run_tertiary_clustering. To be sure, would you mind uploading the Cdb.csv file from the run that failed to make the plots?

Best, Matt

etd530 commented 1 year ago

Hi Matt,

Sure, here it is: Cdb.csv

Best, Eric

MrOlm commented 1 year ago

OK, it is definitely the case that tertiary clustering is causing this problem. I will fix it in the next dRep update.

If you'd like to make the plots now, a hack to do this would be to edit Cdb.csv and rename all secondary clusters with points in them (e.g. 2_19.21) to remove the points (e.g. 2_1921).
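The renaming hack can be sketched with Python's standard csv module. This is a minimal sketch, not a dRep tool: it assumes Cdb.csv has a header row with a column named secondary_cluster, and the helpers strip_points and fix_cdb are hypothetical. It also refuses to merge two different labels that would collapse to the same name (for example "2_19.21" and "2_192.1"). Note that this only edits the table; as the thread notes, rerunning dRep regenerates Cdb.csv.

```python
import csv

# Hypothetical helpers; assumes Cdb.csv has a "secondary_cluster" column.

def strip_points(rows, column="secondary_cluster"):
    """Return new rows with '.' removed from the given column."""
    cleaned = []
    seen = {}  # maps new label -> original label
    for row in rows:
        old = row[column]
        new = old.replace(".", "")  # e.g. "2_19.21" -> "2_1921"
        # Guard against two distinct labels collapsing into one name.
        if seen.setdefault(new, old) != old:
            raise ValueError(f"renaming {old!r} collides with {seen[new]!r}")
        cleaned.append({**row, column: new})
    return cleaned


def fix_cdb(path):
    """Rewrite the CSV at `path` in place with dot-free secondary clusters."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        rows = strip_points(list(reader))
        fieldnames = reader.fieldnames
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Usage would be fix_cdb(path) with path pointing at the Cdb.csv in your dRep work directory; keeping a backup copy first is sensible.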

Thanks for bringing this to my attention!

MO

etd530 commented 1 year ago

Thanks for the help Matt! I would like to make the plots but I don't understand how. If I edit the Cdb.csv and run:

dRep dereplicate dRep_comparison.s_ani_099.tertiary_clust.MAF_60/ --S_ani 0.99 --S_algorithm fastANI --run_tertiary_clustering -nc 0.6

It overwrites the file again and the points are already there. Am I missing something? Is there an option to just make the plots?

Thanks!

Eric

MrOlm commented 1 year ago

My apologies Eric! In previous versions of dRep it was possible to run the plots on already-completed data. I forgot that I removed that functionality a few updates ago.

For now, the only way to make these plots is to remove the --run_tertiary_clustering flag. To achieve the exact same effect, run dRep twice with the same arguments: the first time on the genomes you're including now, and the second time on the output of the first run (the dereplicated_genomes folder). That second run is all --run_tertiary_clustering does, so running dRep twice in this way achieves the same result.
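The two-pass workaround can be scripted. The sketch below is hypothetical: it assumes dRep is on PATH, that the genomes (and therefore the copies in dereplicated_genomes) end in ".fna", and that you choose the output directory names. It reuses the flags from the original command, minus --run_tertiary_clustering. With dry_run=True it only returns the argument lists, showing the second run's inputs as a glob pattern because they do not exist until the first run finishes.

```python
import glob
import os
import subprocess

# Hypothetical wrapper; assumes `dRep` is on PATH and genomes end in ".fna".
SHARED_ARGS = ("--S_ani", "0.99", "--S_algorithm", "fastANI", "-nc", "0.6")


def dereplicate_cmd(out_dir, genomes, extra=SHARED_ARGS):
    """Build the argv list for one `dRep dereplicate` run."""
    return ["dRep", "dereplicate", out_dir, "-g", *genomes, *extra]


def two_pass(first_out, second_out, genomes, dry_run=True):
    """Run (or, with dry_run=True, just return) the two dRep commands."""
    first = dereplicate_cmd(first_out, genomes)
    winners_dir = os.path.join(first_out, "dereplicated_genomes")
    if dry_run:
        # Pass 2 inputs do not exist yet, so show them as a pattern.
        return [first, dereplicate_cmd(second_out, [os.path.join(winners_dir, "*.fna")])]
    subprocess.run(first, check=True)
    # Pass 2: dereplicate only the genomes kept by pass 1.
    winners = sorted(glob.glob(os.path.join(winners_dir, "*.fna")))
    second = dereplicate_cmd(second_out, winners)
    subprocess.run(second, check=True)
    return [first, second]
```

For example, two_pass("pass1_out", "pass2_out", sorted(glob.glob("*.fna")), dry_run=False) would run both passes in order; since neither run uses --run_tertiary_clustering, the secondary dendrogram and MDS plots should be generated normally.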

Sorry about the hacky solution while I work on a real update.

-MO

etd530 commented 1 year ago

Thanks Matt! Don't worry, the hacky solution is fine. Also, the Cluster_scoring.pdf plots are fine, so I can check the clusters there too.

Eric