MrOlm / drep

Rapid comparison and dereplication of genomes

something about MASH #48

Closed yangfanLiu1995 closed 5 years ago

yangfanLiu1995 commented 5 years ago

Hi, I'm new to bioinformatics. I have the genomes of two eukaryotic organisms, and I'd like to ask: can I use the "dRep compare" module to get the ANI between them? And is ANI even meaningful for eukaryotic organisms?

Look forward to your reply!

Yangfan 2019/3/29

MrOlm commented 5 years ago

Hi Yangfan,

You can use dRep on eukaryotic organisms, but there are a couple of things to keep in mind. First, if the genomes are very incomplete, you might want to lower the ANI threshold for Mash (-pa) to 0.5 or so. Second, if you try to de-replicate these organisms, you'll have to provide external estimates of completeness and contamination, as CheckM will not work on eukaryotes.

I'm going to close this issue but feel free to respond if you have more questions :)

-Matt

yangfanLiu1995 commented 5 years ago

Hi Matt, many thanks for your reply! In fact, I just want to use the compare module to calculate ANI, so I won't de-replicate these organisms. I found that the "minimum aligned fraction" is used by the ANIm/gANI algorithms, and that the default value is 10%. That may be suitable for prokaryotic organisms, but I have no idea whether it is suitable for eukaryotic organisms too. Is there also a "minimum aligned fraction" in the primary algorithm (Mash)? And which algorithm (Mash/ANIm/gANI) do you suggest for my genomes?

Look forward to your reply!

-Yangfan

MrOlm commented 5 years ago

Hi Yangfan,

The "minimum aligned fraction" only applies to the secondary algorithm, not to Mash. For a secondary algorithm, you should use ANImf (because it doesn't rely on calling genes). 10% for the minimum aligned fraction is probably fine; the aligned fraction is usually either very low or very high.
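To make the role of the threshold concrete, here is a minimal sketch of how a minimum aligned fraction could be applied to one comparison. It assumes that a comparison falling below the threshold is reported with an ANI of 0; that behaviour, the function name, and the example values are illustrative assumptions, not taken from dRep's source.

```python
def apply_min_aligned_fraction(ani, aligned_fraction, min_fraction=0.1):
    """Return the ANI to report for a single pairwise comparison.

    Assumption for illustration: when too little of the genome aligned,
    the ANI is considered unreliable and is reported as 0.
    """
    if not 0.0 <= aligned_fraction <= 1.0:
        raise ValueError("aligned_fraction must be between 0 and 1")
    return ani if aligned_fraction >= min_fraction else 0.0


# Hypothetical comparisons: (ANI, aligned fraction)
for ani, fraction in [(0.98, 0.85), (0.97, 0.04)]:
    print(ani, fraction, apply_min_aligned_fraction(ani, fraction))
```

Because the aligned fraction tends to be either very low or very high, moving the threshold somewhat around 10% would rarely change which side a comparison lands on.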

If you have a small-ish number of genomes (<100), the best option would probably be to just use ANImf and skip the Mash step altogether (--SkipMash). Otherwise just use ANImf and -pa 0.5.
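As a rough sketch, the two suggestions could be turned into command lines like this. It assumes the intent for small genome sets is to skip the Mash primary step (dRep's `--SkipMash`) so that ANImf runs on every pair; `out_dir` and the genome file names are placeholders, and the helper only builds the command without running it.

```python
import shlex


def build_compare_command(work_dir, genomes, skip_mash=False, mash_ani=None):
    """Build a `dRep compare` argument list that uses ANImf as the secondary algorithm."""
    args = ["dRep", "compare", work_dir, "--S_algorithm", "ANImf"]
    if skip_mash:
        # Small genome sets: compare all pairs with ANImf, no Mash pre-clustering.
        args.append("--SkipMash")
    elif mash_ani is not None:
        # Larger sets: keep Mash but loosen its primary ANI threshold.
        args += ["-pa", str(mash_ani)]
    args += ["-g", *genomes]
    return args


def render(args):
    """Render the argument list as a shell-safe string."""
    return " ".join(shlex.quote(a) for a in args)


# Placeholder paths, for illustration only.
small = build_compare_command("out_dir", ["a.fa", "b.fa"], skip_mash=True)
large = build_compare_command("out_dir", ["a.fa", "b.fa"], mash_ani=0.5)
print(render(small))
print(render(large))
```

The resulting list can be passed to `subprocess.run(args, check=True)` once the paths point at real files.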

Best, -Matt

yangfanLiu1995 commented 5 years ago

Hi: Thanks for your help!

I compared my genomes with the default command:

dRep compare /stor9000/apps/users/NWSUAF/2013130172/lyf/dRep_test/results/after_filter/filter2 -g /stor9000/apps/users/NWSUAF/2013130172/lyf/dRep_test/singlecell_fasta/after_filter/fliter_method2/*

They performed well in the primary clustering. In the secondary clustering, the comparison method is "ANImf" and the min cov is 0.1. But one row of the "Ndb" table looks strange:

[image]

When MDA08 was compared to itself, the ANI was 0, but when the other genomes were compared to themselves, the ANI was 1.

I can't understand why this one shows 0. Is something wrong with my command?
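A check like this can be automated. The sketch below scans an Ndb-style CSV for self-comparisons whose ANI is not 1. The column names (`reference`, `querry`, `ani`) are assumptions about the table layout and are exposed as parameters; the sample rows are made up for illustration and are not the values from this thread.

```python
import csv
import io


def suspicious_self_comparisons(handle, ref_col="reference", query_col="querry",
                                ani_col="ani", tolerance=1e-3):
    """Return (genome, ani) for every self-comparison whose ANI differs from 1."""
    flagged = []
    for row in csv.DictReader(handle):
        if row[ref_col] != row[query_col]:
            continue  # only self-comparisons are of interest here
        ani = float(row[ani_col])
        if abs(ani - 1.0) > tolerance:
            flagged.append((row[ref_col], ani))
    return flagged


# Made-up rows for illustration only.
sample = io.StringIO(
    "reference,querry,ani\n"
    "genome_A.fa,genome_A.fa,1.0\n"
    "genome_B.fa,genome_B.fa,0.0\n"
    "genome_A.fa,genome_B.fa,0.83\n"
)
print(suspicious_self_comparisons(sample))
```

For a real run, open the Ndb table from the work directory's data_tables folder and pass the file handle instead of the in-memory sample.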

Look forward to your reply.

Yangfan

yangfanLiu1995 commented 5 years ago

[image]

MrOlm commented 5 years ago

Hi Yangfan,

That is very odd; I don't think I've ever seen that before. There are a couple of things you can try:

1) If you look in the data/ANImf_files/ folder, you can find the intermediate files that were used to do the comparison. Checking whether the file for that genome compared to itself is truly empty could provide a clue.

2) If you run the program again with --debug, it will create a log file for every ANI comparison run. If MUMmer is throwing an error during that comparison, this is how you would figure it out.
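The first check can be scripted. This is a minimal sketch that lists zero-byte files under a folder such as data/ANImf_files/; it makes no assumption about how the intermediate files are named, and the function name is hypothetical.

```python
import sys
from pathlib import Path


def find_empty_files(folder):
    """Return all zero-byte files below `folder`, searched recursively."""
    root = Path(folder)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0)


if __name__ == "__main__":
    # Usage: python find_empty.py <work_dir>/data/ANImf_files
    for folder in sys.argv[1:]:
        for path in find_empty_files(folder):
            print(path)
```

An empty file for the genome that reported an ANI of 0 would suggest the alignment step for that comparison did not finish.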

Best, -Matt

yangfanLiu1995 commented 5 years ago

I checked the data/ANImf_files/MDA08.contigs.min500.unmaptobac.fa folder, and it looks very strange and different from the other folders.

[image]

Actually, I have a standard error file which contains a lot of information:


..:: dRep compare Step 1. Cluster ::..

Clustering Step 1. Parse Arguments
Clustering Step 2. Perform MASH (primary) clustering
2a. Run pair-wise MASH clustering
2b. Cluster pair-wise MASH clustering
3 primary clusters made
Step 3. Perform secondary clustering
Running 12 ANImf comparisons- should take ~ 1.0 min
Step 4. Return output


..:: dRep compare Step 2. Bonus ::..

Loading work directory


..:: dRep compare Step 3. Evaluate ::..

will provide warnings about clusters
1 warnings generated: saved to /stor9000/apps/users/NWSUAF/2013130172/lyf/dRep_test/results/after_filter/filter2/log/warnings.txt


..:: dRep compare Step 4. Analyze ::..

making plots 1, 2, 3, 4
Plotting primary dendrogram
Plotting secondary dendrograms
Plotting MDS plot
Plotting scatterplots
/stor9000/apps/users/NWSUAF/2013130172/software/miniconda_dRep/lib/python3.6/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

..:: dRep compare finished ::..

Genome comparison data............... /stor9000/apps/users/NWSUAF/2013130172/lyf/dRep_test/results/after_filter/filter2/data_tables/
Figures.............................. /stor9000/apps/users/NWSUAF/2013130172/lyf/dRep_test/results/after_filter/filter2/figures/
Warnings............................. /stor9000/apps/users/NWSUAF/2013130172/lyf/dRep_test/results/after_filter/filter2/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

It seems that no MUMmer error appears.

MrOlm commented 5 years ago

Hi Yangfan,

Based on the fact that some of those files are empty (namely the ones ending in .filtered), something related to numpy probably crashed. The errors are not saved by default; they would be in a folder called cmd_logs in your log directory.

To figure out what the numpy error is, you'll need to re-run this analysis using the --debug flag, and then look through the logs in the cmd_logs directory to see what the problem is.
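Once the analysis has been re-run with --debug, looking through the logs can be partly automated. The sketch below reports lines in a log folder that contain common failure keywords. The keyword list and function name are assumptions, and nothing is assumed about how the files inside cmd_logs are named.

```python
import sys
from pathlib import Path

# Assumed keywords; extend as needed for the tools involved.
KEYWORDS = ("error", "traceback", "exception", "segmentation fault")


def scan_logs(log_dir, keywords=KEYWORDS):
    """Return (file name, line number, line) for each log line containing a keyword."""
    hits = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(k in line.lower() for k in keywords):
                hits.append((path.name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python scan_logs.py <work_dir>/log/cmd_logs
    for log_dir in sys.argv[1:]:
        for name, number, line in scan_logs(log_dir):
            print(f"{name}:{number}: {line}")
```

A keyword search like this is only a first pass; a crash can also leave a log empty or cut off mid-line, so the logs for the failing comparison are still worth reading in full.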

Best, Matt