MrOlm / drep

Rapid comparison and dereplication of genomes

dRep stopped during clustering #225

Closed Wanli-HE closed 10 months ago

Wanli-HE commented 10 months ago

Hi!

I used dRep recently and ran into an issue.

The issue is:

image

The command is: "dRep dereplicate vamb_drep_res -p 40 -pa 0.9 -sa 0.95 -nc 0.3 -cm larger --S_algorithm fastANI -g vamb_res_fa_file/* " Version: latest

The workflow runs up to this point:

"""
There are the columns: ['genome', 'completeness', 'contamination', 'strain_heterogeneity']
Filtering genomes
1.01% of genomes passed checkM filtering
Storing resulting files

..:: dRep dereplicate Step 2. Cluster ::..

Running primary clustering
Running pair-wise MASH clustering
Clustering MASH database
"""

MrOlm commented 10 months ago

Hello,

I believe you're using an older version of dRep that has an incompatibility with newer versions of pandas; please upgrade dRep to the newest version and this error will go away.

Best, Matt
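Before upgrading, it can help to confirm which versions are actually installed in the environment that runs dRep. The following is a minimal sketch using only the Python standard library. It assumes the packages are installed under the distribution names `drep` and `pandas`; the helper name `installed_version` is made up for illustration and is not part of dRep.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust if your environment uses different ones.
for name in ("drep", "pandas"):
    print(name, installed_version(name) or "not installed")
```

Run this with the same Python interpreter (for example, the same conda environment) that the `dRep` command uses, otherwise the reported versions may not match.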

Wanli-HE commented 10 months ago

Hi Matt!

Thanks, it works.

Another question: how can I get the cluster assignment for every bin? The output seems to contain only the representative bin of each cluster. For instance, I would like to know which cluster bin1 and bin2 belong to.

Best, Wanli

MrOlm commented 10 months ago

Hello,

That information is located in the file Cdb.csv in the data_tables output folder.

Best, Matt
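As a rough illustration of how Cdb.csv could be turned into a per-cluster list of bins, here is a minimal sketch using only the Python standard library. It assumes the file has a `genome` column and a `secondary_cluster` column; check the header of your own file, since those column names are an assumption here. The function name `clusters_from_cdb` and the example rows are made up for illustration.

```python
import csv
from collections import defaultdict


def clusters_from_cdb(lines):
    """Group genome names by cluster ID from an iterable of Cdb.csv lines.

    Assumes the header contains 'genome' and 'secondary_cluster' columns.
    """
    members = defaultdict(list)
    for row in csv.DictReader(lines):
        members[row["secondary_cluster"]].append(row["genome"])
    return dict(members)


# Made-up rows standing in for a real Cdb.csv.
example = [
    "genome,secondary_cluster,primary_cluster",
    "bin1.fa,1_1,1",
    "bin2.fa,1_1,1",
    "bin3.fa,2_0,2",
]

clusters = clusters_from_cdb(example)
for cluster, genomes in sorted(clusters.items()):
    print(cluster, len(genomes), ", ".join(genomes))
```

For a real run, pass an open file handle instead of the example list, for instance `open("vamb_drep_res/data_tables/Cdb.csv", newline="")`, where `vamb_drep_res` is the work directory from the command above. `len(clusters)` then gives the number of clusters among the genomes listed in that file.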

Wanli-HE commented 10 months ago

Hi again!

I checked that file, and there is one thing I am not sure about: I have over 3000 bins, but half of them were put into a group called root (UID1):

image

Is that normal? All the parameters I used were the defaults.

Best, Wanli

MrOlm commented 10 months ago

I’ve never seen that before. Could you let me know the parameters you ran dRep with and show me the top of that file? (Be nice to know what the headers are)

Wanli-HE commented 10 months ago

Hi! Here is the command line: dRep dereplicate vamb_drep_res -p 40 -pa 0.9 -sa 0.95 -nc 0.3 -cm larger --S_algorithm fastANI -g vamb_res_fa_file/*

And here are the results: Chdb.csv

The second one was run with default parameters, but it looks the same as the first one. [Uploading Chdb (2).csv…]()

So I am a little confused about why the result looks like that: 3400 bins, but clustered into just 74 groups. When I used GTDB-Tk to annotate all the bins, I got 465 species in total.

MrOlm commented 10 months ago

Ah I see- you're looking at Chdb.csv, which contains taxonomy information. The clustering information is in the file Cdb.csv

Wanli-HE commented 10 months ago

Hi! That makes even less sense: there are only 35 representative genomes with default parameters, but GTDB-Tk finds over 400 species.

Besides, I want to know which cluster each bin belongs to, for all bins. Does dRep really have such a summary file?

Best, Wanli

Wanli-HE commented 10 months ago

Here is another run I tried, using ANI > 0.98, but it still gives only 34 clusters:

image