K2SOHIGH / pcalf

MIT License
Merge entries from the same genome (but different versions) #23

Open GGasch opened 7 months ago

GGasch commented 7 months ago

When a genome has several versions (for instance: GCF_003555505.1 and GCA_003555505.2), should we merge the entries in the results of the analysis?

K2SOHIGH commented 7 months ago

Don't know.

It could be an option for pcalf-datasets to download only the latest version of an assembly, but you might miss some ccyA+ genomes this way.

What could be done on the pcalf-annotation side is to add a dereplication step (dRep or inStrain, for example) before CheckM / GTDB-Tk, which would reduce the computation time for those steps (although dRep itself can take a while, so...).

Either way, I agree with you: merging several versions of an assembly could reduce the complexity of the final dataset.

GGasch commented 7 months ago

Good points.

Do you know whether a new version of a genome means the older one is obsolete (NCBI seems to hint at that)? If that is the case, we can focus only on the latest version.
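
If we go with "latest version only", the filtering could look something like the rough sketch below. This is not pcalf code: `latest_versions` and `ACCESSION_RE` are hypothetical names, and it assumes that GCA_ and GCF_ accessions sharing the same 9-digit number refer to the same assembly, and that on a version tie the GCF_ (RefSeq) entry is kept. It does not address the caveat above that an older version might be the only ccyA+ one.

```python
import re

# NCBI assembly accession: GCA_ (GenBank) or GCF_ (RefSeq), 9 digits, ".version".
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def latest_versions(accessions):
    """Keep one accession per assembly number: the one with the highest version.

    Assumption: GCA_/GCF_ accessions with the same digits are the same assembly.
    Assumption: on a version tie, the GCF_ (RefSeq) accession is preferred.
    """
    best = {}  # assembly number -> ((version, is_refseq), accession)
    for acc in accessions:
        match = ACCESSION_RE.match(acc)
        if match is None:
            raise ValueError(f"not an assembly accession: {acc!r}")
        prefix, number, version = match.group(1), match.group(2), int(match.group(3))
        key = (version, prefix == "GCF")
        if number not in best or key > best[number][0]:
            best[number] = (key, acc)
    return sorted(acc for _, acc in best.values())


if __name__ == "__main__":
    print(latest_versions(["GCF_003555505.1", "GCA_003555505.2"]))
```

With the two accessions from the first comment, this keeps only GCA_003555505.2.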

GGasch commented 7 months ago

I will try to integrate dRep into the tool to get a more user-friendly output.