EBI-Metagenomics / genomes-catalogue-pipeline

MGnify genome analysis pipeline

dereplication for large catalogues #63

Closed mberacochea closed 10 months ago

mberacochea commented 10 months ago

This branch adds the changes needed to support large catalogues, i.e. those that dRep cannot handle in a single run because of the amount of memory it requires.

It dereplicates the genomes iteratively in batches and then merges the batches.
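To illustrate the general idea of batched, iterative dereplication, here is a minimal Python sketch. It is not the code in this branch: the function names, the `batch_size` parameter, and the `toy_dereplicate` stand-in (which simply keeps one genome per name prefix) are all hypothetical, and a real run would call dRep on each batch instead.

```python
from typing import Callable, List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def iterative_dereplicate(
    genomes: Sequence[str],
    batch_size: int,
    dereplicate: Callable[[List[str]], List[str]],
) -> List[str]:
    """Dereplicate in batches, merge the representatives, and repeat.

    `dereplicate` is a hypothetical callable that takes a batch of genomes
    and returns one representative per cluster.
    """
    current = list(genomes)
    while len(current) > batch_size:
        representatives: List[str] = []
        for batch in make_batches(current, batch_size):
            representatives.extend(dereplicate(batch))
        if len(representatives) >= len(current):
            # No batch got smaller, so further rounds would not help.
            # The final pass below then runs on more than batch_size genomes.
            break
        current = representatives
    # Final pass over the merged representatives.
    return dereplicate(current)


def toy_dereplicate(batch: List[str]) -> List[str]:
    """Toy stand-in for dRep: keep the first genome seen for each name prefix."""
    seen = {}
    for genome in batch:
        seen.setdefault(genome.split("_")[0], genome)
    return list(seen.values())


if __name__ == "__main__":
    genomes = ["a_1", "b_1", "a_2", "c_1", "b_2", "a_3"]
    print(iterative_dereplicate(genomes, batch_size=3, dereplicate=toy_dereplicate))
```

One design point the sketch makes visible: redundant genomes that land in different batches are only collapsed once their representatives meet in a later round or in the final pass, so the merge step is what guarantees a fully dereplicated result.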

The unit test for the workflow is not working at the moment: nf-test and the code snippet used to batch the genomes are not playing well together.