chrisquince / STRONG

Strain Resolution ON Graphs
MIT License

cds-subgraphs #109

Open marcomeola opened 3 years ago

marcomeola commented 3 years ago

The cds-subgraphs step in SPAdes breaks my pipeline on certain bins (20% of all bins) because of its huge memory requirements (up to 758 GB). The DESMAN pipeline within STRONG, however, ran to completion.

Is there a way to parallelize the cds-subgraphs step or otherwise reduce its memory usage? If not, is there an alternative tool for subgraph extraction that could be used instead, so that the STRONG pipeline runs to completion?

Alternatively, I could use the DESMAN results, but the README files on the STRONG and DESMAN GitHub pages do not make clear which files should be used for downstream analyses, for example in Anvi'o. Are the haplotypes the FASTA files in the haplotype folder, or are they the individual sequences within those FASTA files (each file contains about 10 of them)?
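For reference, this is a minimal sketch (Python, standard library only) of how I am checking the output locally: it lists each FASTA file in a directory together with the sequence headers it contains, which shows whether a file holds one haplotype or several. The default directory name `haplotypes` and the `*.fa*` file pattern are my own placeholders, not paths documented by DESMAN or STRONG, and compressed files (e.g. `.fa.gz`) are not handled.

```python
import sys
from pathlib import Path


def parse_fasta_headers(lines):
    """Return the header text (without the leading '>') of every record in FASTA lines."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def summarize_fasta_dir(directory, pattern="*.fa*"):
    """Map each FASTA file name in `directory` to the list of its record headers.

    Returns an empty dict if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return {}
    summary = {}
    for path in sorted(root.glob(pattern)):
        with open(path) as handle:
            summary[path.name] = parse_fasta_headers(handle)
    return summary


if __name__ == "__main__":
    # Placeholder default; pass the real haplotype folder as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "haplotypes"
    for name, headers in summarize_fasta_dir(target).items():
        print(f"{name}\t{len(headers)} sequences\t{', '.join(headers[:5])}")
```

Saved under a hypothetical name such as `inspect_fasta.py` and run as `python inspect_fasta.py path/to/haplotype_folder`, it prints one line per file with its sequence count and up to five headers.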