biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

Save output of "gene_markers_extraction_rec" compressed in order to avoid very large output files #3

Closed by alexhbnr 4 years ago

alexhbnr commented 4 years ago

Hi,

When running PhyloPhlAn on large eukaryotic genomes, the function gene_markers_extraction_rec writes very large output files when the frameshifts option is enabled. Since these files are both written and read by Python code inside PhyloPhlAn, replacing open with bz2.open should be enough to avoid the excessive disk usage.
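To illustrate the idea, here is a minimal sketch of swapping open for bz2.open. The helper names write_records and read_records and the one-record-per-line layout are hypothetical and are not taken from the PhyloPhlAn code. One detail to watch: bz2.open defaults to binary mode ("rb"), whereas the built-in open defaults to text mode, so the mode has to be given explicitly as "wt" or "rt" when the surrounding code works with strings.

```python
import bz2


def write_records(path, records, compress=True):
    """Write text records, one per line, optionally bz2-compressed.

    Hypothetical helper for illustration; not PhyloPhlAn's own function.
    """
    # bz2.open accepts the same text modes as open, so it can be swapped in.
    opener = bz2.open if compress else open
    with opener(path, "wt") as handle:
        for record in records:
            handle.write(record + "\n")


def read_records(path, compress=True):
    """Read back the records written by write_records."""
    opener = bz2.open if compress else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle]
```

Because the same opener is used for writing and reading, the rest of the code can keep treating the file as ordinary text.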

Best, Alex

fasnicar commented 4 years ago

Hi Alex,

Thanks for your suggestion. I agree that it will help reduce disk usage, and this is probably not the only place where we can do it. I'll add this in the next release.

Many thanks, Francesco

fasnicar commented 4 years ago

Hi Alex,

With commit 93ec97a14020bb0750b67bd28ac4fab8affc5928, PhyloPhlAn now uses bz2 wherever possible. The change is not yet available in Bioconda, because I would like to collect a few more fixes before releasing a new package.

Many thanks, Francesco

alexhbnr commented 4 years ago

Cool, thanks for changing that so fast! Alex