Using gzip or something similar. For the SheepGut dataset, running `fdr estimate` using `Everything` to produce one TSV file for all possible decoy contexts produces a folder of outputs weighing ~1.7 GB -- each TSV file is about 120 MB (including the mutations-per-Mb TSV file).
Looks like `pd.read_csv()` supports loading gzipped files (https://stackoverflow.com/a/39264156), so this shouldn't complicate things too much, although it might make testing a bit more difficult.
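A minimal sketch of the round-trip, assuming pandas' built-in compression inference (the filename and columns here are hypothetical, not the project's actual output schema):

```python
import pandas as pd

# Hypothetical output path -- the ".gz" suffix lets pandas infer gzip.
path = "decoy_contexts.tsv.gz"

# Writing: to_csv() compresses automatically based on the extension.
df = pd.DataFrame({"Position": [100, 200], "MutationRate": [0.01, 0.02]})
df.to_csv(path, sep="\t", index=False)

# Reading: read_csv() likewise infers compression from the extension;
# it can also be forced explicitly with compression="gzip".
df2 = pd.read_csv(path, sep="\t")
assert df.equals(df2)
```

Since both ends infer compression from the `.gz` suffix, switching the pipeline over should mostly be a matter of renaming the output files and leaving the read/write calls alone.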