PROBIC / mSWEEP

mSWEEP: High-resolution sweep metagenomics using fast probabilistic inference
MIT License

Add option to supply multiple clusterings for the same run #7

Status: Closed (tmaklin closed 3 years ago)

tmaklin commented 3 years ago

(Re)running mSWEEP with several clusterings of the reference sequences (e.g. hierarchically from genus -> species -> sequence type -> lineage) is sometimes useful, but currently each clustering requires rerunning the entire estimation.

Since loading the pseudoalignments from Themisto can take quite a while, especially for large sequencing runs, it would be useful to have an option for estimating the abundances several times with different clusterings in a single run.

tmaklin commented 3 years ago

Functionality has been added in https://github.com/PROBIC/mSWEEP/commit/d93f26f2034d0fa4845e5e8f211de611dceed298.

Several groupings can be supplied by appending them as columns to the argument given by either the -i or the --groups-list option. The column delimiter is defined by the --groups-delimiter argument (default: tab-separated). If there are several groupings and output to a file is requested, the output is written to the file specified by the -o argument, but with the column index appended. Otherwise, the results from all runs are printed to cout.
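
To illustrate the multi-column layout described above, here is a minimal Python sketch of how such a grouping file could be split into one clustering per column. It is not mSWEEP's own code (mSWEEP is written in C++), and it rests on assumptions: the file has one row per reference sequence, no header line, and the same number of columns in every row. The function name `read_groupings` and the example labels are hypothetical.

```python
import csv
import io


def read_groupings(handle, delimiter="\t"):
    """Split a delimited grouping file into one list of labels per column.

    Assumes one row per reference sequence and no header line
    (an assumption for this sketch, not confirmed mSWEEP behaviour).
    """
    # Skip completely empty lines, which csv.reader yields as [].
    rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    if not rows:
        return []
    n_columns = len(rows[0])
    for line_number, row in enumerate(rows, start=1):
        if len(row) != n_columns:
            raise ValueError(
                f"row {line_number} has {len(row)} columns, expected {n_columns}"
            )
    # Transpose rows into columns: one clustering per column.
    return [list(column) for column in zip(*rows)]


# Hypothetical example: genus, species and sequence type for three references.
example = (
    "Escherichia\tE_coli\tST131\n"
    "Escherichia\tE_coli\tST73\n"
    "Klebsiella\tK_pneumoniae\tST258\n"
)

groupings = read_groupings(io.StringIO(example))

# Each column is one clustering of the same set of reference sequences.
for column_index, labels in enumerate(groupings):
    print(column_index, len(set(labels)), "groups")
```

Each resulting list assigns every reference sequence to a group at one level of the hierarchy, so the pseudoalignments only need to be loaded once while the estimation is repeated per column. Whether mSWEEP numbers the columns from 0 or 1 in its output file names is not stated in this thread.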