Processing PropagAtE output

Hi,

I am currently applying PropagAtE (default parameters) to gut metagenomic data, and I am struggling to understand which filters to apply to the output .tsv files. In these files, PropagAtE predicts an activity state (dormant/active) even for prophages with very low breadth of coverage. Should I apply a cut-off based on the 'prophage_cov_breadth' column?
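For concreteness, this is roughly the kind of filter I have in mind. It is only a sketch, assuming the output is tab-separated with a header row that contains the 'prophage_cov_breadth' column. The function names and whatever cut-off gets passed in are my own hypothetical placeholders, not anything PropagAtE defines or recommends. I have also not confirmed whether the column is a fraction or a percentage, so the cut-off has to be on the same scale.

```python
import csv
from typing import Dict, Iterable, List


def read_tsv(path: str) -> List[Dict[str, str]]:
    """Load a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def filter_by_breadth(rows: Iterable[Dict[str, str]], min_breadth: float) -> List[Dict[str, str]]:
    """Keep rows whose prophage_cov_breadth is at least min_breadth.

    Assumes the column holds a number on the same scale as min_breadth.
    Rows with a missing, blank, or non-numeric value are dropped.
    """
    kept = []
    for row in rows:
        try:
            breadth = float(row.get("prophage_cov_breadth"))
        except (TypeError, ValueError):
            continue  # missing (None) or non-numeric entry
        if breadth >= min_breadth:
            kept.append(row)
    return kept
```

Usage would be something like `filter_by_breadth(read_tsv("sample1.tsv"), min_breadth=0.5)`, where both the file name and the 0.5 are hypothetical. My question is essentially what a sensible value for `min_breadth` would be.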

In Supplementary Table S3B of the PropagAtE paper, the values you report for this column are very high for the CRC and HeQ datasets but much lower for the other datasets, yet you still considered those prophages present in your analyses. Am I misunderstanding the meaning of this column? Would you recommend any post-filtering of prophages after running PropagAtE with default parameters? I ask because a large number of candidate prophage sequences are run against the reads of each metagenomic sample, so I expect only a few of them to be present in any given sample.
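To see how a given cut-off behaves across my samples, I was planning to count how many candidate prophages survive it in each output file. A minimal sketch, assuming one tab-separated output file per sample, a header row, and a numeric 'prophage_cov_breadth' column; the directory layout, the `*.tsv` pattern, and the names `count_passing` and `summarize` are hypothetical.

```python
import csv
import glob
import os
from typing import Dict


def count_passing(path: str, min_breadth: float) -> int:
    """Count rows in one output .tsv whose prophage_cov_breadth >= min_breadth."""
    passing = 0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                breadth = float(row.get("prophage_cov_breadth"))
            except (TypeError, ValueError):
                continue  # missing or non-numeric value: not counted
            if breadth >= min_breadth:
                passing += 1
    return passing


def summarize(directory: str, min_breadth: float) -> Dict[str, int]:
    """Map each .tsv file name in directory to its count of rows passing the cut-off."""
    return {
        os.path.basename(path): count_passing(path, min_breadth)
        for path in sorted(glob.glob(os.path.join(directory, "*.tsv")))
    }
```

For example, `summarize("propagate_outputs", 0.5)` (hypothetical directory and cut-off) would return a dict of file name to count, which makes it easy to compare how many prophages each sample retains under different thresholds.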

Thank you!

AnantharamanLab / PropagAtE, issue #11