AnantharamanLab / PropagAtE

Prophage Activity Estimator
GNU General Public License v3.0

Dereplicated viral contigs as input #8

Open asierFernandezP opened 1 year ago

asierFernandezP commented 1 year ago

Hi,

I was wondering whether you recommend dereplicating and filtering the viral contigs identified by viral sorting tools (VIBRANT, VirSorter2, etc.) before running PropagAtE, or using them as input without any dereplication step. Since you recommend using the VIBRANT coordinates file (or a manually generated one) as input for PropagAtE, I understand that you use the viral contigs identified by VIBRANT directly, without any additional steps.

Thanks in advance,

Asier

KrisKieft commented 1 year ago

Hi,

It depends on the data that you have. If you have viruses from a single sample, then the assembly step should already have provided a sufficient level of dereplication, and you should be good to run PropagAtE. If you have multiple samples pooled together, then you may have viruses with 100% identical sequences. Identical sequences in the pooled set can interfere with the mapping step, so in that case I'd suggest either dereplicating or mapping separately.
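For the specific case of 100% identical sequences in a pooled set, a minimal sketch of exact dereplication might look like the following. This is only an illustration and is not part of PropagAtE or VIBRANT: the function names (`read_fasta`, `canonical`, `dereplicate_exact`) and the example contig names are hypothetical, and it assumes that FASTA headers are unique across the pooled samples. It only collapses sequences that are identical (on either strand); near-identical sequences would need a dedicated clustering tool.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Translation table for reverse-complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def canonical(seq: str) -> str:
    """Return a strand-independent key: the smaller of seq and its reverse complement."""
    seq = seq.upper()
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, revcomp)


def dereplicate_exact(
    records: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Keep the first copy of each identical sequence.

    Returns the kept records plus a map from each kept header to the
    headers of the duplicates that were dropped in its favour.
    Assumes headers are unique across the pooled input.
    """
    kept: List[Tuple[str, str]] = []
    first_seen: Dict[str, str] = {}      # canonical sequence -> kept header
    duplicates: Dict[str, List[str]] = {}
    for header, seq in records:
        key = canonical(seq)
        if key in first_seen:
            duplicates[first_seen[key]].append(header)
        else:
            first_seen[key] = header
            duplicates[header] = []
            kept.append((header, seq))
    return kept, duplicates


# Hypothetical pooled contigs from three samples.
pooled = """>sampleA_contig1
ACGTACGTAA
>sampleB_contig7
ACGTACGTAA
>sampleB_contig9
TTACGTACGT
>sampleC_contig2
GGGGCCCCAT
""".splitlines()

kept, duplicates = dereplicate_exact(read_fasta(pooled))
for header, _ in kept:
    print(header, "replaces", duplicates[header])
```

Here `sampleB_contig9` is the reverse complement of `sampleA_contig1`, so it is treated as a duplicate as well. If contigs are dropped this way, any coordinates file passed alongside them would presumably need to refer only to the contigs that were kept.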

asierFernandezP commented 1 year ago

In this case I have multiple samples, so I will pool the viral contigs, dereplicate them, and use them as input for PropagAtE.

Thanks a lot!