RasmussenLab / phamb

Downstream processing of VAMB binning for Viral Elucidation
MIT License

modified header names in PHAMB #41

Open ShailNair opened 2 years ago

ShailNair commented 2 years ago

Hi,

My assembled contigs have headers such as

c_000003956504 c_000004841845 c_000004821562

which match the VAMB bin headers. But when I run PHAMB, I get bin headers such as:

1470111 816445 3021234 1094390

How can I get the PHAMB contig headers in the original VAMB bin header format?

joacjo commented 2 years ago

Hi Shail

Can you send me a snapshot of your clusters.tsv file? And how many samples do you have? I suspect there might be a problem in the naming of your contigs.

The framework is designed to use the Sample-IDs from the header of contigs to keep track of where each viral-bin is from.
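To illustrate why the header format matters, here is a minimal Python sketch. It assumes contig headers follow a `<sample-ID><separator><contig-name>` layout; the `split_sample_id` helper and the `C` separator are hypothetical choices for illustration, not PHAMB's or VAMB's actual code.

```python
from typing import Optional, Tuple


def split_sample_id(header: str, separator: str = "C") -> Optional[Tuple[str, str]]:
    """Split a contig header into (sample_id, contig_name) at the first separator.

    Returns None when the header does not contain the separator, or when
    either side of it is empty, i.e. no sample ID can be recovered.
    The separator "C" is an assumption made for this example only.
    """
    sample_id, found, contig_name = header.partition(separator)
    if not found or not sample_id or not contig_name:
        return None
    return sample_id, contig_name


# A header that carries a sample ID splits cleanly.
print(split_sample_id("S1C42"))
# A co-assembly style header has no sample prefix to recover.
print(split_sample_id("c_000003956504"))
```

With headers like c_000003956504 there is no sample prefix, so a check of this kind finds nothing to split on.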

best, Joachim

ShailNair commented 2 years ago

@joacjo I used a single co-assembled contigs file for binning. Here is a snapshot of the cluster.tsv file and the PHAMB-generated .fna file: [image: QQ图片20220928084217]

The cluster.tsv file has 1101140 records.

I followed the "How to Run - not in parallel - quick and dirty" tutorial.

Thank you.

joacjo commented 2 years ago

Hi Shail

Ah, I see. The names of the entries in vamb_bins.fna match the VAMB cluster names. Remember that the bins in the .fna file are concatenations of the VAMB cluster sequences.

Example: In your clusters.tsv you might have a cluster with multiple contigs:

cluster contig
99999 c_000000123
99999 c_000000321

If this cluster is predicted putative viral, the resulting name in the .fna file will be: 99999
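If you need to get back from a bin name in the .fna file to the original contig headers, you can look the cluster name up in clusters.tsv. Below is a minimal Python sketch, assuming the file is tab-separated with the cluster name in the first column, the contig name in the second, and no header row. The `contigs_by_cluster` helper is a hypothetical name and is not part of PHAMB.

```python
import csv
import io
from collections import defaultdict


def contigs_by_cluster(handle):
    """Map each cluster name to the list of contig names assigned to it.

    Assumes tab-separated rows of the form: cluster<TAB>contig,
    with no header row. Rows with fewer than two fields are skipped.
    """
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        cluster, contig = row[0], row[1]
        mapping[cluster].append(contig)
    return dict(mapping)


# In-memory stand-in for clusters.tsv, using the example cluster above.
example = io.StringIO("99999\tc_000000123\n99999\tc_000000321\n")
lookup = contigs_by_cluster(example)
print(lookup["99999"])  # ['c_000000123', 'c_000000321']
```

For a real file, pass an open handle instead, e.g. `with open("clusters.tsv", newline="") as fh: lookup = contigs_by_cluster(fh)`.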

Does this make sense?

Best, Joachim

ShailNair commented 2 years ago

Thanks, that makes sense, and thank you for this very helpful tool. With PHAMB we could extract three times as many complete viral contigs (as per CheckV's rule) as with VirSorter2, DeepVirFinder and viralVerify.