Closed liangjinsong closed 2 months ago
Your problem can be viewed as the difference between segmented and non-segmented viruses. In our tests on segmented viruses, the results show that whether a virus is segmented or non-segmented does not have a significant impact on accuracy. If you are sure these contigs belong to one virus, we support combining them. After all, the longer the genome, the more information it covers and the more accurate the prediction will be.
Thank you for your question.
I noticed that PhaGCN, as integrated into the online pipeline PhaBOX, can pass contigs containing non-ATCG characters (for example, the sequence >k142_test_contig4 in the example file).
Hi, PhaGCN seems to treat each contig in a FASTA file as a viral genome and outputs a classification result for each contig. An issue therefore arises when the input file is a viral bin, that is, a FASTA file containing several contigs of one viral genome. Deleting both the lines starting with '>' and the line breaks (\n) in a viral bin seems like a simple solution. Do you think this method is reasonable? Do you have any recommendations?
Thank you in advance.
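For reference, the merging idea described above could be sketched roughly as follows. This is only an illustration and not part of PhaGCN or PhaBOX: the function name `merge_bin_contigs` and the default record name `merged_bin` are made up for this example. Unlike deleting every '>' line, it keeps a single header so that the result is still a valid FASTA record. Note that it joins the contigs in file order with no spacer, so the junctions between contigs are artificial.

```python
def merge_bin_contigs(fasta_text: str, merged_name: str = "merged_bin") -> str:
    """Concatenate all contigs in a FASTA string into one FASTA record.

    Hypothetical helper: drops every per-contig header and all line
    breaks, then writes the joined sequence under a single new header.
    """
    pieces = []
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and the per-contig header lines.
        if not line or line.startswith(">"):
            continue
        pieces.append(line)
    sequence = "".join(pieces)
    # Re-wrap at 60 characters per line, a common FASTA convention.
    wrapped = "\n".join(sequence[i:i + 60] for i in range(0, len(sequence), 60))
    return f">{merged_name}\n{wrapped}\n"


if __name__ == "__main__":
    # Toy viral bin with two contigs (made-up sequences).
    bin_text = ">contig1\nATGC\nGGTA\n>contig2\nTTAA\n"
    print(merge_bin_contigs(bin_text), end="")
```

This should only be applied to a bin whose contigs are known to come from one virus, since the tool would then see the merged record as a single genome.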