Open ManuelSokolov opened 7 months ago
Hi,
Thanks for using our tools.
Of course, you can concatenate the sequences and classify them as a single genome. Based on the algorithm design, I expect this should not affect the prediction much (but we have not tested it).
Rather than using a multiple sequence alignment, you could run CD-HIT or MMseqs2 to check whether the sequences are redundant. Then you can choose the representative sequences as your final genome.
If you have multiple sequences for one phage, you can also run the program on all of them and use weighted majority voting for the final prediction. Specifically, use each sequence's prediction as its vote and the provided score as the weight of that vote.
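A minimal sketch of the concatenation option, assuming the fragments of one phage sit in a single multi-record FASTA file. The function names and the example record IDs are hypothetical and not part of PhaGCN; the fragments are simply joined in file order, with no spacer between them.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concatenate_records(handle, new_id):
    """Join all sequences, in input order, into one FASTA record."""
    merged = "".join(seq for _, seq in read_fasta(handle))
    return f">{new_id}\n{merged}\n"


# Made-up fragments standing in for the contents of Genome1.fasta.
example = StringIO(">frag1\nACGT\nAC\n>frag2\nGGTT\n")
print(concatenate_records(example, "Genome1"))
```

With a real file, the `StringIO` object would be replaced by `open("Genome1.fasta")`.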
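The weighted vote could be sketched as follows. This assumes you have already collected one (prediction, score) pair per sequence from the PhaGCN output; the function name, the labels, and the scores below are made up for illustration, and reading the PhaGCN output file is left out.

```python
from collections import defaultdict


def weighted_majority_vote(predictions):
    """Return (winning_label, per-label score totals).

    predictions: iterable of (label, score) pairs, one per sequence.
    Each sequence votes for its label with a weight equal to its score.
    On a tie, the label that appeared first in the input wins.
    """
    totals = defaultdict(float)
    for label, score in predictions:
        totals[label] += score
    if not totals:
        raise ValueError("no predictions to vote on")
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Made-up per-fragment predictions for one phage.
fragments = [
    ("Siphoviridae", 1.0),
    ("Myoviridae", 0.5),
    ("Myoviridae", 0.75),
]
winner, totals = weighted_majority_vote(fragments)
print(winner, totals)
```

Here the two lower-scoring fragments together outweigh the single fragment with score 1.0, which is the behaviour that distinguishes a weighted vote from simply taking the highest-scoring prediction.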
Hope this information will help.
Best, Jiayu
Hi Kennth, thank you for creating the tool and for the response.
I will test the options and let you know the result.
Best Regards,
Manuel
Hi,
I am classifying phages according to their taxonomy. The issue I am having is that my FASTA files contain fragments of the genome instead of the whole genome:
My FASTA files have the following format (Genome1.fasta):
PhaGCN classifies this phage genome as:
So it gives two different classifications. (Here one of them has probability 1.0, but in other cases neither has a clearly higher probability.) The examples for this tool use whole genomes for classification.
Best Regards and Thank You