quliping closed this issue 4 weeks ago.
Thanks for sharing the data!
One possible cause of this discrepancy is that geNomad uses the 'mask' option when performing gene prediction (see here). Another possible explanation is that, if proviruses were detected in some of your sequences, the host-encoded genes within those proviral regions were removed, which would reduce the number of predicted genes in the geNomad output.
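The two explanations above can be illustrated with a toy model. This is a simplified, hypothetical sketch, not geNomad's or Prodigal's actual code. Prodigal's mask option ('-m') treats runs of N as masked sequence and prevents genes from being built across them during prediction, whereas this sketch only post-filters gene coordinates that were already predicted. The gene coordinates, the N-run length threshold ('min_len'), the provirus boundaries and all function names are illustrative assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical gene coordinates: (start, end), 1-based and inclusive.
Interval = Tuple[int, int]


def n_runs(seq: str, min_len: int) -> List[Interval]:
    """Return 1-based inclusive coordinates of runs of N/n at least min_len long."""
    return [
        (m.start() + 1, m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def drop_masked(genes: List[Interval], seq: str, min_len: int) -> List[Interval]:
    """Toy stand-in for masking: discard genes that touch any N-run."""
    masks = n_runs(seq, min_len)
    return [g for g in genes if not any(overlaps(g, m) for m in masks)]


def keep_in_provirus(genes: List[Interval], start: int, end: int) -> List[Interval]:
    """Toy stand-in for provirus trimming: keep only genes fully inside [start, end]."""
    return [g for g in genes if start <= g[0] and g[1] <= end]


if __name__ == "__main__":
    seq = "A" * 100 + "N" * 20 + "A" * 100      # N-run at positions 101-120
    genes = [(1, 90), (95, 150), (130, 210)]    # hypothetical predictions
    print(drop_masked(genes, seq, min_len=10))  # [(1, 90), (130, 210)]
    print(keep_in_provirus(genes, 1, 100))      # [(1, 90)]
```

Either filter alone lowers the gene count for an affected contig, which is the direction of the difference reported in this issue (fewer proteins in the geNomad output).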
Thanks for your kind reply. I ran pyrodigal-gv again with the '-m' option and got the same predicted proteins as in the geNomad output.
Hello, geNomad is great software and has been very helpful for my work. However, I ran into a strange problem. I tested geNomad v1.8.0 on a small dataset of 1269 contigs. In the 'final_overlapped_virus_annotate' folder of geNomad's output, the 'final_overlapped_virus_proteins.faa' file contains 5880 proteins. However, running the pyrodigal-gv installed in geNomad's conda environment on the same 1269 contigs gave me 5885 proteins. My command was 'pyrodigal-gv -p meta -i final_overlapped_virus.fasta -a final_overlapped_virus-pyrodigal-gv_single.faa -o pyrodigal-gv.out', where 'final_overlapped_virus.fasta' is the test dataset containing the 1269 contigs. I also found 41 protein IDs that differ between the geNomad and pyrodigal-gv results. Does geNomad run pyrodigal-gv with any special parameters?
Here is the test data containing the 1269 contigs: final_overlapped_virus.zip
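To locate where two protein FASTA files disagree, one option is to compare their headers directly. This is a minimal standard-library sketch; it assumes Prodigal-style protein IDs of the form '<contig>_<gene number>' (the first word of each header line), which is an assumption for the geNomad file in particular. The function names and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited word of each FASTA header line."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def contig_of(protein_id: str) -> str:
    """Assumed '<contig>_<gene number>' naming: strip the trailing '_N'."""
    return protein_id.rsplit("_", 1)[0]


def compare(
    ids_a: List[str], ids_b: List[str]
) -> Tuple[List[str], List[str], Dict[str, Tuple[int, int]]]:
    """IDs unique to each file, plus contigs whose protein counts differ."""
    set_a, set_b = set(ids_a), set(ids_b)
    counts_a = Counter(contig_of(i) for i in ids_a)
    counts_b = Counter(contig_of(i) for i in ids_b)
    differing = {
        c: (counts_a[c], counts_b[c])
        for c in sorted(set(counts_a) | set(counts_b))
        if counts_a[c] != counts_b[c]
    }
    return sorted(set_a - set_b), sorted(set_b - set_a), differing


if __name__ == "__main__":
    # With real files, e.g.:
    #   with open("final_overlapped_virus_proteins.faa") as fh:
    #       genomad_ids = fasta_ids(fh)
    genomad = [">c1_1", "MK", ">c1_2", "MA", ">c2_1", "MV"]
    pyrodigal = [">c1_1 # 1 # 90 # 1", "MK", ">c1_2", "MA", ">c1_3", "ML", ">c2_1", "MV"]
    print(compare(fasta_ids(genomad), fasta_ids(pyrodigal)))
    # ([], ['c1_3'], {'c1': (2, 3)})
```

If both files number genes sequentially along each contig, one extra or missing gene renumbers every later gene on that contig, so the number of differing IDs can exceed the difference in total counts; the per-contig counts narrow the comparison to the contigs that actually changed.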