Closed Hocnonsense closed 6 days ago
Hello @Hocnonsense,
You are correct, -contig is for genomes or assembly contigs (nucleotides). ORFs are predicted using Prodigal and then the encoded proteins are compared against CARD's reference sequences using BLASTP (including secondary screening for AMR-conferring amino acid substitutions). rRNA genes are also examined for key SNPs.
For the -protein option, you provide a FASTA file of protein sequences (i.e. your own proteome predictions) and these are compared against CARD's reference sequences using BLASTP as above, including screening for key substitutions.
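To make the contig/protein distinction above concrete, here is a minimal Python sketch that guesses which input type a FASTA file holds and builds a matching `rgi main` argument list. The helper names (`read_fasta`, `guess_input_type`, `rgi_main_command`) and the 0.9 threshold are made up for illustration, and the `-i`, `-o` and `-t` flags are assumed from the RGI documentation; please check them against the rgi_main docs for your installed version.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_input_type(path, threshold=0.9):
    """Return 'contig' if the residues look like nucleotides, else 'protein'.

    The threshold is an arbitrary heuristic, not something RGI itself uses.
    """
    total = nucleotide = 0
    for _, seq in read_fasta(path):
        seq = seq.upper().rstrip("*")  # ignore trailing stop symbols in proteins
        total += len(seq)
        nucleotide += sum(1 for ch in seq if ch in NUCLEOTIDE_LETTERS)
    if total == 0:
        raise ValueError(f"no sequence data in {path}")
    return "contig" if nucleotide / total >= threshold else "protein"


def rgi_main_command(fasta, output_prefix):
    """Build an argument list for `rgi main` (flags assumed, see lead-in)."""
    return ["rgi", "main", "-i", str(fasta), "-o", str(output_prefix),
            "-t", guess_input_type(fasta)]
```

The returned list could be passed to `subprocess.run(...)` once RGI and a CARD database are set up; the sketch itself never calls RGI.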
Documentation for RGI main, which has the details, can be found here: https://github.com/arpcard/rgi/blob/master/docs/rgi_main.rst
Thanks for your kind reply!
Is it possible to make rgi accept the nucleotide sequences of genes as input (such as the output of prodigal -n), which are neither contigs nor translated proteins? I think it could be useful.
Also, you mentioned "rRNA genes". Does that refer to genes such as 16S rRNA? If so, the nucleotide sequences of genes do not contain this information. Would it be possible to add a parameter for passing gene regions to rgi (e.g., a GFF file)?
Yes, you can use -contig with a FASTA of ORFs. Prodigal will still run, but it will recognize the ORFs.
For rRNA, CARD does indeed have curated data for mutations in 16S and 23S rRNA. With -contig, RGI uses BLASTN to find the rRNA genes in your data and then checks for SNPs.
Thanks! So I should always first try to run with contigs.
What I want is to replace the default Prodigal annotation inside rgi with my own (or a published database's) gene annotation. I hope this suggestion can make rgi better, but I'll also try to work around it myself.
Ok, great. You can run your own predicted proteins for now, but if you would be willing to share an example input file with us at card@mcmaster.ca, we can look at supporting your type of data. I'll close this issue for now.
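For anyone who has their own gene calls as nucleotide ORFs and wants to follow the "run your own predicted proteins" route, a minimal Python sketch of the translation step might look like the following. It is not part of RGI: `translate_orf` and `translate_fasta` are hypothetical helper names, the standard codon table is used, and treating ATG/GTG/TTG as a methionine start is an assumption about complete bacterial ORFs that will be wrong for partial genes.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumption: complete bacterial ORFs starting with these codons begin with Met.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_orf(nucleotides, force_start_met=True):
    """Translate one in-frame ORF; ambiguous codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    protein = "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3)
    )
    if force_start_met and seq[:3] in START_CODONS:
        protein = "M" + protein[1:]
    # Drop a single terminal stop symbol, if present.
    return protein[:-1] if protein.endswith("*") else protein


def translate_fasta(nucleotide_fasta):
    """Translate every record of a nucleotide FASTA string into protein FASTA text."""
    records = []
    for block in nucleotide_fasta.split(">")[1:]:
        if not block.strip():
            continue
        header, *seq_lines = block.splitlines()
        protein = translate_orf("".join(s.strip() for s in seq_lines))
        records.append(f">{header}\n{protein}")
    return "\n".join(records) + "\n"
```

Note that, going by the replies above, the rRNA SNP check relies on nucleotide input in contig mode, so a protein FASTA produced this way would presumably not be screened for rRNA mutations.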
Thanks for this useful tool, I'm going to use it to annotate my sequences.
However, I'm puzzled about the parameter [-t {contig,protein}]. It seems that in contig mode the input is treated as nucleotide and is first annotated to CDS using Prodigal, while in protein mode the input should be amino acid. If I use protein mode, I can only provide proteins and cannot apply "mutation screening depending upon the detection model type". I'm interested in the mutation results, but I would also like to use my existing gene prediction results. Is that possible?
Thanks for your help!