annotate the position of transposase

famiji commented 1 year ago

Hello, I'm a newbie in bioinformatics. My current need is to annotate the position of transposase on a sequence. Is this tool suitable for me? And is there a tool or transposase database that meets my needs?

DerKevinRiehl commented 1 year ago

Dear famiji, thank you for your interest in reasonaTE.

Answer: Yes, it can be used for your purpose. reasonaTE was designed primarily to annotate transposons, but it can also be used to annotate transposase. In reasonaTE we employ the NCBI CDD (Conserved Domain Database), a large collection of protein domain models that includes transposase, to annotate common features of transposons, such as transposase.

How to do it? You can apply reasonaTE as described on the tutorial page; check out the example project for more details. https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE

You can do Step 1 as described there.

Then, in Step 2, you only need to run the NCBICDD1000 subtool:

reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool NCBICDD1000

Then, in Step 3, you can just run:

reasonaTE -mode parseAnnotations -projectFolder workspace -projectName testProject

Where to find results? You can find the output of NCBICDD1000 in the file "parsedAnnotations/NCBICDD1000.gff3": https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE/blob/main/workspace/testProject/parsedAnnotations/NCBICDD1000.gff3 As you can see, it annotates many different sequences; the last column shows their names. If there is a transposase in your input sequence that matches an entry in the database, the tool should find it.

I hope this helps. Please let me know about any updates or progress on your project. I wish you success and look forward to hearing back from you.
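To pull only the transposase hits out of the parsed file, a small filter along the following lines may help. This is a minimal sketch, assuming the file follows the standard nine-column, tab-separated GFF3 layout and that the hit name in the last column contains the word "transposase". The names `Hit` and `find_hits` are illustrative and are not part of reasonaTE.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One annotated feature (hypothetical helper type, not from reasonaTE)."""
    seqid: str
    start: int
    end: int
    strand: str
    attributes: str


def find_hits(gff3_lines: Iterable[str], keyword: str = "transposase") -> Iterator[Hit]:
    """Yield features whose last (attributes) column mentions `keyword`.

    Assumes standard GFF3: 9 tab-separated columns, '#' for comment lines.
    """
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # skip lines that do not look like feature rows
        if keyword.lower() in cols[8].lower():
            # Columns 4 and 5 (1-based) hold the start and end coordinates.
            yield Hit(cols[0], int(cols[3]), int(cols[4]), cols[6], cols[8])
```

The function accepts any iterable of lines, so an open file handle for "parsedAnnotations/NCBICDD1000.gff3" can be passed in directly, and each `Hit` then carries the sequence name, coordinates, and strand of one matching annotation.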

Best regards, Kevin

famiji commented 1 year ago

It seems that I didn't get the result I wanted. In fact, my goal is to annotate the IS1151 transposase on the bacterial plasmid. I input the sequence containing this transposase, but I didn't get the result of its annotation.

DerKevinRiehl commented 1 year ago

If you know exactly which transposase sequence you want, why don't you use a tool that searches for this specific sequence in your genome? Our tool only searches for the transposases covered by NCBICDD1000, not for your specific IS1151.

You could use a well-known bioinformatics tool called BLAST. Since -db expects a BLAST database rather than a plain FASTA file, first build a nucleotide database from the sequences you are looking for, then search your genome against it:

makeblastdb -in query_sequences.fasta -dbtype nucl

blastn -db query_sequences.fasta -out blastresult.txt -outfmt "7 sacc stitle qframe evalue bitscore qstart qend qlen sstart send" -query genome.fasta -evalue 0.1 -num_threads 10

So you could add the IS1151 transposase sequence to query_sequences.fasta and see whether BLAST finds it in your genome. You can make the search stricter or more permissive by adjusting the e-value threshold (-evalue).

I hope this helps.
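To turn the BLAST report into positions on the genome, the tabular output can be read with a few lines of Python. This is a minimal sketch, assuming the report was written with the -outfmt string shown above (tab-separated, comment lines starting with "#") and that the genome was passed as -query, so qstart and qend are genome coordinates. The function name `parse_blast_hits` and the default cutoff of 1e-5 are illustrative choices, not part of BLAST.

```python
from typing import Dict, Iterable, Iterator

# Column order must match the -outfmt string used in the blastn call.
COLUMNS = ["sacc", "stitle", "qframe", "evalue", "bitscore",
           "qstart", "qend", "qlen", "sstart", "send"]


def parse_blast_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Iterator[Dict]:
    """Yield one dict per hit with an e-value at or below `max_evalue`."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # outfmt 7 writes its header lines as comments
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip rows that do not match the expected layout
        row = dict(zip(COLUMNS, fields))
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue
        sstart, send = int(row["sstart"]), int(row["send"])
        yield {
            "subject": row["sacc"],
            "title": row["stitle"],
            "evalue": evalue,
            "bitscore": float(row["bitscore"]),
            # The genome is the query here, so these are genome positions.
            "genome_start": int(row["qstart"]),
            "genome_end": int(row["qend"]),
            # blastn reports minus-strand matches with sstart > send.
            "strand": "-" if sstart > send else "+",
        }
```

Passing an open handle for blastresult.txt yields one record per retained hit. Raising `max_evalue` keeps weaker matches, and lowering it keeps only near-exact ones.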

Best, Kevin

DerKevinRiehl / transposon_annotation_reasonaTE

annotate the position of transposase #21