DerKevinRiehl / transposon_annotation_reasonaTE

Transposon annotation tool "reasonaTE" (part of TransposonUltimate)
GNU General Public License v3.0
16 stars, 1 fork

Does the program use the TransposonDB to annotate the TEs? #8

Open bioinformaticspcj opened 2 years ago

bioinformaticspcj commented 2 years ago

Dear authors,

Thanks for developing such useful software. I have a question: does reasonaTE use TransposonDB to annotate the TEs, or does only RFSB use TransposonDB?

Thanks again. Sincerely, Bob

DerKevinRiehl commented 2 years ago

Dear Bob, thank you very much for your interest in our software.

To your question:

RFSB is a transposon classification software; it uses TransposonDB.

reasonaTE is a transposon annotation software, and the many different tools called by reasonaTE use different databases (e.g. RepeatMasker and RepeatModeler use their own). Once all annotation tools have been called, reasonaTE calls RFSB to classify all of the resulting annotations.

Does that make sense? I hope this answer helps; please do not hesitate to ask further clarifying questions. Thanks and best regards, Kevin Riehl

bioinformaticspcj commented 2 years ago

Dear Kevin Riehl,

Thanks for your timely and useful answers. I have another question: are the annotated TEs in the FinalAnnotations_ files filtered by removing all redundant elements, or are they simply merged?

Thanks very much for reading. Best, Bob

DerKevinRiehl commented 2 years ago

Dear Bob, in fact the annotations in "Final Annotations" are not simply merged.

There is a whole pipeline that tries to remove duplicates or merge annotations together where feasible. The procedure and further details can be found in the preprint of our paper: https://www.biorxiv.org/content/10.1101/2021.04.30.442214v1

"After running the annotation tools, additional copies of the identified transposons are searched using the clustering tool CD-HIT (v4.8.1) [74,75] and BLASTN (v2.10.1)."

So, to put it simply, these steps are carried out during the pipeline:

All transposon annotation tools are applied.

Duplicate annotations are removed.

Additional copies of the annotated transposons are searched using BLASTN.

Duplicate annotations are removed and merged.

Annotations are filtered.

Hope this helps a little. Best regards, Kevin Riehl

bioinformaticspcj commented 2 years ago

Dear Kevin Riehl,

Thanks for your timely reply. I am now trying to annotate a genome 1.8 Gb in size; however, I found that the program runs very slowly, especially must and NCBICDD1000 (running for more than two weeks). Would it be better to use multiple threads to save time? In the NCBICDD1000 Python script, the rpstblastn program is not run with multiple threads via the "-num_threads" parameter, so I manually added it to speed things up.

Thanks again. Best, Bob
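For readers who want to try the same change, here is a minimal sketch of how an rpstblastn call with "-num_threads" could be assembled in Python. It is not the code from the NCBICDD1000 script: the function name, the file names, the e-value and the output format are assumptions chosen for illustration, and only the command-line flags themselves are standard BLAST+ options.

```python
from typing import List


def build_rpstblastn_command(
    query_fasta: str,
    cdd_db: str,
    out_file: str,
    threads: int = 8,
    evalue: float = 1e-3,
) -> List[str]:
    """Assemble an rpstblastn command line that uses several threads.

    Hypothetical helper: the arguments reasonaTE really passes may differ.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "rpstblastn",
        "-query", query_fasta,         # nucleotide sequences to search
        "-db", cdd_db,                 # conserved domain database
        "-out", out_file,              # where the hits are written
        "-evalue", str(evalue),        # assumed cutoff, adjust as needed
        "-outfmt", "6",                # tabular output (assumed)
        "-num_threads", str(threads),  # the parameter added manually above
    ]


# Placeholder paths for illustration only.
cmd = build_rpstblastn_command("genome.fasta", "Cdd", "hits.tsv", threads=16)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.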

DerKevinRiehl commented 2 years ago

Dear Bob, that is a great suggestion.

Did your run produce output at the end?

Thanks, Kevin

bioinformaticspcj commented 2 years ago

Dear Kevin,

Thanks for your timely reply. Unfortunately, I had to kill the program to free up CPUs for other programs, so in the end no results were reported. I will try it again in the future.

Thanks, Bob


roperete commented 1 year ago

Dear all,

Reading this issue, I wonder: is there a way to manually adjust the CD-HIT parameters in the pipeline?

Thanks!

DerKevinRiehl commented 1 year ago

Hey Alvaro, so far there is not. However, you could change the software code yourself and add parameters to the respective lines.

The only calls to cd-hit take place in TransposonClustering.py: https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE/blob/fbbfcfabe85ba3396be462cce0f3eb96718061cd/Code/TransposonClustering.py
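As a rough illustration of what "adding parameters" could look like, the sketch below builds a CD-HIT command line with an explicit identity threshold, thread count and memory limit. It does not reproduce the call in TransposonClustering.py: the helper name, the default values and the choice of the `cd-hit-est` binary are assumptions. The flags `-i`, `-o`, `-c`, `-T` and `-M` are standard CD-HIT options.

```python
from typing import List


def build_cdhit_command(
    input_fasta: str,
    output_prefix: str,
    identity: float = 0.9,
    threads: int = 4,
    memory_mb: int = 0,
    binary: str = "cd-hit-est",  # assumed: nucleotide variant of CD-HIT
) -> List[str]:
    """Assemble a CD-HIT command line with tunable parameters.

    Hypothetical helper, not the call used inside reasonaTE.
    """
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in (0, 1]")
    return [
        binary,
        "-i", input_fasta,     # input FASTA with transposon sequences
        "-o", output_prefix,   # output file name prefix
        "-c", str(identity),   # sequence identity threshold
        "-T", str(threads),    # number of threads (0 = all CPUs)
        "-M", str(memory_mb),  # memory limit in MB (0 = unlimited)
    ]


# Placeholder file names for illustration only.
print(" ".join(build_cdhit_command("transposons.fasta", "clusters", identity=0.95)))
```

Lower identity thresholds may also require a matching word length (`-n`); the CD-HIT user guide lists the recommended combinations.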

If you want to better understand how the pipeline works, please find the steps here: https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE/blob/cc04a2db30c98f21981eb2d90710887be726bfcc/Code/TransposonAnnotator.py#L144
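To give an intuition for the "duplicates are removed and merged" steps listed earlier in this thread, here is a toy sketch. It is not the reasonaTE implementation: the `Annotation` fields, the tool names and the rule "merge whenever two annotations on the same sequence overlap" are assumptions, and the real pipeline may apply different criteria.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    seq_id: str  # sequence or chromosome name
    start: int   # inclusive start position
    end: int     # inclusive end position
    tool: str    # tool(s) that reported the annotation


def merge_annotations(annotations: List[Annotation]) -> List[Annotation]:
    """Drop exact duplicates, then merge overlapping annotations.

    Illustrative only; assumes overlap on the same sequence means "same element".
    """
    merged: List[Annotation] = []
    # set() removes exact duplicates; sorting groups annotations by sequence
    # and start position so that one pass finds all overlaps.
    for ann in sorted(set(annotations)):
        last = merged[-1] if merged else None
        if last and last.seq_id == ann.seq_id and ann.start <= last.end:
            # Overlap: extend the previous annotation and remember both tools.
            if ann.tool in last.tool.split(","):
                tools = last.tool
            else:
                tools = f"{last.tool},{ann.tool}"
            merged[-1] = Annotation(last.seq_id, last.start, max(last.end, ann.end), tools)
        else:
            merged.append(ann)
    return merged


# Hypothetical input from two made-up tools.
example = [
    Annotation("chr1", 100, 200, "toolA"),
    Annotation("chr1", 100, 200, "toolA"),  # exact duplicate
    Annotation("chr1", 150, 300, "toolB"),  # overlaps the first annotation
    Annotation("chr1", 400, 500, "toolA"),
]
for ann in merge_annotations(example):
    print(ann)
```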

If you need help changing the code on your machine, drop me a message and we can have a call. (Hint: navigate to the conda folder where reasonaTE was installed; there you should be able to find the Python files and edit them. I recommend reverting any edits after you have played around a little, or simply reinstalling reasonaTE in case it does not run anymore ^^)
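One possible way to locate the installed file is sketched below. It assumes the script runs inside the conda environment where reasonaTE was installed, so that `sys.prefix` points at that environment; the helper name is made up for this example.

```python
import sys
from pathlib import Path
from typing import List, Optional


def find_installed_files(filename: str, root: Optional[str] = None) -> List[Path]:
    """Return every file called `filename` below `root`.

    By default the search starts at sys.prefix, which is the environment
    folder when Python runs inside an activated conda environment.
    """
    base = Path(root) if root is not None else Path(sys.prefix)
    return sorted(path for path in base.rglob(filename) if path.is_file())
```

Calling `find_installed_files("TransposonClustering.py")` from the activated environment should list the copy that the pipeline actually runs; keeping a backup of that file before editing makes it easy to go back.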

Best, Kevin