JBerthelier / PiRATE

PiRATE (Pipeline to Retrieve and Annotate Transposable Elements)
http://doi.org/10.17882/51795

The PiRATE paper is available online in BMC Genomics:

https://doi.org/10.1186/s12864-018-4763-1


Where to download the PiRATE VM (Virtual Machine)?

PiRATE-Galaxy is installed on a Virtual Machine:

The PiRATE Virtual Machine cannot be downloaded from GitHub; it is hosted on SEANOE (Sea scientific open data publication):

http://doi.org/10.17882/51795

Download the Virtual Machine (17 GB) and run it on your local computer with virtualization software such as VirtualBox (a virtual machine monitor), which works on Linux, Windows and macOS:

https://www.virtualbox.org/

The PiRATE tutorial is available here:

http://archimer.ifremer.fr/doc/00412/52373/


What is PiRATE-Galaxy?

PiRATE-Galaxy is a web-based platform that integrates bioinformatics tools dedicated to transposable element analyses.

These tools are automated in a stand-alone Galaxy (https://galaxyproject.org/use/pirate/) installed in a Linux Virtual Machine (Fig. 1) (see the download section above).

PiRATE-Galaxy combines 14 tools for the detection, classification and annotation of transposable elements from a genome assembly and/or short-read sequencing data (e.g. Illumina) (Fig. 1).

You can use all the tools to perform the full PiRATE pipeline workflow, or use only some of them depending on your available input data or your goals.

Keep in mind that the full PiRATE pipeline workflow (Fig. 2) is not "one-click" automated: you must run each tool one after another. Some steps require manual curation (e.g. TE library curation).


Fig. 1: PiRATE-Galaxy web-based platform overview


Fig. 2: Full PiRATE pipeline workflow overview


STEP I) TE Detection

Approach 1: Similarity-based

Approach 2: Structural-based

Approach 3: Repetitiveness-based

Approach 4: Build repeated elements

STEP II) TE Classification

STEP III) TE Annotation
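
Because the workflow is run by hand, tool after tool, it can help to keep a written checklist of the steps you intend to perform. The sketch below is only an illustration: it encodes the step list above as plain labels (they are not Galaxy tool identifiers), and the `checklist` function and its `approaches` parameter are hypothetical names introduced here. The order of the detection approaches simply follows the listing order, which is an assumption rather than a documented requirement.

```python
# Workflow outline copied from the step list above.
# The strings are labels only, not Galaxy tool identifiers.
WORKFLOW = [
    ("STEP I) TE Detection", [
        "Approach 1: Similarity-based",
        "Approach 2: Structural-based",
        "Approach 3: Repetitiveness-based",
        "Approach 4: Build repeated elements",
    ]),
    ("STEP II) TE Classification", []),
    ("STEP III) TE Annotation", []),
]


def checklist(approaches=(1, 2, 3, 4)):
    """Return the ordered list of items to run manually.

    `approaches` selects which detection approaches (1 to 4) to include,
    for example when only some input data types are available.
    """
    items = []
    for step, subs in WORKFLOW:
        if not subs:
            # Steps without sub-approaches appear as a single entry.
            items.append(step)
            continue
        for number, sub in enumerate(subs, start=1):
            if number in approaches:
                items.append(f"{step} / {sub}")
    return items


if __name__ == "__main__":
    for position, item in enumerate(checklist(), start=1):
        print(f"[ ] {position}. {item}")
```

Calling `checklist(approaches=(1, 3))` would keep only the similarity-based and repetitiveness-based detection entries, followed by the classification and annotation steps.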


What is the goal of PiRATE (Pipeline to Retrieve and Annotate Transposable Elements)?

To date, genome assemblies of non-model organisms are usually not at chromosome level and are highly fragmented. This fragmentation is recognized to result, in part, from poor assembly of transposable element (TE) copies, which makes them harder to detect and annotate.

In this context, we designed a new bioinformatics pipeline named PiRATE to detect, classify and annotate the TEs of non-model organisms. We optimized its detection step by combining all existing TE detection approaches. The goal is to promote the detection of complete TE sequences from every TE family. Detecting complete TE sequences, which bear recognizable conserved domains or specific motifs, facilitates the classification step.

Every tool used by the PiRATE pipeline is automated in a stand-alone Galaxy. This PiRATE-Galaxy can be used through a Virtual Machine (PiRATE-VM).

PiRATE-Galaxy is a suitable and flexible platform for studying TEs in the genome of any organism.

However, be aware that PiRATE was designed for organisms with a relatively small genome assembly; it was created and validated using the A. thaliana genome assembly (120 Mb). You need a powerful machine to use it with a larger genome assembly, and you have to properly set the amount of RAM and the number of cores both in the VirtualBox settings and inside the virtual machine (please check: https://github.com/JBerthelier/PiRATE/issues/29).
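
On the VirtualBox side, RAM and cores can also be set from the command line with `VBoxManage modifyvm`, while the VM is powered off. The sketch below only builds and prints that command so you can review it before running it yourself. The VM name `PiRATE-VM` and the values 16384 MB and 8 cores are placeholders, not recommendations from the PiRATE authors: use the name shown in your own VirtualBox manager and values that fit your host. The function name `modifyvm_command` is hypothetical. This covers only the VirtualBox settings; the adjustments inside the virtual machine are described in the issue linked above.

```python
import shlex


def modifyvm_command(vm_name, memory_mb, cpus):
    """Build the argument list for `VBoxManage modifyvm`.

    memory_mb is the RAM given to the VM in megabytes, cpus the number
    of virtual CPU cores. The VM must be powered off when the command runs.
    """
    if memory_mb <= 0 or cpus <= 0:
        raise ValueError("memory_mb and cpus must be positive")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--memory", str(memory_mb),
        "--cpus", str(cpus),
    ]


if __name__ == "__main__":
    # Placeholder VM name and sizes: adapt them to your own setup.
    cmd = modifyvm_command("PiRATE-VM", memory_mb=16384, cpus=8)
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

To execute the command from Python rather than copy it into a terminal, the returned list can be passed to `subprocess.run(cmd, check=True)`.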


Projects that used PiRATE: