RobertsLab / resources

https://robertslab.github.io/resources/

ID transposable elements in Oly Genome #265

Closed by sr320 6 years ago

sr320 commented 6 years ago

Use RepeatMasker to identify transposable elements (TEs) in the draft Oly genome.

Note: Sean developed a pipeline and ran it on an older version of the genome.

https://genefish.wordpress.com/?s=%22transposable+elements%22
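A minimal sketch of a RepeatMasker run driven from Python, assuming RepeatMasker 4.x is installed and on PATH. `-species`, `-pa`, `-dir` and `-gff` are standard RepeatMasker options. The wrapper function names, the `repeatmasker_out` directory, the thread count and the `genome.fa` placeholder are hypothetical choices for illustration, not Sean's pipeline.

```python
import shlex
import subprocess
from pathlib import Path


def build_repeatmasker_cmd(genome_fasta, species="crassostrea gigas",
                           out_dir="repeatmasker_out", threads=4, gff=True):
    """Return the argv list for one RepeatMasker run (hypothetical helper).

    -species selects the repeat library, -pa sets the number of parallel
    batch jobs, -dir sets the output directory, and -gff requests an
    additional GFF annotation file.
    """
    cmd = ["RepeatMasker",
           "-species", species,
           "-pa", str(threads),
           "-dir", str(out_dir)]
    if gff:
        cmd.append("-gff")
    cmd.append(str(genome_fasta))
    return cmd


def run_repeatmasker(genome_fasta, out_dir="repeatmasker_out", **kwargs):
    """Create the output directory (no-op if it exists) and run RepeatMasker."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_repeatmasker_cmd(genome_fasta, out_dir=out_dir, **kwargs)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Only prints the command line; call run_repeatmasker() where RepeatMasker is installed.
    print(shlex.join(build_repeatmasker_cmd("genome.fa")))
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact command line before committing to a long genome-scale run.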

kubu4 commented 6 years ago

Ran this with four different species settings. Notebook is here:

http://onsnetwork.org/kubu4/2018/05/23/transposable-element-mapping-olympia-oyster-genome-assembly-using-repeatmasker-4-07/

The most useful setting seems to be species set to Crassostrea gigas. Here's the summary table from that run:

==================================================
file name: jelly.out.fasta          
sequences:        696946
total length: 1253001795 bp  (1172226648 bp excl N/X-runs)
GC level:         36.51 %
bases masked:  160759267 bp ( 13.71 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       213132     69887654 bp    5.96 %
   SINEs:             2374       311974 bp    0.03 %
   Penelope         171792     57862186 bp    4.94 %
   LINEs:           195605     63430615 bp    5.41 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex        731       357995 bp    0.03 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL         13        11377 bp    0.00 %
     RTE/Bov-B        8085      1948581 bp    0.17 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:     15153      6145065 bp    0.52 %
     BEL/Pao          2119       955773 bp    0.08 %
     Ty1/Copia         101        75372 bp    0.01 %
     Gypsy/DIRS1     11776      4815361 bp    0.41 %
       Retroviral        0            0 bp    0.00 %

DNA transposons     256292     35689117 bp    3.04 %
   hobo-Activator    19847      2059651 bp    0.18 %
   Tc1-IS630-Pogo    43269      6806311 bp    0.58 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           7935      1060296 bp    0.09 %
   Tourist/Harbinger  9503       887332 bp    0.08 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:       174943     38299211 bp    3.27 %

Total interspersed repeats:   143875982 bp   12.27 %

Small RNA:             280        78768 bp    0.01 %

Satellites:           7383      1362194 bp    0.12 %
Simple repeats:     278809     12982714 bp    1.11 %
Low complexity:      44078      2622506 bp    0.22 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs

The query species was assumed to be crassostrea gigas
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127

run with rmblastn version 2.6.0+
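Each "percentage of sequence" value in the table is the occupied length divided by the total length excluding N/X runs (1172226648 bp), and "Total interspersed repeats" is the sum of Retroelements, DNA transposons and Unclassified (69887654 + 35689117 + 38299211 = 143875982 bp). A small Python sketch that parses a few rows of the summary and recomputes both. The regular expressions are assumptions fitted to the layout shown here and may need adjusting for other RepeatMasker versions.

```python
import re

# Rows copied from the RepeatMasker summary above; spacing does not matter.
TBL = """\
total length: 1253001795 bp  (1172226648 bp excl N/X-runs)
bases masked:  160759267 bp ( 13.71 %)
Retroelements       213132     69887654 bp    5.96 %
DNA transposons     256292     35689117 bp    3.04 %
Unclassified:       174943     38299211 bp    3.27 %
Total interspersed repeats:   143875982 bp   12.27 %
"""

# Denominator for all percentages: total length with N/X runs excluded.
TOTAL = re.compile(r"\((\d+) bp excl N/X-runs\)")
# "<name>[:]  [elements]  <length> bp  <percent> %"; the element count is optional.
ROW = re.compile(
    r"^\s*(?P<name>[A-Za-z][^:]*?):?\s+(?:(?P<n>\d+)\s+)?(?P<bp>\d+) bp\s+(?P<pct>[\d.]+) %"
)


def parse_tbl(text):
    """Return (total_bp_excl_NX, {row name: (length_bp, reported_percent)})."""
    total = int(TOTAL.search(text).group(1))
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)  # header lines such as "bases masked" do not match
        if m:
            rows[m.group("name").strip()] = (int(m.group("bp")), float(m.group("pct")))
    return total, rows


def pct(bp, total):
    """Percentage of sequence, rounded to two decimals as in the table."""
    return round(100 * bp / total, 2)


if __name__ == "__main__":
    total, rows = parse_tbl(TBL)
    for name, (bp, reported) in rows.items():
        print(f"{name}: reported {reported} %, recomputed {pct(bp, total)} %")
    classes = ("Retroelements", "DNA transposons", "Unclassified")
    summed = sum(rows[k][0] for k in classes)
    print("interspersed sum matches:", summed == rows["Total interspersed repeats"][0])
```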

sr320 commented 6 years ago

Could you repeat this with Olurida_v081.fa (http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa)? This is the version we will go forward with.

https://github.com/RobertsLab/resources/wiki/Genomic-Resources#genome-1

kubu4 commented 6 years ago

This should be done by now, but for some reason I'm not getting the expected GFF output files. :angry:

Re-running...
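Before re-running, a quick check of the output directory can show which expected files are missing or empty. A minimal sketch, assuming RepeatMasker's usual naming of `<input>.out`, `<input>.tbl` and `<input>.masked`, plus `<input>.out.gff` when `-gff` is given. These suffixes and the `missing_outputs` helper are assumptions to verify against the RepeatMasker version in use.

```python
from pathlib import Path

# Assumed RepeatMasker output suffixes; ".out.gff" only appears when -gff is used.
EXPECTED_SUFFIXES = (".out", ".tbl", ".masked", ".out.gff")


def missing_outputs(out_dir, fasta_name, suffixes=EXPECTED_SUFFIXES):
    """Return names of expected output files that are absent or zero bytes."""
    out_dir = Path(out_dir)
    missing = []
    for suffix in suffixes:
        path = out_dir / (fasta_name + suffix)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing
```

For example, `missing_outputs("repeatmasker_out", "Olurida_v081.fa")` returns the names of any absent or empty files (`repeatmasker_out` is a hypothetical directory name). Treating zero-byte files as missing catches runs that created a file but never wrote to it.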

kubu4 commented 6 years ago

Repeated with genome file linked above.

Notebook: