Closed sr320 closed 6 years ago
Ran this with four different species settings. Notebook is here:
Seems like the most useful setting is with species set to Crassostrea gigas. Here's the summary table from that run:
=======================================================
file name:     jelly.out.fasta
sequences:     696946
total length:  1253001795 bp (1172226648 bp excl N/X-runs)
GC level:      36.51 %
bases masked:  160759267 bp (13.71 %)
=======================================================
                    number of       length   percentage
                    elements*     occupied  of sequence
-------------------------------------------------------
Retroelements          213132     69887654 bp    5.96 %
   SINEs:                2374       311974 bp    0.03 %
   Penelope            171792     57862186 bp    4.94 %
   LINEs:              195605     63430615 bp    5.41 %
    CRE/SLACS               0            0 bp    0.00 %
    L2/CR1/Rex            731       357995 bp    0.03 %
    R1/LOA/Jockey           0            0 bp    0.00 %
    R2/R4/NeSL             13        11377 bp    0.00 %
    RTE/Bov-B            8085      1948581 bp    0.17 %
    L1/CIN4                 0            0 bp    0.00 %
   LTR elements:        15153      6145065 bp    0.52 %
    BEL/Pao              2119       955773 bp    0.08 %
    Ty1/Copia             101        75372 bp    0.01 %
    Gypsy/DIRS1         11776      4815361 bp    0.41 %
    Retroviral              0            0 bp    0.00 %
DNA transposons        256292     35689117 bp    3.04 %
   hobo-Activator       19847      2059651 bp    0.18 %
   Tc1-IS630-Pogo       43269      6806311 bp    0.58 %
   En-Spm                   0            0 bp    0.00 %
   MuDR-IS905               0            0 bp    0.00 %
   PiggyBac              7935      1060296 bp    0.09 %
   Tourist/Harbinger     9503       887332 bp    0.08 %
   Other (Mirage,           0            0 bp    0.00 %
    P-element, Transib)
Rolling-circles             0            0 bp    0.00 %
Unclassified:          174943     38299211 bp    3.27 %
Total interspersed repeats:      143875982 bp   12.27 %
Small RNA:                280        78768 bp    0.01 %
Satellites:              7383      1362194 bp    0.12 %
Simple repeats:        278809     12982714 bp    1.11 %
Low complexity:         44078      2622506 bp    0.22 %
=======================================================
* most repeats fragmented by insertions or deletions
  have been counted as one element
Runs of >=20 X/Ns in query were excluded in % calcs
The query species was assumed to be crassostrea gigas
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
run with rmblastn version 2.6.0+
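For anyone sanity-checking the table: the percentages appear to be computed against the length excluding N/X runs (1172226648 bp), not the full 1253001795 bp, which matches the "excluded in % calcs" note. A quick sketch of that arithmetic, with the figures copied from the table above (the variable and function names are just illustrative):

```python
# Figures copied from the RepeatMasker summary table above.
TOTAL_BP = 1_253_001_795        # total length, including N/X runs
NON_N_BP = 1_172_226_648        # total length excluding N/X runs
MASKED_BP = 160_759_267         # bases masked
RETRO_BP = 69_887_654           # Retroelements
DNA_BP = 35_689_117             # DNA transposons
UNCLASSIFIED_BP = 38_299_211    # Unclassified
INTERSPERSED_BP = 143_875_982   # Total interspersed repeats


def pct(part, whole):
    """Return part as a percentage of whole, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Percentages relative to the non-N length reproduce the table values.
print(pct(MASKED_BP, NON_N_BP))        # 13.71, as reported for bases masked
print(pct(INTERSPERSED_BP, NON_N_BP))  # 12.27, as reported for interspersed repeats

# Relative to the full length, the masked fraction would be lower.
print(pct(MASKED_BP, TOTAL_BP))        # 12.83

# The three top-level classes add up to the interspersed total.
print(RETRO_BP + DNA_BP + UNCLASSIFIED_BP == INTERSPERSED_BP)  # True
```

So if these numbers get compared against other assemblies, it's worth noting which denominator was used.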
Could you repeat this with Olurida_v081.fa (http://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081.fa)? This is the version we will go forward with.
https://github.com/RobertsLab/resources/wiki/Genomic-Resources#genome-1
This should be done by now but, for some reason, I'm not getting the expected GFF output files. :angry:
Re-running...
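One way to catch this earlier on the re-run is to check for the output files as soon as the job finishes. A rough sketch below; the suffix list is an assumption based on RepeatMasker's usual naming (`.out`, `.tbl`, `.masked`, plus `.out.gff` when the `-gff` option is used), and `missing_outputs` is a hypothetical helper, not part of RepeatMasker:

```python
from pathlib import Path

# Assumed output suffixes; ".out.gff" is only expected when -gff was requested.
EXPECTED_SUFFIXES = (".out", ".tbl", ".masked", ".out.gff")


def missing_outputs(fasta, out_dir="."):
    """Return names of expected RepeatMasker outputs not found in out_dir."""
    base = Path(out_dir) / Path(fasta).name
    return [
        base.name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not Path(str(base) + suffix).exists()
    ]


if __name__ == "__main__":
    # Lists whichever expected files are absent from the current directory.
    print(missing_outputs("Olurida_v081.fa"))
```

If only the `.out.gff` entry comes back, the run itself probably finished and the GFF option is the thing to look at.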
Repeated with genome file linked above.
Notebook:
Use RepeatMasker to ID TEs from draft Oly genome.
Note: Sean developed a pipeline and ran it on an old version of the genome...
https://genefish.wordpress.com/?s=%22transposable+elements%22