Closed by JBreunig 1 year ago
Hi @JBreunig,
I noticed you also posted this on the SpatialTE issues. Both tools ship with a script that converts the RepeatMasker *.out file into the format the pipeline requires.
For SoloTE the script is named ./convertRMOut_to_SoloTEinput.sh, whereas for SpatialTE it is named convertRMOut_to_SpatialTEinput.sh, and it works the same way.
The input for both scripts is the RepeatMasker .out file. For example, the mm39.fa.out file available from UCSC looks like this:
% head mm39.fa.out
SW perc perc perc query position in query matching repeat position in repeat
score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID
3667 6.7 0.0 1.0 chr1 3050294 3050775 (192103504) + L1MdA_VI LINE/L1 6094 6570 (6) 1
242 26.5 2.2 3.9 chr1 3051045 3051146 (192103133) C ID2 SINE/ID (48) 104 5 2
13 11.6 0.0 6.7 chr1 3051160 3051191 (192103088) + (TGCT)n Simple_repeat 1 30 (0) 3
303 28.9 11.2 3.6 chr1 3051232 3051371 (192102908) + Tigger9a DNA/TcMar-Tigger 88 239 (493) 4
579 29.3 11.7 2.2 chr1 3051808 3051992 (192102287) + Tigger5b_Glire DNA/TcMar-Tigger 39 243 (110) 5
385 25.7 3.4 4.2 chr1 3052102 3052219 (192102060) + PB1D10 SINE/Alu 1 117 (0) 6
16 10.7 0.0 3.2 chr1 3052862 3052893 (192101386) + (TATCAA)n Simple_repeat 1 31 (0) 7
Then, for SoloTE you would need to run the conversion script like this, where mm39_SoloTE.bed is the name you want for your new file:
./convertRMOut_to_SoloTEinput.sh mm39.fa.out mm39_SoloTE.bed
% head mm39_SoloTE.bed
chr1 3050294 3050775 chr1|3050294|3050775|L1MdA_VI:L1:LINE|+ 3667 +
chr1 3051045 3051146 chr1|3051045|3051146|ID2:ID:SINE|- 242 -
chr1 3051232 3051371 chr1|3051232|3051371|Tigger9a:TcMar-Tigger:DNA|+ 303 +
chr1 3051808 3051992 chr1|3051808|3051992|Tigger5b_Glire:TcMar-Tigger:DNA|+ 579 +
chr1 3052102 3052219 chr1|3052102|3052219|PB1D10:Alu:SINE|+ 385 +
chr1 3053936 3054012 chr1|3053936|3054012|L1MC4:L1:LINE|- 247 -
chr1 3054039 3054829 chr1|3054039|3054829|L1MB4:L1:LINE|- 1624 -
chr1 3054917 3055526 chr1|3054917|3055526|Lx8b:L1:LINE|- 1976 -
chr1 3055671 3055792 chr1|3055671|3055792|B1F1:Alu:SINE|- 316 -
chr1 3056082 3056260 chr1|3056082|3056260|L1MB4:L1:LINE|- 316 -
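If it helps to see how a row of the .out file relates to a row of the converted file, the mapping visible in the two listings above can be sketched in Python. This is only an illustration inferred from the example output, not the code of the bundled script: the function name rm_out_line_to_bed is made up, the tab separator is assumed (the listing above only shows whitespace), the only filter applied is dropping Simple_repeat rows (the real script may filter more), and the handling of a class with no "/family" part is a guess.

```python
def rm_out_line_to_bed(line):
    """Map one RepeatMasker .out row to a SoloTE-style BED row (hypothetical helper).

    Returns None for header/blank lines and for Simple_repeat rows.
    """
    fields = line.split()
    # Data rows have at least 15 whitespace-separated fields and start
    # with the integer Smith-Waterman score; header lines do not.
    if len(fields) < 15 or not fields[0].isdigit():
        return None

    score, chrom, start, end = fields[0], fields[4], fields[5], fields[6]
    # RepeatMasker marks the complement strand with "C".
    strand = "-" if fields[8] == "C" else "+"
    repeat_name, class_family = fields[9], fields[10]

    # Assumption: simple repeats are dropped, as they are absent
    # from the example output.
    if class_family == "Simple_repeat":
        return None

    # "LINE/L1" becomes family "L1" and class "LINE".
    te_class, _, te_family = class_family.partition("/")
    if not te_family:
        te_family = te_class  # assumption for classes without a family

    name = f"{chrom}|{start}|{end}|{repeat_name}:{te_family}:{te_class}|{strand}"
    # Assumption: tab-separated columns, as is usual for BED files.
    return "\t".join([chrom, start, end, name, score, strand])


if __name__ == "__main__":
    example = (
        "242 26.5 2.2 3.9 chr1 3051045 3051146 (192103133) "
        "C ID2 SINE/ID (48) 104 5 2"
    )
    print(rm_out_line_to_bed(example))
```

Running it on the second data row of mm39.fa.out prints a line with the same fields as the second line of mm39_SoloTE.bed above. For real data, please use the conversion script that comes with the tool.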
Hope this helps!
Thank you! Any plans on creating a scATAC version of this package?
I was able to find RepeatMasker files on the UCSC website, but they don't seem to be in exactly the same format or to have the *.rm.out name. Could you provide a link? Thanks in advance! Josh