Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Is it possible to run RepeatMasker with genotypes from a SNP array? #133

KatherineAr closed this issue 2 years ago

KatherineAr commented 3 years ago

What do you want to know?

Helpful context

jebrosen commented 2 years ago

Hi Katherine,

I'm not sure I fully understand this question. What kind of data or file format do you currently have (e.g. sequences or sequence coordinates) for the genotypes, and what kind of annotation or masking are you trying to perform?

KatherineAr commented 2 years ago

Thanks for answering!

My data consists of approximately 500,000 SNPs with coordinates. I'm trying to identify whether any of them fall within LINEs or SINEs. I think this isn't possible, because RM doesn't work with coordinates.

I hope you can help me. Thank you so much :)

jebrosen commented 2 years ago

Since you have coordinates, you may be able to compare the locations of the SNPs to the locations of repeats annotated by RepeatMasker - as long as the locations are all relative to the same reference sequence or if you can convert them. Some tools that can help with this include util/rmOutToGFF3.pl and bedtools intersect. Does that approach sound appropriate for your data?
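For readers who would rather stay in a script than convert to GFF3 and run bedtools intersect, the same comparison can be sketched in plain Python. This is a minimal sketch, not part of RepeatMasker, and it swaps bedtools for a hand-rolled interval lookup. It assumes the standard whitespace-separated RepeatMasker `.out` layout (column 5 = query sequence, columns 6-7 = 1-based inclusive begin/end, column 10 = repeat name, column 11 = class/family such as `LINE/L1` or `SINE/Alu`). It also assumes that SNP positions are 1-based and use the same assembly and sequence names as the RepeatMasker run. The helper names `parse_rm_out` and `annotate_snps` are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (begin, end, repeat name, class/family); begin/end are 1-based inclusive.
Interval = Tuple[int, int, str, str]


def parse_rm_out(lines: Iterable[str],
                 classes: Tuple[str, ...] = ("LINE", "SINE")) -> Dict[str, List[Interval]]:
    """Hypothetical helper: collect LINE/SINE hits from RepeatMasker .out lines.

    Assumes the standard .out column layout described in the lead-in.
    Returns sorted intervals keyed by sequence name.
    """
    by_seq: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer SW score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        seq, begin, end = fields[4], int(fields[5]), int(fields[6])
        name, cls = fields[9], fields[10]
        # Keep only the requested top-level classes (e.g. "SINE" from "SINE/Alu").
        if cls.split("/")[0] in classes:
            by_seq[seq].append((begin, end, name, cls))
    for intervals in by_seq.values():
        intervals.sort()
    return dict(by_seq)


def annotate_snps(snps: Iterable[Tuple[str, str, int]],
                  repeats: Dict[str, List[Interval]]) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (snp_id, repeat name, class/family) per overlap.

    snps holds (snp_id, sequence name, 1-based position) tuples.
    """
    # Per sequence: sorted begins plus a running maximum of ends, so the
    # backward scan can stop once no earlier interval can reach the SNP.
    index = {}
    for seq, ivs in repeats.items():
        begins, max_end, running = [], [], 0
        for begin, end, _, _ in ivs:
            begins.append(begin)
            running = max(running, end)
            max_end.append(running)
        index[seq] = (begins, max_end)

    for snp_id, seq, pos in snps:
        if seq not in index:
            continue  # no LINE/SINE annotations on this sequence
        begins, max_end = index[seq]
        ivs = repeats[seq]
        # Last interval that starts at or before the SNP position.
        i = bisect.bisect_right(begins, pos) - 1
        while i >= 0 and max_end[i] >= pos:
            _, end, name, cls = ivs[i]
            if end >= pos:  # inclusive end, matching .out coordinates
                yield snp_id, name, cls
            i -= 1
```

Typical use would be something like `with open("genome.fa.out") as fh: repeats = parse_rm_out(fh)`, followed by `list(annotate_snps(snp_tuples, repeats))`, where the file name and `snp_tuples` are placeholders. Repeat annotations can overlap, so a single SNP may be reported for more than one element. If the SNP array positions are on a different assembly, they would need to be lifted over first.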

KatherineAr commented 2 years ago

That's what I needed. Thank you so much for your help! :)

jebrosen commented 2 years ago

Glad to hear it!