ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs
repeatmasker #854

Open zuodabin opened 1 year ago

zuodabin commented 1 year ago

Hi, thanks for your software. I want to align 3 genomes, but they have not been masked with RepeatMasker. Do I need to mask the genomes first? Can I use RepeatModeler and RepeatMasker to mask them? I would end up using the .fa.masked files, right?

glennhickey commented 1 year ago

Yes, the input genomes must be masked with RepeatMasker. This is annoying, and something we hope to provide better automation for next year.

If there's no library for your species, then RepeatModeler would make sense. WindowMasker may also help. Which species are you working on?

zuodabin commented 1 year ago

Thanks for your timely reply. For my species I need to build the repeat library with RepeatModeler, which is not difficult. However, after running RepeatMasker, the A/T/C/G bases in the generated masked.fa are replaced by N. Will this have any impact? What do you recommend?

glennhickey commented 1 year ago

The sequence must be softmasked (i.e. set to lowercase), not replaced by N.
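To tell which kind of masking a FASTA file carries, it can help to count uppercase, lowercase, and N bases. The following is a minimal sketch, not part of Cactus; the function name `mask_stats` is hypothetical, and it assumes a plain-text FASTA handle.

```python
import io


def mask_stats(fasta_handle):
    """Count unmasked (uppercase), softmasked (lowercase) and N bases.

    A file that is mostly 'n' with no 'soft' bases was probably hardmasked.
    """
    counts = {"upper": 0, "soft": 0, "n": 0}
    for line in fasta_handle:
        if line.startswith(">"):
            continue  # skip FASTA header lines
        for base in line.strip():
            if base in "Nn":
                counts["n"] += 1
            elif base.islower():
                counts["soft"] += 1
            else:
                counts["upper"] += 1
    return counts


if __name__ == "__main__":
    demo = io.StringIO(">chr1\nACGTacgtNNNN\n")
    print(mask_stats(demo))
```

A softmasked genome should show a substantial `soft` count, while `n` should reflect only the real assembly gaps.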

Cactus comes with a tool to softmask fastas using BED regions: cactus_fasta_softmask_intervals.py
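For illustration, softmasking from BED regions amounts to lowercasing each interval. This sketch is not the implementation of cactus_fasta_softmask_intervals.py and does not reproduce its options; the helper names `parse_bed` and `softmask` are hypothetical, and it assumes standard BED coordinates (0-based, half-open).

```python
def parse_bed(lines):
    """Group (start, end) intervals by sequence name from BED lines."""
    intervals = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and browser/track lines
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def softmask(seq, intervals):
    """Lowercase every [start, end) interval of seq; leave the rest as-is."""
    bases = list(seq)
    for start, end in intervals:
        start = max(0, start)
        end = min(len(bases), end)
        bases[start:end] = [b.lower() for b in bases[start:end]]
    return "".join(bases)


if __name__ == "__main__":
    bed = parse_bed(["chr1\t2\t5\n"])
    print(softmask("ACGTACGT", bed["chr1"]))
```

Unlike hardmasking, the bases are kept, so the aligner can still extend alignments through the lowercase regions.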

You can also use the ".out" file from RepeatMasker to softmask a 2bit sequence file with twoBitMask -type=.out.

You can convert between fasta and 2bit with faToTwoBit and twoBitToFa:
https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faToTwoBit
https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
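The 2bit route can be sketched as three commands run in sequence. This is a rough outline under assumptions: the UCSC tools are on PATH, the argument order shown matches each tool's usage message (worth checking by running the tool with no arguments), and the file names and the helper `softmask_commands` are hypothetical.

```python
import subprocess


def softmask_commands(fasta, rm_out, masked_fasta):
    """Build the fasta -> 2bit -> softmasked 2bit -> fasta command list."""
    two_bit = fasta + ".2bit"
    masked_2bit = fasta + ".masked.2bit"
    return [
        # convert the unmasked fasta to 2bit
        ["faToTwoBit", fasta, two_bit],
        # apply the RepeatMasker .out annotations as a soft mask
        ["twoBitMask", two_bit, rm_out, masked_2bit, "-type=.out"],
        # convert back to fasta; masked bases come out in lowercase
        ["twoBitToFa", masked_2bit, masked_fasta],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in softmask_commands("genome.fa", "genome.fa.out", "genome.softmasked.fa"):
        print(" ".join(cmd))
```

Calling `run_all(softmask_commands(...))` would execute the steps; the `__main__` block only prints them so they can be reviewed first.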

zuodabin commented 1 year ago

Thanks a lot!!