Closed Weikai-47 closed 1 year ago
This is expected: BISER ignores repeats during elementary SD generation (by definition, elementary SDs are "older" than repeats, so repeats are ignored). Repeats are later superimposed on top of the hard-masked genome, so you should expect to see them in the final elementary SD set. They should be ignored during the analysis.
I don't know what "hard-masked genome" means in this context. Nevertheless, please use the soft-masked version: BISER already performs the hard-masking it needs automatically.
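For readers unsure of the terminology, here is a minimal sketch of the usual convention: a soft-masked FASTA marks repeats in lowercase, while a hard-masked one replaces them with `N`. The helper names `hard_mask` and `masked_fraction` are hypothetical and are not part of BISER; this only illustrates the distinction and how a masked fraction could be measured on extracted sequences.

```python
def hard_mask(seq: str) -> str:
    """Turn a soft-masked sequence into a hard-masked one (lowercase -> N)."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are masked, i.e. lowercase or N."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)


soft = "ACGTacgtacGT"          # toy soft-masked sequence, 6 of 12 bases masked
print(hard_mask(soft))         # ACGTNNNNNNGT
print(masked_fraction(soft))   # 0.5
```

Running `masked_fraction` over the sequences extracted from the `.elem.txt` coordinates would give a figure comparable to the masked-repeat percentage reported in the question below.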
Hello, while using BISER I ran into some questions:
The genome was masked using TRF and RepeatMasker according to the pipeline in "Segmental duplications and their variation in a complete human genome" (Science, 2022). BISER v1.3 was then run on the hard-masked and soft-masked genomes, respectively. (biser --temp TMP -o DH.genome.hard -t 40 (--hard) --max-error=10 --max-edit-error=10 --max-chromosome-size=350000000 DH.genome.hard.fasta)
(1) BISER completed successfully on the soft-masked genome. I then extracted the elementary SD sequences from the hard-masked genome based on the .elem.txt file. The sequences looked like this:
The elementary SDs totaled 337 Mb, of which 63.7% were masked repeats. Why?
(2) BISER failed at the putative SD detection step on the hard-masked genome.
Why did BISER fail on the hard-masked genome?
Thank You! Chen