0xTCG / biser

A fast tool for detecting and decomposing segmental duplications in genome assemblies
MIT License

BISER completed on soft_mask, but failed on hard_masked genome. #22

Closed Weikai-47 closed 1 year ago

Weikai-47 commented 1 year ago

Hello, I ran into a couple of questions while using BISER:

The genome was masked with TRF and RepeatMasker, following the pipeline in "Segmental duplications and their variation in a complete human genome" (Science, 2022). BISER v1.3 was then run on the hard-masked and soft-masked genomes, respectively: biser --temp TMP -o DH.genome.hard -t 40 (--hard) --max-error=10 --max-edit-error=10 --max-chromosome-size=350000000 DH.genome.hard.fasta

(1) BISER completed successfully on the soft-masked genome. I then extracted the elementary SD sequences from the hard-masked genome based on the .elem.txt file. The sequences looked like this: [image]

The elementary SDs totaled 337 Mb, of which 63.7% were masked repeats. Why?

(2) BISER failed at the putative SD detection step on the hard-masked genome. [image: SD elem]

Why did BISER fail on the hard-masked genome?

Thank you! Chen

inumanag commented 1 year ago
  1. This is expected: BISER ignores repeats during elementary SD generation (by definition, elementary SDs are "older" than repeats, so repeats are ignored). Repeats are later superimposed on top of the hard-masked genome, so you should expect to see them in the final elementary SD set. They should be ignored during the analysis.

  2. I don't know what a "hard-masked" genome means in this context. Nevertheless, please use the soft-masked version: BISER already does the hard-masking it needs automatically.
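
For anyone who wants to check which kind of masking a FASTA file actually carries, or how much of an extracted elementary SD set is repeat sequence, here is a minimal Python sketch. It is not part of BISER. It assumes the usual conventions: soft-masked bases are lowercase a/c/g/t and hard-masked bases are N. The function names and the file name in the usage note are hypothetical.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masking_summary(handle: TextIO) -> Dict[str, float]:
    """Count soft-masked (lowercase acgt) and hard-masked (N/n) bases.

    Assumption: lowercase means soft-masked and N means hard-masked.
    Note that N also marks assembly gaps, so the hard-masked count is
    an upper bound on masked repeats.
    """
    total = soft = hard = 0
    for _, seq in read_fasta(handle):
        total += len(seq)
        soft += sum(1 for base in seq if base in "acgt")
        hard += sum(1 for base in seq if base in "Nn")
    return {
        "total": total,
        "soft_masked": soft,
        "hard_masked": hard,
        "soft_fraction": soft / total if total else 0.0,
        "hard_fraction": hard / total if total else 0.0,
    }


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("your_sequences.fasta").
    demo = io.StringIO(">chr1\nACGTacgtNNNN\n>chr2\nacgtACGT\n")
    print(masking_summary(demo))
```

To run it on real data, pass an open file instead of the in-memory example, e.g. `masking_summary(open("elementary_sds.fasta"))` (hypothetical file name). A high `soft_fraction` in the extracted elementary SDs is consistent with the explanation in point 1, and a high `hard_fraction` in the input suggests the genome was hard-masked before being given to BISER.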