Open leon945945 opened 5 months ago
If I recall correctly, GeneMark-ETP internally computes a "repeat penalty", and I think that only works properly with softmasking. If the genome is hard-masked, the N sequences will probably simply be ignored. @alexlomsadze may correct me on that.
For AUGUSTUS, softmasking makes it possible to extend a gene structure from an unmasked region into a masked one. So here, softmasking is usually an advantage over hard masking. This is reflected in the numbers that you observe.
Hi, I used braker3 to annotate my phased haplotypes. I annotated the two haplotypes with hard-masked and soft-masked genomes separately. Results:
hard-masked hap1: 22396
hard-masked hap2: 23017
soft-masked hap1: 24072
soft-masked hap2: 27374
As we can see, the gene counts of the two haplotypes were comparable with the hard-masked genomes, but hap2 was annotated with over 3000 more genes than hap1 when using the soft-masked genomes.
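To make the comparison explicit, here is a small sketch that only redoes the arithmetic on the four gene counts reported above. The helper names `haplotype_gap` and `masking_gain` are made up for illustration and are not part of braker3.

```python
# Gene counts exactly as reported in this post.
gene_counts = {
    "hard-masked": {"hap1": 22396, "hap2": 23017},
    "soft-masked": {"hap1": 24072, "hap2": 27374},
}


def haplotype_gap(counts):
    """Return hap2 minus hap1 for one masking mode."""
    return counts["hap2"] - counts["hap1"]


def masking_gain(all_counts, hap):
    """Return soft-masked minus hard-masked gene count for one haplotype."""
    return all_counts["soft-masked"][hap] - all_counts["hard-masked"][hap]


for mode, counts in gene_counts.items():
    print(f"{mode}: hap2 - hap1 = {haplotype_gap(counts)}")
# hard-masked: hap2 - hap1 = 621
# soft-masked: hap2 - hap1 = 3302

for hap in ("hap1", "hap2"):
    print(f"{hap}: soft - hard = {masking_gain(gene_counts, hap)}")
# hap1: soft - hard = 1676
# hap2: soft - hard = 4357
```

So the gap between haplotypes grows from 621 to 3302 genes, and hap2 gains 4357 genes from softmasking while hap1 gains 1676.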
Do these results demonstrate that braker3 has a repeat-sequence bias, i.e. that differences in repeat content between the two haplotypes make braker3 perform differently?
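One way to check whether the two haplotypes really differ in repeat content would be to measure the masked fraction of each assembly. Below is a minimal sketch, assuming the usual convention that softmasked bases are lowercase and hard-masked bases are N. The function `masked_fractions` and the toy records are hypothetical and only illustrate the idea; this is not something braker3 provides.

```python
def masked_fractions(fasta_lines):
    """Return (soft_fraction, hard_fraction) over all sequence lines.

    Assumes lowercase bases are softmasked and N/n bases are hard-masked.
    """
    soft = hard = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for base in line:
            total += 1
            if base in "Nn":
                hard += 1
            elif base.islower():
                soft += 1
    if total == 0:
        return 0.0, 0.0
    return soft / total, hard / total


# Toy records standing in for the two haplotype assemblies.
toy_hap1 = [">chr1", "ACGTacgtNNNN", "acgtACGT"]
toy_hap2 = [">chr1", "acgtacgtacgt", "acgtACGT"]

for name, records in (("hap1", toy_hap1), ("hap2", toy_hap2)):
    soft_frac, hard_frac = masked_fractions(records)
    print(f"{name}: softmasked={soft_frac:.2f} hard-masked={hard_frac:.2f}")
```

For real data, an open file handle for each softmasked FASTA can be passed in place of the toy lists, since the function only iterates over lines.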