EI-CoreBioinformatics / reat

Robust Eukaryotic Annotation Toolkit
https://reat.readthedocs.io/en/latest/
MIT License

Bug soft / hard masking #19

Closed swarbred closed 2 years ago

swarbred commented 2 years ago

Bug: in call-SoftMaskGenome, the soft- and hard-masked output files are not actually masked.

See the command that is run:

bedtools maskfasta -mc 'N' -fi /ei/cb/development/GENANNO-506/reat-dev_prediction_swarbre/cromwell-executions/ei_prediction/061c5659-a884-488f-83b4-79a7562bc598/call-SoftMaskGenome/inputs/-5733854/Calendula_officinalis_EIV1.2.fasta -bed <(gffread --bed $rep_file) -fo Calendula_officinalis_EIV1.2.hardmasked.fa

This will not work if the input GFF has match features (which looks to be the expectation, given what is being parsed in call-PreprocessRepeats).

gffread will not use all feature types, and even using -O only works for GFF3 output.

So we just need to convert match to exon, which is then the same requirement as for Augustus, e.g.:

bedtools maskfasta -mc 'N' -fi Calendula_officinalis_EIV1.2.fasta -bed <(awk 'BEGIN{OFS="\t"} $3=="match" {print $1, "repmask", "exon", $4, $5, $6, $7, $8, $9}' all_interspersed_repeats.gff | gffread --bed) -fo Calendula_officinalis_EIV1.2.softmasked.fa
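
The match-to-exon conversion can be checked on its own before piping it into gffread and bedtools. Below is a minimal sketch of that step only; the function name `match_to_exon` and the sample GFF line are made up for illustration, and it sets the tab separators explicitly so attribute values containing spaces stay in one column.

```shell
# Hypothetical helper (not part of REAT): rewrite GFF "match" features as
# "exon" features with source "repmask", dropping every other feature type.
# Reads the files given as arguments, or stdin when there are none.
match_to_exon() {
  awk 'BEGIN { FS = OFS = "\t" }
       $3 == "match" { $2 = "repmask"; $3 = "exon"; print }' "$@"
}

# Made-up one-line GFF record, just to show the rewritten columns.
printf 'chr1\tRepeatMasker\tmatch\t11\t20\t.\t+\t.\tID=rep1\n' | match_to_exon
```

One thing worth double-checking in the command above: `-mc 'N'` replaces masked bases with N, which is hard masking, whereas bedtools maskfasta's `-soft` option lower-cases them instead, so the `.softmasked.fa` output would presumably need `-soft` rather than `-mc 'N'`.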