Feature calculation: Percentage low complexity regions detected in the sequence

Program call and Python/R script

Input: FASTA file with the sequences
Output: CSV file with the sequence identifier and the percentage of the sequence that is low complexity

Use the program dustmasker for this, with the score threshold for subwindows set to 15. It writes the low-complexity regions in lower case in the FASTA file. To get the feature, divide the number of lower-case letters by the length of the sequence; this should be done with an R or Python function. dustmasker can be installed using conda.

Example output:

"comment",low complex
"mmu-mir-380 MI0000797 Mus musculus miR-380 stem-loop", 0.2
"mmu-mir-381 MI0000798 Mus musculus miR-381 stem-loop", 0.4
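
The counting step could be sketched in Python roughly as follows. This is only a minimal sketch, not the project's implementation: the function names (`read_fasta`, `low_complexity_fraction`, `write_low_complexity_csv`) are made up for illustration, and it assumes dustmasker has already been run with FASTA output, for example with something like `dustmasker -in input.fa -outfmt fasta -level 15 -out masked.fa` (flags to be checked against `dustmasker -help`). It also assumes the input sequences were entirely upper case before masking, so that every lower-case letter comes from dustmasker. The value written is the plain ratio between 0 and 1, as in the example rows above.

```python
import csv


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def low_complexity_fraction(sequence):
    """Return (number of lower-case letters) / (sequence length)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return masked / len(sequence)


def write_low_complexity_csv(masked_fasta, out_csv):
    """Write one CSV row per sequence: FASTA header, masked fraction."""
    writer = csv.writer(out_csv)
    # Column names follow the example output in this issue.
    writer.writerow(["comment", "low complex"])
    for header, sequence in read_fasta(masked_fasta):
        writer.writerow([header, round(low_complexity_fraction(sequence), 4)])
```

Usage would be along the lines of opening the masked FASTA file and the target CSV (with `newline=""`, as the `csv` module recommends) and passing both handles to `write_low_complexity_csv`. Note that `csv.writer` only quotes fields when necessary, so the headers will not be wrapped in quotes unless they contain a comma or a quote character.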

Source Paper: HuntMi: an efficient and taxon-specific approach in pre-miRNA identification

OstfriesenBI / PredmiRNA, issue #13