PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

The efficiency of RepeatMasker #534

Closed biosciences closed 2 years ago

biosciences commented 2 years ago

RepeatMasker runs for a long time, and its job was killed after reaching the maximum execution time allowed by the queue system on our HPC cluster. Could we find out a solution to improve the efficiency of RepeatMasker and reduce the running time?

d-cameron commented 2 years ago

Could we find out a solution to improve the efficiency of RepeatMasker and reduce the running time?

The simplest solution is to filter low-confidence variants before performing the RepeatMasker annotation. The default GRIDSS output includes a large number of low-confidence variants that would be filtered downstream of the RM annotation in most pipelines. Moving the filtering upstream of the RM annotation will considerably reduce the RM runtime, since the input should be over an order of magnitude smaller.
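
As a rough illustration of moving the filter upstream, here is a minimal Python sketch that drops low-QUAL records from a VCF before it is handed to the RM annotation step. The function name `filter_vcf_lines` and the threshold are hypothetical, and filtering on the QUAL column alone is an assumption for the example; use whatever filter your downstream pipeline already applies.

```python
from typing import Iterable, Iterator


def filter_vcf_lines(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Yield VCF header lines unchanged and only records with QUAL >= min_qual.

    Hypothetical helper: the QUAL-only criterion is an assumption, not the
    filter any particular pipeline uses.
    """
    for line in lines:
        if line.startswith("#"):
            # Header and column lines must be preserved for a valid VCF.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        qual = fields[5]  # QUAL is the sixth VCF column
        if qual == ".":
            continue  # a missing QUAL is treated as low confidence here
        if float(qual) >= min_qual:
            yield line


def filter_vcf_file(in_path: str, out_path: str, min_qual: float) -> None:
    """Stream an uncompressed VCF through the filter into a new file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_vcf_lines(src, min_qual))
```

Calling `filter_vcf_file("gridss.vcf", "gridss.filtered.vcf", 500.0)` and then running the RM annotation on the filtered file is the intended usage; the file names and the value 500 are placeholders. Because breakend records come in mate pairs, it is worth checking that the criterion you pick keeps or drops both mates together.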