Standalone version of the background matching script, developed for Python 2.7.9.
It only supports region length and GC content as features.
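As an illustration of the two features, here is a minimal sketch of how a region's length and GC content (in percent) could be computed from its DNA sequence. The function name `region_features` is hypothetical, and treating `N`/`n` bases as counting toward the length but not toward GC is an assumption, not necessarily what the script does.

```python
def region_features(seq):
    """Return (length, GC content in percent) for a DNA sequence.

    Hypothetical helper. Assumption: ambiguous bases such as N count
    toward the length but not toward the GC count.
    """
    seq = seq.upper()  # treat soft-masked (lowercase) bases like uppercase
    length = len(seq)
    if length == 0:
        return 0, 0.0
    gc = seq.count("G") + seq.count("C")
    return length, 100.0 * gc / length
```

For example, `region_features("ACGTGC")` gives a length of 6 and a GC content of about 66.7 percent.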
I ran a couple of tests; the performance is roughly as follows:
Input: 200,000 DNase regions (Blueprint data) on 23 chromosomes
CPUs: 3
Timeout: 2 minutes (per chromosome)
Relaxation: at most 2 percentage points
That yields ~160,000 matched regions in roughly 20 minutes, i.e. ~2,000-4,000 matches per minute of search time per chromosome. Memory consumption stays at or below 10 GB in this scenario.
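To make the timeout and relaxation parameters concrete, the following is a minimal sketch of a randomized search of this kind. It is not the script's actual code: the function names, the stepwise relaxation schedule (`relax_step`), and `tries_per_level` are hypothetical. It assumes that "relaxation" means the allowed GC difference grows stepwise from 0 up to 2 percentage points, and that the 2-minute timeout is one shared budget for all regions of a chromosome.

```python
import random
import time


def gc_percent(seq):
    """GC content of `seq` in percent (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def match_chromosome(chrom_seq, targets, timeout_s=120.0, max_relax=2.0,
                     relax_step=0.5, tries_per_level=1000, seed=None):
    """Randomized background matching for one chromosome (hypothetical sketch).

    targets: list of (length, gc_percent) pairs from the foreground regions.
    Returns one start position per target, or None where no window matched
    within the per-chromosome time budget or after full relaxation.
    """
    rng = random.Random(seed)
    deadline = time.time() + timeout_s  # one time budget for the whole chromosome
    matches = []
    for length, target_gc in targets:
        found = None
        if length <= len(chrom_seq):
            tolerance = 0.0
            # Relax the GC tolerance stepwise, up to max_relax percentage points.
            while found is None and tolerance <= max_relax and time.time() < deadline:
                for _ in range(tries_per_level):
                    start = rng.randint(0, len(chrom_seq) - length)  # inclusive bounds
                    window = chrom_seq[start:start + length]
                    if abs(gc_percent(window) - target_gc) <= tolerance:
                        found = start
                        break
                tolerance += relax_step
        matches.append(found)
    return matches
```

Each candidate window is drawn uniformly at random, so the cost per match grows when few windows on the chromosome have a suitable GC content; regions that cannot be matched before the deadline simply stay unmatched.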
This is still the original randomized search; using an appropriate index may speed up the process substantially. The memory requirements could also be lowered, at the expense of a longer start-up time.
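One possible index, sketched here purely as an assumption about what "an appropriate index" could look like, is to precompute the GC content of every window of a given length, sort the windows by GC, and then pick a random window from the range that falls within the tolerance via binary search. The class `GCWindowIndex` and its methods are hypothetical names, not part of the script.

```python
import bisect
import random


class GCWindowIndex(object):
    """Hypothetical index: all windows of one fixed length, sorted by GC content."""

    def __init__(self, chrom_seq, length):
        if length <= 0:
            raise ValueError("window length must be positive")
        seq = chrom_seq.upper()
        # Prefix sums of G/C counts give each window's GC count in O(1).
        prefix = [0]
        for base in seq:
            prefix.append(prefix[-1] + (1 if base in "GC" else 0))
        pairs = []
        for start in range(len(seq) - length + 1):
            gc = 100.0 * (prefix[start + length] - prefix[start]) / length
            pairs.append((gc, start))
        pairs.sort()  # sort windows by GC percent
        self.gcs = [gc for gc, _ in pairs]
        self.starts = [start for _, start in pairs]

    def sample(self, target_gc, tolerance, rng=random):
        """Return a random start whose window GC is within tolerance, else None."""
        lo = bisect.bisect_left(self.gcs, target_gc - tolerance)
        hi = bisect.bisect_right(self.gcs, target_gc + tolerance)
        if lo >= hi:
            return None  # no window on this chromosome is close enough
        return self.starts[rng.randrange(lo, hi)]
```

This trades a one-off sort per chromosome and region length for logarithmic-time lookups. Since one index is needed per distinct region length, memory grows with the number of lengths, so in practice lengths would likely have to be binned.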
For all of the above, YMMV applies.