findBackground Python script

Standalone version of the background matching script, developed for Python 2.7.9.

It supports only region length and GC content as matching features.
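
To make the two features concrete, here is a minimal sketch of how they could be computed for a single region. It is not taken from the script itself: the function names `gc_content` and `region_features` are hypothetical, GC content is expressed in percent, and skipping `N` bases is an assumption. It is written for Python 3.

```python
def gc_content(seq):
    """Return the GC percentage (0-100) of a DNA sequence.

    Assumption: bases other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # Avoid a division by zero for sequences without any A/C/G/T.
    return 100.0 * gc / acgt if acgt else 0.0


def region_features(seq):
    """Return the (length, GC percent) pair used to compare regions."""
    return len(seq), gc_content(seq)
```

For example, `region_features("ACGT")` evaluates to `(4, 50.0)`.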

I ran a couple of tests; the performance is roughly as follows:

- Input: 200,000 DNase regions (Blueprint data) on 23 chromosomes
- CPUs: 3
- Timeout: 2 minutes (per chromosome)
- Relaxation: at most 2 percentage points

With these settings, the run yields ~160,000 matched regions in roughly 20 minutes, i.e. ~2,000-4,000 matches per minute of search time per chromosome. Memory consumption is at most 10 GB in this scenario.
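
As a rough sanity check of these figures, the arithmetic can be written out as below. Two assumptions are mine rather than measured: every chromosome uses its full 2-minute timeout, and the 23 chromosomes are processed in parallel batches of 3. The variable names are illustrative only.

```python
import math

matched_regions = 160_000  # approximate total reported above
chromosomes = 23
cpus = 3
timeout_min = 2            # search time budget per chromosome

# Average number of matches found per chromosome and per minute of search.
per_chromosome = matched_regions / chromosomes
per_minute = per_chromosome / timeout_min

# Assumed scheduling: chromosomes run in batches of `cpus` at a time.
batches = math.ceil(chromosomes / cpus)
search_wall_min = batches * timeout_min

print(round(per_chromosome), round(per_minute), search_wall_min)
```

Under these assumptions the average is about 3,500 matches per minute per chromosome, which sits inside the reported range, and the pure search time is 16 minutes; the remainder of the roughly 20 minutes would then be start-up and I/O.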

This is still the original randomized search; using an appropriate index may speed up the process substantially. Also, lowering the memory requirements would be possible at the expense of a longer start-up time.
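
One possible shape for such an index is sketched below. This is not what the script does: it replaces random draws with a dictionary keyed by region length and whole-percent GC bin, so that a lookup only visits bins within the allowed GC tolerance. The names `build_index` and `lookup`, the exact-length matching rule, and the `(length, gc_percent)` tuple layout are all assumptions made for illustration. Written for Python 3.

```python
import math
from collections import defaultdict


def build_index(candidates):
    """Group (length, gc_percent) candidates by length and whole-percent GC bin."""
    index = defaultdict(list)
    for length, gc in candidates:
        index[(length, math.floor(gc))].append((length, gc))
    return index


def lookup(index, target, max_gc_diff=2.0):
    """Remove and return one candidate matching target, or None.

    Assumed matching rule: identical length and a GC difference of at
    most max_gc_diff percentage points.
    """
    length, gc = target
    lo = math.floor(gc - max_gc_diff)
    hi = math.floor(gc + max_gc_diff)
    for gc_bin in range(lo, hi + 1):
        # .get() avoids creating empty buckets in the defaultdict.
        bucket = index.get((length, gc_bin), [])
        for i, cand in enumerate(bucket):
            if abs(cand[1] - gc) <= max_gc_diff:
                # Each background region is handed out at most once.
                return bucket.pop(i)
    return None
```

A short usage example:

```python
index = build_index([(100, 40.0), (100, 51.0), (200, 50.0)])
match = lookup(index, (100, 50.0))  # (100, 51.0)
```

Unlike the randomized search, this lookup is deterministic; shuffling each bucket once after building the index would restore random selection within a bin. Building the index up front is also where the trade-off between memory and start-up time would show.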

For all of the above, YMMV applies.

