asntech / intervene

Intervene: a tool for intersection and visualization of multiple genomic region and gene sets
http://intervene.rtfd.io/

Memory exhausted - too many files #57

Open alistairhockey opened 10 months ago

alistairhockey commented 10 months ago

Hi there,

I am trying to run 'intervene upset' on 73 BED files with ~40,000 intervals each.

intervene upset -i /data/alistairh/projects/SV_calling/data/peaks/DiscRegions/*{1,2,3}.bed --output SV_calling/data/peaks/DiscRegions/results_RT --save-overlaps

However, Intervene uses up all the available memory (62G) before being killed by the server. Is there a setting or a fix to limit the memory use of Intervene so it doesn't get killed by the server? This hasn't been a problem before when I have used intervene for 15-20 BED files.

asntech commented 10 months ago

That is a lot of BED files! Memory problems like this are a known issue when bedtools/pybedtools work on large datasets. Can you try the latest versions of bedtools and pybedtools? Also, try sorting your BED files with bedtools before running intervene upset.

We're aiming for a parallel processing option in the upcoming version of Intervene!
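For the sorting step, a minimal sketch might look like the following. It assumes the BED files sit in one directory and that sorted copies should go to a second one. The function name `sort_beds` and both directory arguments are hypothetical, chosen only for illustration. It uses `bedtools sort -i` when bedtools is on the PATH and otherwise falls back to coreutils `sort -k1,1 -k2,2n`, which orders by chromosome and then numerically by start. Whether sorting alone is enough to keep memory in check for 73 files is not confirmed here.

```shell
# Sort every BED file in in_dir by chromosome, then start coordinate,
# and write the sorted copies (same file names) into out_dir.
# The body runs in a subshell, so its variables do not leak to the caller.
sort_beds() (
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || exit 1
    for bed in "$in_dir"/*.bed; do
        # Skip the unexpanded pattern when no .bed files are present.
        [ -e "$bed" ] || continue
        out="$out_dir/$(basename "$bed")"
        if command -v bedtools >/dev/null 2>&1; then
            # Preferred: let bedtools do the sorting.
            bedtools sort -i "$bed" > "$out" || exit 1
        else
            # Fallback (assumption: no bedtools available): plain sort
            # on column 1 (chromosome) and column 2 (start, numeric).
            LC_ALL=C sort -k1,1 -k2,2n "$bed" > "$out" || exit 1
        fi
    done
)
```

Calling `sort_beds DiscRegions DiscRegions_sorted` (hypothetical directory names) leaves the originals untouched, and the sorted copies can then be passed to `intervene upset -i` in their place.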

alistairhockey commented 10 months ago

I haven't had any issues with bedtools multiinter - but maybe the sorting has played a part in that! Also, have you considered having an option for BEDPE files? I would be interested to see if you could modify the script to use 'pairToPair' in place of 'intersect' to get all the BEDPE paired region combinations.