aertslab / arboreto

A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
BSD 3-Clause "New" or "Revised" License

GRNboost worker memory usage #13

Closed olgabaranov closed 5 years ago

olgabaranov commented 5 years ago

Sorry for posting in pySCENIC; I was expecting to find the arboreto repo under the Aerts lab GitHub organization. Here is the problem:

When running

```python
from arboretum import algo
import pandas as pd

geneData = pd.read_csv("my-count-data.csv", index_col=[0], header=0)
network = algo.grnboost2(expression_data=geneData.T)
```

among multiple warnings I get the following message:

Worker is at 89% memory usage. Pausing worker. Process memory: 5.04 GB -- Worker memory limit: 5.62 GB



As far as I understand, this message comes from dask and can be alleviated by changing dask's memory limit settings, but I am not sure how to do that. Should I import dask before calling GRNBoost and change the settings first? Are there any hidden options for accessing dask settings via GRNBoost itself?
Thanks in advance!

P.S. I am running Python 3.7, arboretum 0.1.3 on Ubuntu 16.04.

tmoerman commented 5 years ago

See docs: https://arboreto.readthedocs.io/en/latest/userguide.html#running-with-a-custom-dask-client
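For readers who land here with the same warning, a minimal sketch of what the linked section describes: create a dask `LocalCluster` with an explicit per-worker memory limit and pass the resulting `Client` to `grnboost2` through its `client_or_address` argument. The helper names `cluster_kwargs` and `run_grnboost2`, the worker count, and the 8 GB limit are illustrative assumptions, not values recommended anywhere in this thread; pick numbers that fit your machine.

```python
def cluster_kwargs(n_workers, memory_per_worker_gb):
    """Hypothetical helper: keyword arguments for dask's LocalCluster."""
    return {
        "n_workers": n_workers,
        # memory_limit applies per worker, not to the cluster as a whole.
        "memory_limit": f"{memory_per_worker_gb}GB",
    }


def run_grnboost2(expression_csv, n_workers=4, memory_per_worker_gb=8):
    """Hypothetical wrapper: run GRNBoost2 on a custom local dask cluster."""
    # Imports are deferred so the helper above works without dask installed.
    import pandas as pd
    from distributed import Client, LocalCluster
    from arboreto.algo import grnboost2  # "arboretum.algo" in the 0.1.x release used above

    gene_data = pd.read_csv(expression_csv, index_col=[0], header=0)

    cluster = LocalCluster(**cluster_kwargs(n_workers, memory_per_worker_gb))
    client = Client(cluster)
    try:
        # Rows must be observations and columns genes, hence the transpose,
        # mirroring the call in the original question.
        return grnboost2(expression_data=gene_data.T, client_or_address=client)
    finally:
        client.close()
        cluster.close()


if __name__ == "__main__":
    print(cluster_kwargs(4, 8))
```

Calling `run_grnboost2("my-count-data.csv")` from a script should happen under an `if __name__ == "__main__":` guard, since `LocalCluster` starts worker processes. Raising the limit only helps if the machine actually has that much memory per worker; otherwise using fewer workers is the safer adjustment.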

tmoerman commented 5 years ago

closing for now