Hi!
You are running out of memory; can you try lowering the number of cores?
Cheers,
C
P.S.: Also, can you show your fragments dict? Do all samples have the same name?
Thanks for the quick response)
I've tried fewer cores, but even with `n_cpu=1` the same thing happens. Is there currently any rule of thumb for how much RAM is required? I'm on a 64 GB system with no swap, which is what I can typically get that can run Jupyter. Otherwise I have to wrap each function call in a script and submit it to the HPC cluster, which is also what I had to do with `export_pseudobulk`.
The fragments dict only has one sample:
```
fragments_dict = {'s1d1': '/scratch/kholmato/PROJECTS/NeurIPS2021/fragment_files/s1d1/atac_fragments.tsv.gz'}
```
I've decided to start with just one sample since there's no multisample tutorial yet.
P.S.: I've noticed that Jupyter kind of loses track of functions that use `ray`: it tells me that the kernel is idle even though the function is still running. Or maybe the function returns before the end of the child process' execution. Is there a way to force Jupyter (or the function itself) to wait until the end of execution?
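A minimal illustration of the behaviour I mean, using plain `ray` rather than pycisTopic (the task here is made up for illustration):
```
# .remote() returns an ObjectRef immediately, so the kernel can look idle
# while the task is still running; only ray.get() blocks until it finishes.
import time
import ray

ray.init(num_cpus=1)

@ray.remote
def slow_task():
    time.sleep(10)
    return "done"

ref = slow_task.remote()  # returns instantly, task keeps running in the background
print(ray.get(ref))       # blocks here until the task completes
```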
P.P.S.: It looks like `ray` is retrying the failed "jobs" several times, judging by the line
```
(compute_qc_stats_ray pid=3268) 2022-05-10 13:08:22,702 cisTopic INFO Reading s1d1
```
that shows up multiple times in the log. If that's the case, is there a way to ask it not to retry any failed steps?
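As far as I understand the ray docs (generic ray behaviour, not anything pycisTopic-specific): tasks whose worker dies from a system error, such as being OOM-killed, are retried up to `max_retries` times, so a task declared like this would not be retried:
```
# max_retries=0 disables retries of tasks that fail with system errors
# (e.g. a worker killed by the OOM killer). The task itself is a dummy.
import ray

ray.init()

@ray.remote(max_retries=0)
def fragile_task(x):
    return x * 2

print(ray.get(fragile_task.remote(21)))  # -> 42
```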
Yes, I haven't seen this before, weird... I'm also not sure why ray shows the same log line multiple times in your case (which version do you have? We use 1.12).
Since it seems to get stuck already while reading the file, can you try reading the fragments file independently of ray, and check how much memory it takes (how big is it?)? You can use the function `read_fragments_from_file`. If you manage to read it, you may need to set ray's object memory to that range of values when using ray.
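Something like this, as a sketch (I am assuming here that `read_fragments_from_file` is imported from `pycisTopic.utils`; adjust the import if it lives elsewhere in your version, and the object store size below is just an example value):
```
# Read the fragments file outside of ray and check its memory footprint.
import psutil
from pycisTopic.utils import read_fragments_from_file  # assumed location

fragments = read_fragments_from_file(
    '/scratch/kholmato/PROJECTS/NeurIPS2021/fragment_files/s1d1/atac_fragments.tsv.gz'
)
rss_gb = psutil.Process().memory_info().rss / 1e9
print(f"Resident memory after loading: ~{rss_gb:.1f} GB")

# If that works, size ray's object store to a matching value, e.g.:
import ray
ray.init(object_store_memory=30 * 10**9)  # ~30 GB; example value, pick one that fits your machine
```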
Cheers,
Carmen
Ah, yes, it's just that the fragments file is too big. Just running `read_fragments_from_file` crashes the session. It's around 10 GB uncompressed in my case, but converting to pyranges probably brings it to 20 GB in total, and maybe there's some other junk floating around in memory. Btw, `use_polars` is really nice. I still run out of memory, but I don't have to wait minutes for it to happen). See the snippet below for what I ran.
I'll try to find a bigger system if I can, or just skip the QCs for now and initialize the cisTopic object from a precomputed count matrix. Is that option already available?
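For reference, this is the call (same path as above; I am assuming `use_polars` is a flag of `read_fragments_from_file` in my checkout):
```
# Reading with polars is much faster, though peak memory is still too high for 64 GB.
from pycisTopic.utils import read_fragments_from_file  # adjust if it lives elsewhere

fragments = read_fragments_from_file(
    '/scratch/kholmato/PROJECTS/NeurIPS2021/fragment_files/s1d1/atac_fragments.tsv.gz',
    use_polars=True,
)
```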
Regarding `ray` repeating the same thing: I used 1.11 first, and updating to 1.12 didn't change anything.
Hi Max!
Indeed, in the near future we intend to move to polars/pytables, which should hopefully also make these steps more memory efficient. At the moment we have a dependency on pyarrow 0.17 (from pyscenic/ctxcore) that we are resolving so we can upgrade to pyarrow 7, and then we will start the updates here as well :).
It is possible to start from a precomputed matrix (you can also plug in QC stats if you have them); you can use:
```
# Create cisTopic object from a precomputed count matrix
import pandas as pd
from pycisTopic.cistopic_class import *

matrix_path = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/toy_data/toy_data/cistopic_countmatrix.tsv'
path_to_blacklist = '/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/pycisTopic/blacklist/hg19-blacklist.v2.bed'
cisTopic_obj = create_cistopic_object_from_matrix_file(matrix_path, path_to_blacklist)

# Adding cell information
cell_data = pd.read_csv('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/toy_data/toy_data/celldata_mel.tsv', sep='\t')
cisTopic_obj.add_cell_data(cell_data)
```
Preferably you would use the pseudobulk peaks to create this matrix. Normally we use the fragments file and the consensus peaks to create the count matrix, but I am afraid you will have the same issue (since it requires reading the fragments file).
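For completeness, a sketch of that fragments-based route (I am assuming `create_cistopic_object_from_fragments` from `pycisTopic.cistopic_class` here, and the consensus-peak BED path is a placeholder; the exact signature may differ between versions):
```
# Count matrix from fragments + consensus peaks. Note that this reads the
# whole fragments file, so it runs into the same memory limit.
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

cisTopic_obj = create_cistopic_object_from_fragments(
    path_to_fragments='/scratch/kholmato/PROJECTS/NeurIPS2021/fragment_files/s1d1/atac_fragments.tsv.gz',
    path_to_regions='consensus_peaks.bed',  # placeholder: your consensus peaks in BED format
    path_to_blacklist=path_to_blacklist,
    project='s1d1',
)
```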
Ok, thanks! I can create the count matrix based on the pseudobulk peaks in R: `Signac::FeatureMatrix` can compute it without loading the fragments file into memory. Maybe there's even a Python alternative, but I'm not aware of one. I'll let you know how it goes.
Cheers, Max
Check the polars branch :)
Speed improvements in QC calculation are in the polars branch: https://pycistopic.readthedocs.io/en/polars/notebooks/human_cerebellum.html#QC
Describe the bug
When following the Single_sample_workflow.ipynb I get an error while executing `compute_qc_stats()`. I'm using one of the samples from the NeurIPS2021 BMMC data.
To Reproduce
Error output
Command output
```
2022-05-10 09:28:21,105 cisTopic INFO n_cpu is larger than the number of samples. Setting n_cpu to the number of samples
2022-05-10 09:28:27,947 INFO services.py:1412 -- View the Ray dashboard at http://127.0.0.1:8265
(raylet)     loop.run_until_complete(agent.run())
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
(raylet)     return future.result()
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/dashboard/agent.py", line 178, in run
(raylet)     modules = self._load_modules()
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
(raylet)     c = cls(self)
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
(raylet)     self._metrics_agent = MetricsAgent(
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
(raylet)     prometheus_exporter.new_stats_exporter(
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
(raylet)     exporter = PrometheusStatsExporter(
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
(raylet)     self.serve_http()
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
(raylet)     start_http_server(
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
(raylet)     TmpServer.address_family, addr = _get_best_family(addr, port)
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
(raylet)     infos = socket.getaddrinfo(address, port)
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/socket.py", line 918, in getaddrinfo
(raylet)     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
(raylet) socket.gaierror: [Errno -2] Name or service not known
(raylet)
(raylet) During handling of the above exception, another exception occurred:
(raylet)
(raylet) Traceback (most recent call last):
(raylet)   File "/home/kholmato/.micromamba/envs/scenicplus/lib/python3.8/site-packages/ray/dashboard/agent.py", line 407, in
```
Expected behavior
Not an error)
Version (please complete the following information):
Output of `pip freeze`:
```
adjustText==0.7.3 aiohttp==3.8.1 aiohttp-cors==0.7.0 aioredis==1.3.1 aiosignal==1.2.0 anndata==0.7.8 annoy==1.17.0 appdirs==1.4.4 arboreto==0.1.6 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 asttokens==2.0.5 async-timeout==4.0.2 attr==0.3.1 attrs==21.4.0 backcall==0.2.0 bbknn==1.5.1 beautifulsoup4==4.11.1 bioservices==1.8.4 bleach==5.0.0 blessed==1.19.1 bokeh==2.4.2 boltons==21.0.0 cachetools==5.0.0 cattrs==1.10.0 certifi==2021.10.8 cffi==1.15.0 charset-normalizer==2.0.12 click==8.1.2 cloudpickle==2.0.0 colorama==0.4.4 colorful==0.5.4 colorlog==6.6.0 cryptography==36.0.2 ctxcore==0.1.1 cycler==0.11.0 Cython==0.29.28 cytoolz==0.11.2 dask==2022.4.0 dataclasses-json==0.5.7 debugpy==1.6.0 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.13 deprecation==2.1.0 dill==0.3.4 distributed==2022.4.0 docutils==0.18.1 easydev==0.12.0 entrypoints==0.4 executing==0.8.3 fastjsonschema==2.15.3 fbpca==1.0 filelock==3.6.0 fonttools==4.32.0 frozendict==2.3.1 frozenlist==1.3.0 fsspec==2022.3.0 future==0.18.2 gensim==4.1.2 geosketch==1.2 germalemma==0.1.3 gevent==21.12.0 globre==0.1.5 google-api-core==2.7.2 google-auth==2.6.4 googleapis-common-protos==1.56.0 gpustat==1.0.0b1 greenlet==1.1.2 grequests==0.6.0 grpcio==1.43.0 gseapy==0.10.8 h5py==3.6.0 harmonypy==0.0.5 HeapDict==1.0.1 hiredis==2.0.0 idna==3.3 igraph==0.9.10 imageio==2.16.2 importlib-resources==5.7.0 interlap==0.2.7 intervaltree==3.1.0 ipykernel==6.13.0 ipympl==0.9.0 ipython==8.2.0 ipython-genutils==0.2.0 ipywidgets==7.7.0 jedi==0.18.1 Jinja2==3.1.1 joblib==1.1.0 jsonschema==4.4.0 jupyter-client==7.2.2 jupyter-core==4.9.2 jupyterlab-pygments==0.2.2 jupyterlab-widgets==1.1.0 kiwisolver==1.4.2 lda==2.0.0 leidenalg==0.8.9 llvmlite==0.38.0 locket==0.2.1 loompy==3.0.7 loomxpy==0.4.1 lxml==4.8.0 MACS2 @ file:///opt/conda/conda-bld/macs2_1645526782437/work MarkupSafe==2.1.1 marshmallow==3.15.0 marshmallow-enum==1.5.1 matplotlib==3.5.1 matplotlib-inline==0.1.3 mistune==0.8.4 msgpack==1.0.3 mudata==0.1.1 multidict==6.0.2 multiprocessing-on-dill==3.5.0a4 muon==0.1.2 mypy-extensions==0.4.3 natsort==8.1.0 nbclient==0.6.0 nbconvert==6.5.0 nbformat==5.3.0 ncls==0.0.64 nest-asyncio==1.5.5 networkx==2.8 nltk==3.7 notebook==6.4.10 numba==0.55.1 numexpr==2.8.1 numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1649572648093/work numpy-groupies==0.9.14 nvidia-ml-py3==7.352.0 opencensus==0.8.0 opencensus-context==0.1.2 packaging==21.3 pandas==1.4.2 pandocfilters==1.5.0 parso==0.8.3 partd==1.2.0 patsy==0.5.2 PatternLite==3.6 pbr==3.1.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.1.0 prometheus-client==0.14.1 prompt-toolkit==3.0.29 protobuf==3.20.0 psutil==5.9.0 ptyprocess==0.7.0 pure-eval==0.2.2 py-spy==0.3.11 pyarrow==0.16.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybedtools==0.9.0 pyBigWig==0.3.18 pybiomart==0.2.0 -e git+ssh://git@github.com/aertslab/pycistarget.git@d2571bf92579addfc8a21465bfbcd0de86aedc61#egg=pycistarget -e git+ssh://git@github.com/aertslab/pycisTopic.git@ddb3caf583312f1169776a21650bc47ff4daebe6#egg=pycisTopic pycparser==2.21 pyfasta==0.5.2 Pygments==2.11.2 pynndescent==0.5.6 pyOpenSSL==22.0.0 pyparsing==3.0.8 pyphen==0.12.0 pyranges==0.0.115 pyrle==0.0.34 pyrsistent==0.18.1 pysam==0.19.0 pyscenic==0.11.2 python-dateutil==2.8.2 python-igraph==0.9.10 python-Levenshtein==0.12.2 pytz==2022.1 PyWavelets==1.3.0 PyYAML==6.0 pyzmq==22.3.0 ray==1.11.0 redis==4.2.2 regex==2022.3.15 requests==2.27.1 requests-cache==0.9.3 rsa==4.8 scanorama==1.7.2 scanpy==1.9.1 -e git+ssh://git@github.com/aertslab/scenicplus.git@6faa2d62416e3c78aeb45eead5c70cdafecb762f#egg=scenicplus scikit-image==0.19.2 scikit-learn==0.24.2 scipy==1.8.0 scrublet==0.2.3 seaborn==0.11.2 Send2Trash==1.8.0 session-info==1.0.0 six==1.16.0 sklearn==0.0 smart-open==5.2.1 sorted-nearest==0.0.33 sortedcontainers==2.4.0 soupsieve==2.3.2.post1 stack-data==0.2.0 statistics==1.0.3.5 statsmodels==0.13.2 stdlib-list==0.8.0 suds-community==1.1.0 tables==3.7.0 tabulate==0.8.9 tblib==1.7.0 terminado==0.13.3 texttable==1.6.4 threadpoolctl==3.1.0 tifffile==2022.4.8 tinycss2==1.1.1 tmtoolkit==0.9.0 toolz==0.11.2 tornado==6.1 tqdm==4.64.0 traitlets==5.1.1 typing==3.7.4.3 typing-inspect==0.7.1 typing_extensions==4.1.1 umap-learn==0.5.3 url-normalize==1.4.3 urllib3==1.26.9 wcwidth==0.2.5 webencodings==0.5.1 widgetsnbextension==3.6.0 wrapt==1.14.0 xlrd==1.2.0 xmltodict==0.12.0 yarl==1.7.2 zict==2.1.0 zipp==3.8.0 zope.event==4.5.0 zope.interface==5.4.0
```
Additional context
I'm running in a JupyterHub instance hosted at our institute. The whole working directory is on an NFS.