aertslab / arboreto

A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
BSD 3-Clause "New" or "Revised" License

msgpack.exceptions.ExtraData: unpack(b) received extra data. #36

Open GeneVector5 opened 1 year ago

GeneVector5 commented 1 year ago

I am trying to run GRNBoost2 from arboreto to infer co-expression modules in a Jupyter notebook. I have an Apple M2 MacBook Pro.

Specifically, I am following the steps shown in pySCENIC - Full pipeline.ipynb.

I got the following "warning error" while the execution was still running, and I am struggling to understand what is causing it and why.

from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2
...
adjacencies = grnboost2(expression_data=expression_matrix_df, tf_names=tf_names, verbose=True)
display(adjacencies.head())

This is the output so far (the execution is taking some time):

preparing dask client

Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

parsing input
creating dask graph
4 partitions
computing dask graph

/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/client.py:3125: UserWarning: Sending large graph of size 466.82 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
2023-07-18 19:25:05,533 - distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/protocol/core.py", line 158, in loads
    return msgpack.loads(
           ^^^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/msgpack/fallback.py", line 136, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.
2023-07-18 19:25:05,536 - distributed.core - ERROR - Exception while handling op register-client
Traceback (most recent call last):
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/core.py", line 924, in _handle_comm
    result = await result
             ^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/scheduler.py", line 5449, in add_client
    await self.handle_stream(comm=comm, extra={"client": client})
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/core.py", line 977, in handle_stream
    msgs = await comm.read()
           ^^^^^^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/comm/tcp.py", line 254, in read
    msg = await from_frames(
          ^^^^^^^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/comm/utils.py", line 100, in from_frames
    res = _from_frames()
          ^^^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/comm/utils.py", line 83, in _from_frames
    return protocol.loads(
           ^^^^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/distributed/protocol/core.py", line 158, in loads
    return msgpack.loads(
           ^^^^^^^^^^^^^^
  File "/Users/zach/anaconda3/lib/python3.11/site-packages/msgpack/fallback.py", line 136, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.
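
Since the failure happens while distributed deserializes a msgpack message, the installed versions of the packages named in the traceback are probably relevant. Below is a minimal sketch for collecting them, using only the standard library. The helper name installed_versions and the package list are my own choices for illustration, and I am assuming the PyPI distribution names match the import names shown in the traceback.

```python
from importlib import metadata

# Distribution names taken from the imports and traceback in this report.
# Assumption: the PyPI distribution names match these import names.
PACKAGES = ("arboreto", "dask", "distributed", "msgpack", "numba")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is missing from the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same kernel as the notebook should show which versions the failing session actually uses, which may differ from what is installed in other environments.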