UserWarning: Distributing <class 'pandas.core.frame.DataFrame'> object. This may take some time.
distributed.core - ERROR - Exception while handling op scatter
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.6/site-packages/distributed/core.py", line 500, in handle_comm
result = await result
File "/opt/miniconda3/lib/python3.6/site-packages/distributed/scheduler.py", line 4773, in scatter
nthreads, data, rpc=self.rpc, report=False
File "/opt/miniconda3/lib/python3.6/site-packages/distributed/utils_comm.py", line 149, in scatter_to_workers
for address, v in d.items()
File "/opt/miniconda3/lib/python3.6/site-packages/distributed/utils.py", line 229, in All
result = await tasks.next()
File "/opt/miniconda3/lib/python3.6/site-packages/distributed/core.py", line 861, in send_recv_from_rpc
result = await send_recv(comm=comm, op=key, **kwargs)
File "/opt/miniconda3/lib/python3.6/site-packages/distributed/core.py", line 662, in send_recv
raise Exception(response["text"])
Exception: No module named 'remapper'
Potential Solution
Allow conversion into a regular panda data frame. I have tried using the regular pandas DataFrame constructor, but the wrapper remains. I think this would be clearly the preferable way, as any downstream issues wrt pandas parallel processing will be avoided.
Document how the PYTHONPATH has to be modified (in the notebook itself, prior to importing modin) when using the Research Docker image so that the Dask workers can locate the remapper module.
Reproducing the Problem
from clr import AddReference
from QuantConnect.Python import *
from QuantConnect.Brokerages import BrokerageName
import modin.pandas as pd
import pandas as pdo
qb = QuantBook()
qb.SetBrokerageModel(BrokerageName.Bitfinex, AccountType.Margin)
qb.SetStartDate(2021,1,18)
qb.AddCrypto("BTCUSD", Resolution.Minute)
h = qb.History(qb.Securities.Keys, 2 * 10)
# This shows that the remapper wrapper object is kinda 'persistent'
h_pandas = pdo.DataFrame(h)
h_modin = pd.DataFrame(h_pandas)
System Information
quantconnect/research:latest docker image, extended by installation of modin[dask]==0.6.3.
Checklist
[X] I have completely filled out this template
[X] I have confirmed that this issue exists on the current master branch
[X] I have confirmed that this is not a duplicate issue by searching issues
[X] I have provided detailed steps to reproduce the issue
Expected Behavior
The DataFrame returned by
h = qb.History(qb.Securities.Keys, 10)
should be a regular data frame or a conversion method should be available.Actual Behavior
The DataFrame returned by the history call is some special wrapper object (https://github.com/QuantConnect/Lean/blob/4c085ff853b8a7e63aa4fc9dff7f03c354a75e7f/Common/Python/PandasData.cs#L571), causing various parallel processing libraries to fail during pickling. In the following stacktrace, it can bee seen that the module 'remapper' cannot be found, which is defined here: https://github.com/QuantConnect/Lean/blob/master/Common/Python/PandasData.cs
Potential Solution
remapper
module.Reproducing the Problem
System Information
quantconnect/research:latest
docker image, extended by installation ofmodin[dask]==0.6.3
.Checklist
master
branch