-
Three proxify tests are currently failing in CI:
```python
FAILED dask_cuda/tests/test_proxify_host_file.py::test_dataframes_share_dev_mem - AttributeError: 'types.SimpleNamespace' obje…
-
After receiving a research request, use this template to plan and track your work. Be sure to also add the appropriate project-level label to this issue (e.g. gtfs-rt, DLA).
## Epic Information - HQT…
-
Follow-up to recent issues like https://github.com/dask/dask/issues/8937
Recent PRs have simplified the `read_parquet` API a bit, but the code is still vast. After some careful consideration, I'd l…
-
I am using the Dask client and have to switch between pandas and Modin because of functionality limitations that affect the data cleansing steps I had to implement. There is a point in my code whe…
-
In my discussions with Jonathan and others at the SciPy sprints, we agreed that it would be really nice to expose some minimal tools for manipulating and viewing the internal pandas blocks system. For…
-
### Modin version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest released version of Modin.
- [X] I have confi…
-
I have two files which I want to preprocess before ANN training. The size of each file is about 3GB, so I decided to use Dask. The shape of the input file is (500000, 410), the output file - (500000, …
-
Prior to `numpy` 1.24, creating an array from ragged nested sequences produced a `VisibleDeprecationWarning`. With 1.24 this is now a `ValueError`. This is OK currently as `numba` doesn't yet support `…
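For reference, a minimal sketch of the NumPy behaviour change described above, independent of `numba`. The helper name `implicit_ragged_fails` and the sample data are made up for illustration. Passing `dtype=object` explicitly is the documented way to build a ragged array and should work on either side of the 1.24 boundary.

```python
import warnings

import numpy as np

# Hypothetical sample data: nested sequences of unequal length.
ragged = [[1, 2, 3], [4, 5]]


def implicit_ragged_fails(seq):
    """Return True if np.array(seq) raises ValueError (NumPy >= 1.24 behaviour)."""
    with warnings.catch_warnings():
        # Older NumPy only warns here; silence that so the check stays quiet.
        warnings.simplefilter("ignore")
        try:
            np.array(seq)
        except ValueError:
            return True
    return False


# Explicitly requesting dtype=object yields a 1-D array of Python lists.
explicit = np.array(ragged, dtype=object)

print("implicit creation raises:", implicit_ragged_fails(ragged))
print("explicit shape:", explicit.shape, "dtype:", explicit.dtype)
```

Whether the implicit path raises depends on the installed NumPy version, so the first printed value is not fixed.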
-
I'd like to be able to convert data representing time since the UNIX epoch to an explicit timestamp format with `to_timestamp`, like I can in Spark SQL and PostgreSQL.
```python
from pyspark.sql import S…
-
`cumsum` on a Dask DataFrame with multiple partitions and columns with identical names raises an exception.
To reproduce:
```python
import dask
import dask.dataframe as dd
import pandas as pd
…