-
I know that one particular column will always be a list of floats, so I do:
```
import pyarrow as pa
schema = {"col1": pa.list_(pa.int64())}
df.to_parquet(schema=schema)
```
however, it seems …
-
I came across a use case where attempting to fit a `DaskXGBClassifier` on a Dask Array whose partitions are `scipy.sparse.csr_matrix`s (as is returned by Dask-ML's `HashingVectorizer`) results in a `A…
-
Hi,
I am reading through the chunking options in
- https://docs.dask.org/en/latest/array-chunks.html
- https://docs.dask.org/en/latest/array-api.html?highlight=from_array#other-functions
I wan…
-
Hi!
I've been using swifter for a while as I'm working on an ETL process where I need to handle huge dataframes.
I was used to seeing the progress bar when I used swifter.apply(), but it hasn't …
-
### Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the [latest version](https://pypi.org/project/polars/) of …
-
The [CarbonPlan data catalog repo](https://github.com/carbonplan/data) provides [some examples of tests](https://github.com/carbonplan/data/blob/main/carbonplan_data/tests/__init__.py) that could appl…
-
Hi,
I'm getting the error given below while using a WSL2 Ubuntu 20.04 instance on Windows 11 Preview.
RuntimeError: CUDA error encountered at: /workspace/.conda-bld/work/cpp/src/bitmask/null_mas…
-
I've been profiling distributed workflows in an effort to understand where there are potential performance improvements to be made (this is ongoing with @gjoseph92 amongst others). I'm particularly in…
-
Hi all, I wanted to ask for some help understanding how to work with dataframes/CSVs larger than RAM or the memory limit. What I want is to be able to read a large CSV with a set `memory_limit` in th…
-
A customer is experiencing the scheduler dying after running tasks successfully for a while (possibly a deadlock).
Example cluster that died https://cloud.coiled.io/julianfb51/clusters/36106/2/detail…