-
Hello,
`duckdb` and arrow seem to write Parquet files at roughly the same speed until the data reaches about 10 GB or more, at which point duckdb becomes about an order of magnitude slower.
This is very lar…
-
When displaying pandas data frames on the dashboard, I would like the row index to be shown, just as the column index already is:
-
This fails in Spark:
```
r1 = {
    "first_name": "John",
    "surname": "Smith",
    "dob": "1980-01-01",
}
r2 = {
    "first_name": "John",
    "surname": "Smith",
    "dob": None,
}
…
-
I'd like to compute the median (q_0.5) and the IQR (q_0.75 - q_0.25) of each column in a dataframe, efficiently and in parallel. Let's compare the three most used libraries:
**pandas:**
```
import numpy…
-
It is very annoying to have to specify `NA_integer_` or one of the other typed `NA` variants:
``` r
library(polars)
options(polars.do_not_repeat_call = TRUE)
pl$DataFrame(x = list(1L, 2L, NA_integer_))
#> shape: (3,…
-
Hello! This library is very useful for me, so I thank you for that!
One thing I've noticed recently - which I suspect is cropping up through an updated dependency as it didn't happen prior to the l…
-
Hello,
I was looking at your code, and the results look promising. When I tried to run it myself, I noticed that you reference a class `BTCCrawl_To_DataFrame_Class`, which I cannot find. Pleas…
-
I mentioned this in #11067, but maybe this deserves its own issue: I find it difficult to turn off query planning using the Python API. Using `dask.config.set` only works if `dask.dataframe` hasn't be…
-
**Is your feature request related to a problem? Please describe.**
[LazyFrames](https://docs.pola.rs/api/python/stable/reference/lazyframe/index.html) allow for larger-than-memory dataframes to be ha…
-
### Check for existing issues
- [X] Completed
### Describe the feature
When you display a dataframe via the REPL, the formatting is very bad. I tried this with multiple dataframe libraries (polars,…