-
It seems `row_group_slots` for arrays implementing `DataAPI.refpool` is much slower than it should be. Profiling reveals that most of the time is spent on the `groups[i] = j` line. Yet, calling `row_grou…
-
### Problem description
When working with datetime columns and using `dates_as_objects=False`, I expect null values to be `np.datetime64` objects (i.e. `np.datetime64("NaT")`).
However, kartothek is…
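
For reference, the expected null representation can be checked with NumPy alone. The sketch below does not call kartothek; it only illustrates, under that assumption, what a `np.datetime64("NaT")` entry looks like inside a `datetime64` array and how it can be detected.

```python
import numpy as np

# A datetime64 column with one missing entry, stored as NaT rather than None.
dates = np.array(["2020-01-01", "NaT"], dtype="datetime64[ns]")

# np.isnat is the elementwise null check for datetime64 values.
nat_mask = np.isnat(dates)

# The missing entry is a numpy.datetime64 scalar, not a Python None.
null_value = dates[1]
is_datetime64 = isinstance(null_value, np.datetime64)
```

Using `np.isnat` rather than `==` matters here, because NaT does not compare equal to itself.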
-
Give a warning if dataframes have indexes of different sizes.
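
A minimal sketch of what such a check could look like, in Python. The helper name `warn_on_index_size_mismatch` is hypothetical and not part of any existing library; it assumes only that each frame exposes an `.index` attribute with a length, as a pandas `DataFrame` does.

```python
import warnings


def warn_on_index_size_mismatch(*frames):
    """Warn if the given frames have indexes of different lengths.

    Hypothetical helper: works with any object exposing ``.index``.
    Returns the list of index lengths so callers can inspect them.
    """
    sizes = [len(frame.index) for frame in frames]
    if len(set(sizes)) > 1:
        warnings.warn(
            f"DataFrames have different index sizes: {sizes}",
            UserWarning,
            stacklevel=2,
        )
    return sizes
```

Emitting a `UserWarning` instead of raising keeps existing code running while still surfacing the mismatch.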
-
I've drafted a package for pooled elements at the following link. The main purpose of this package is to speed up grouping and joining in DataFrames. If this is used in DataFrames, it will also reduce…
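
To illustrate why pooling can speed up grouping, here is a minimal Python sketch. It is not the API of the drafted package, and the names `pool` and `group_rows` are hypothetical. It assumes values are stored as integer references into a pool of unique values, so that assigning rows to groups becomes plain indexing instead of hashing every value again.

```python
def pool(values):
    """Encode values as (pool_values, refs).

    pool_values holds each unique value once; refs[i] is the position
    of values[i] in pool_values.
    """
    index_of = {}
    pool_values = []
    refs = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(pool_values)
            pool_values.append(value)
        refs.append(index_of[value])
    return pool_values, refs


def group_rows(pool_values, refs):
    """Group row indices by pooled value, using refs as direct slot indices."""
    slots = [[] for _ in pool_values]
    for row, ref in enumerate(refs):
        slots[ref].append(row)  # list indexing only, no hashing of the value
    return dict(zip(pool_values, slots))


pool_values, refs = pool(["a", "b", "a", "c", "b"])
groups = group_rows(pool_values, refs)
```

The hashing cost is paid once when the pool is built; every later grouping or join on the same column can reuse the integer references.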
-
Problem: Dask ( https://docs.dask.org/en/latest/ ) is a very good parallel/distributed data system that replaces pandas. The people who worked on it have made it play well with numpy, pandas, scikit…
-
Hello, thanks for writing this. I've benchmarked its use against the default randomForest implementation in R and have found it to be amazingly fast.
I was hoping to be able to use this library with…
-
I understand the need to minimise prerequisites and certainly the portions of the shell lessons relating to loops, pipes, filters, and scripts are unnecessary overhead. However, given that they will b…
-
While trying to optimize performance of operations on DataFrames, I've found that a recurring pattern underlies a lot of the performance bottlenecks people encounter: functions that take in DataFrames…
-
### Is your feature request related to a problem?
Some DB engines provide a [TO_JSON_STRING](https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#to_json_string) or [TO_JSON](…
-
Hi there,
I want to be able to assign labels to the GeneralGraph, similar to what you do with the GraphUtils, instead of using the default X1...Xn notation.
This is what I tried:
```
cg.G.get_node_…