Open elephaint opened 3 weeks ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Hey @elephaint , thanks for trying out Narwhals! We can prioritise pivot as that would be helpful to you
We'd previously run into issues with Polars pre1.0 inconsistencies, but from this PR it looks you only need the 1.0+ behaviour, right?
Is there anything else we could do in Narwhals to help you out here?
Hey @elephaint , thanks for trying out Narwhals! We can prioritise pivot as that would be helpful to you
We'd previously run into issues with Polars pre1.0 inconsistencies, but from this PR it looks you only need the 1.0+ behaviour, right?
Is there anything else we could do in Narwhals to help you out here?
Thanks! For this PR / repo I think indeed at the moment the only thing I miss is pivot - which I kind of get as it's an expensive op (as most fast frameworks are column based).
Another minor thing that would be great is ability to ascertain datatypes at the op level without casting afterwards. For example, doing .to_dummies()
with Polars will result in an Uint8 but with Pandas in a Float64. I know it can be done by subsequently casting over all the generated columns, but not sure this is the most efficient way when 1000+ columns are generated at once through the .to_dummies()
command. But then again, this is minor, in this PR I settled on not using dummies.
A thing that I found a bit unintuitive is that Narwhals seems to keep track of the order of a Pandas Integer Index even after narwhallifying it - I didn't expect that. Again minor thing, because I just need to remember to do .maybe_reset_index()
after every op that potentially changes the order.
But awesome library, keep up the good work! For this PR I wanted to play with it for Polars support, maybe later I'll try other frameworks - should almost work out of the box now (save for the pivot op).
Edit: maybe one other thing - I see these warnings:
but it doesn't show which codeline it affects, so difficult to debug? I know where I'm adding columns and stuff, but that's in a lot of places.
Thanks so much for your feedback! π
pivot
: we're revisiting the PR and will try to make a release shortlyto_dummies
: we've addressed that as part of this PR so that the output dtype is consistently Int8, we'd overlooked the dtype difference the first time round. Unsigned integers can be a bit unexpected when working with pandas, so Int8 seemed like a good compromise. Would this work for you?Thanks so much for your feedback! π
pivot
: we're revisiting the PR and will try to make a release shortly
Wow, awesome. I'll have a go at it once released.
to_dummies
: we've addressed that as part of this PR so that the output dtype is consistently Int8, we'd overlooked the dtype difference the first time round. Unsigned integers can be a bit unexpected when working with pandas, so Int8 seemed like a good compromise. Would this work for you?
Yes, perfect! I think in general what I'd expect (in an ideal world π - I know this is insanely hard to get perfect) from Narwhals is output equivalence up to DataFrame type (i.e. everything is the same, except the 'container' of the data - e.g. dtypes, row order, column order, column names, precision, nulls, nans, infs, categories are all similar).
- fragmentation warning: technically the warning is coming from pandas, but it's really useful that you've reported this to us - I'll take a look to see if there's anything we're doing that's causing it (and whether it's an issue)
π
thanks! it's released, and the fragmentation warning is hopefully addressed too
thanks! it's released, and the fragmentation warning is hopefully addressed too
Awesome, thanks, I'll have a look at it
thanks! it's released, and the fragmentation warning is hopefully addressed too
Great work, I don't see the fragmentation messages anymore, and pivot works as expected. Thanks! π
Adds support for Polars via Narwhals.
The extension to all frameworks supported under Narwhals should hereafter be (nearly) trivial.
Breaking
PR removes support for
NIXTLA_ID_AS_COL
, i.e., hierarchicalforecast from now on will return everything as a flat table, without an index in case of a Pandas DataFrame.Open items
Fixes / better solutions required
Tests
Performance evaluation
Reconcilers seem to run somewhat faster after this PR. Reconciler0 = MinTShrink, Reconciler1 = BottomUp
This PR:
Current main:
Test script for PR in
tests\test_benchmark.py
, run withpytest tests\test_benchmark.py::test_reconciler -v -s --benchmark-min-rounds=20 --disable-warnings