Open AdrianSosic opened 1 week ago
I don't think this should be labelled a bug
, the order is inherent to the used method + order of parameters apssed
Unless there is an inconsistent handling of the latter from our side, this might not even be fixable from our end
In the provided example it seems like polars treats the parameters starting from the last while pandas starts from the first. If this is true we might be able to simply exploit this and reverse the order of parameters passed to the polars method
Agree, this should be investigated in a bit more detail. But can you elaborate a little bit how and why this caused issues in testing?
I had a deeper look and noticed that the problem is not because of the Cartesian
part (that works just like the pandas counterpart) but due to the streaming=True
flag in the collect
call, which seems to destroy the order. Now this could be either because:
What do we make out of this?
could there be a reorder step in which the fitlered polars resulting df is ordered according to the parameter-parameter-values order (ie the one Cartesian implies) ? Just .sort_values
?
other than that I dont see a fix
The order of the Cartesian product elements is different when created via
polars
orpandas
. That is, depending onBAYBE_DEACTIVATE_POLARS
, the following code gives two different outputs:Pandas:
Polars:
Not a "bug" in the strict sense, but still annoying and can cause troubles in testing (this is how I found it) or when people rely on the actual ordering and then toggle the flag.