go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

Make series.Order stable #164

Closed mcolosimo-p4 closed 2 years ago

mcolosimo-p4 commented 2 years ago

series.Order does not use a stable sort, so sorting a DataFrame by multiple columns returns unexpected results. Specifically, the outcome differs from what R and Python (pandas) produce, and it caused a hard-to-find bug in my Go code.

For example:

>>> import pandas as pd
>>> 
>>> data = {'A': ["A", "C", "B", "D", "C", "A", "D", "B"], 
...     'B': [103, 103, 103, 103, 100, 100, 100, 100]}
>>> df = pd.DataFrame.from_dict(data)
>>> df
   A    B
0  A  103
1  C  103
2  B  103
3  D  103
4  C  100
5  A  100
6  D  100
7  B  100
>>> df.sort_values(by=['B'])
   A    B
4  C  100
5  A  100
6  D  100
7  B  100
0  A  103
1  C  103
2  B  103
3  D  103
>>> df.sort_values(by=['A','B'])
   A    B
5  A  100
0  A  103
7  B  100
2  B  103
4  C  100
1  C  103
6  D  100
3  D  103

I've included a new test that exposes this issue and a simple fix.
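
To illustrate why stability matters here, below is a minimal sketch using only Go's standard library, not gota's actual implementation. It assumes a multi-column ordering is built from one sort pass per column, applied from the least significant key to the most significant one. The names `row`, `exampleRows`, and `sortByAThenB` are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// row mirrors one record of the pandas example above.
// Idx is the original row index (hypothetical helper type).
type row struct {
	Idx int
	A   string
	B   int
}

// exampleRows returns the same data as the pandas DataFrame above.
func exampleRows() []row {
	a := []string{"A", "C", "B", "D", "C", "A", "D", "B"}
	b := []int{103, 103, 103, 103, 100, 100, 100, 100}
	rows := make([]row, len(a))
	for i := range a {
		rows[i] = row{Idx: i, A: a[i], B: b[i]}
	}
	return rows
}

// sortByAThenB orders rows by A and breaks ties by B.
// It sorts by the secondary key first, then by the primary key.
// The second pass must be stable, otherwise the B ordering
// established by the first pass may be lost among equal A values.
func sortByAThenB(rows []row) []row {
	out := append([]row(nil), rows...) // copy so the input stays untouched
	sort.SliceStable(out, func(i, j int) bool { return out[i].B < out[j].B })
	sort.SliceStable(out, func(i, j int) bool { return out[i].A < out[j].A })
	return out
}

func main() {
	for _, r := range sortByAThenB(exampleRows()) {
		fmt.Println(r.Idx, r.A, r.B)
	}
}
```

With stable passes the rows come out in index order 5, 0, 7, 2, 4, 1, 6, 3, matching `df.sort_values(by=['A','B'])` above. Replacing `sort.SliceStable` with `sort.Slice` removes that guarantee, because `sort.Slice` is free to reorder elements that compare equal.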

chrmang commented 2 years ago

Hey Marc, thank you for your pull request.