AlenkaF / vaex-df-api-implementation

Researching and working on dataframe interchange protocol and Vaex library.

🎉 Implementation of the protocol's first draft into local Vaex library #1

Closed AlenkaF closed 3 years ago

AlenkaF commented 3 years ago

Moving from JupyterLab to the local library.

AlenkaF commented 3 years ago

The local library is editable, and the code for the protocol (together with the tests) is in the correct place: pytest test_file.py works as it should.

The roundtrip test (Pandas -> Vaex and Vaex -> Pandas) works in the Python command line for numeric and categorical columns (without Arrow Dictionary). Short example:

>>> import numpy as np
>>> import pandas as pd
>>> import vaex
>>> df = vaex.from_arrays(
... x=np.array([True, True, False]),
... y=np.array([1, 2, 0]),
... z=np.array([9.2, 10.5, 11.8]))
>>> df
  #  x        y     z
  0  True     1   9.2
  1  True     2  10.5
  2  False    0  11.8
>>> from_dataframe(df)
       x  y     z
0   True  1   9.2
1   True  2  10.5
2  False  0  11.8
>>> from_dataframe_to_vaex(df)
  #  x        y     z
  0  True     1   9.2
  1  True     2  10.5
  2  False    0  11.8
>>> df = pd.DataFrame({"A": [1, 2, 5, 1]})
>>> df["B"] = df["A"].astype("category")
>>> df
   A  B
0  1  1
1  2  2
2  5  5
3  1  1
>>> col = df.__dataframe__().get_column_by_name('B')
>>> col
<__main__._PandasColumn object at 0x00000213D55C5190>
>>> col.dtype[0]
23
>>> col.describe_categorical
(False, True, {0: 1, 1: 2, 2: 5})
>>> from_dataframe(df)
   A  B
0  1  1
1  2  2
2  5  5
3  1  1
>>> from_dataframe(df).__dataframe__().get_column_by_name('B')
<__main__._PandasColumn object at 0x00000213D55AA820>
>>> from_dataframe(df).__dataframe__().get_column_by_name('B').dtype[0]
23
>>> from_dataframe(df).__dataframe__().get_column_by_name('B').describe_categorical
(False, True, {0: 1, 1: 2, 2: 5})
>>> from_dataframe_to_vaex(df)
  #    A    B
  0    1    0
  1    2    1
  2    5    2
  3    1    0
>>> from_dataframe_to_vaex(df).__dataframe__().get_column_by_name('B')
<vaex.dataframe_protocol._VaexColumn object at 0x00000213DC31B820>
>>> from_dataframe_to_vaex(df).__dataframe__().get_column_by_name('B').dtype[0]
23
>>> from_dataframe_to_vaex(df).__dataframe__().get_column_by_name('B').describe_categorical
(False, True, {0: 1, 1: 2, 2: 5})
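
In the session above, the Vaex output shows column B as 0, 1, 2, 0, while describe_categorical reports the mapping {0: 1, 1: 2, 2: 5}. This suggests that Vaex displays the integer category codes and that the mapping is what recovers the original values. The following is a minimal sketch of that decoding step in plain Python. The helper name decode_categorical is hypothetical and is not part of Vaex, Pandas, or the protocol.

```python
def decode_categorical(codes, mapping):
    """Map integer category codes back to their original values.

    Hypothetical helper for illustration only; it is not part of
    Vaex, Pandas, or the dataframe interchange protocol.
    """
    return [mapping[code] for code in codes]


# Values copied from the session above.
codes = [0, 1, 2, 0]            # column B as displayed by Vaex
mapping = {0: 1, 1: 2, 2: 5}    # third element of describe_categorical

print(decode_categorical(codes, mapping))  # prints [1, 2, 5, 1]
```

The decoded list matches column B of the original Pandas dataframe, which is consistent with the roundtrip preserving the categorical data.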