crflynn / pbspark

protobuf pyspark conversion
MIT License

df_to/from_protobuf functions #25

Closed crflynn closed 2 years ago

crflynn commented 2 years ago

Add top-level functions to_protobuf and from_protobuf, which simplify column conversion because the user no longer has to create a MessageConverter.

Add top-level functions df_to_protobuf and df_from_protobuf, which can be passed a dataframe without a MessageConverter. These functions also take an expanded: bool argument. When decoding, it controls whether the struct output by the decoding udf is expanded into top-level columns; when encoding, it controls whether the dataframe's columns are first contracted into the struct that the encoding udf takes as input. The functions call into the new MessageConverter.df_to_protobuf and MessageConverter.df_from_protobuf methods.

Also, since Spark already uses the flatten nomenclature for a different operation from the one we perform here, we change our documentation to use the words expand and contract to describe unpacking and packing a struct, respectively.
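
To make the expand/contract terminology concrete, here is a minimal pure-Python sketch that models a dataframe row as a dict. It is an illustration only, not pbspark's implementation: the helpers expand, contract, and flatten_arrays, and the column name "value", are hypothetical names chosen for this example. The flatten_arrays helper mimics what Spark's flatten does (collapsing an array of arrays into one array), to show why that name would be misleading for struct unpacking.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expand(row: Row, struct_col: str = "value") -> Row:
    """Unpack the struct stored under struct_col into top-level columns."""
    return dict(row[struct_col])


def contract(row: Row, struct_col: str = "value") -> Row:
    """Pack all top-level columns into a single struct under struct_col."""
    return {struct_col: dict(row)}


def flatten_arrays(nested: List[List[Any]]) -> List[Any]:
    """Collapse an array of arrays into one array (Spark-style flatten)."""
    return [item for inner in nested for item in inner]


# A decoded row: one column holding a struct (hypothetical field names).
decoded = {"value": {"name": "hello", "quantity": 5}}

expanded = expand(decoded)        # struct fields become top-level columns
recontracted = contract(expanded)  # columns are packed back into one struct
merged = flatten_arrays([[1, 2], [3]])  # unrelated to struct packing
```

Expanding and then contracting returns the original row shape, which is the round trip the expanded argument is meant to handle on either side of the encoding and decoding udfs.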