crflynn / pbspark

protobuf pyspark conversion
MIT License
21 stars 5 forks source link

Add logic to detect and handle circular loops, and optionally ignore deprecated fields #36

Open BrendanJM opened 2 years ago

BrendanJM commented 2 years ago

Add options for creating spark struct to ignore circularly defined fields and/or deprecated fields. This PR is a no-op change, meaning existing behavior should not be impacted.

I was not able to get the tests up and running on my local fork (make gen and make test both gave errors), so these will need to have tests added to validate behavior is as expected. I was able to informally validate behavior is as expected in a notebook. You may also want to further configure logging, but I think that's out of scope for this PR.

It may be worth separating the recursive functionality of get_spark_schema from the public interface to avoid the need to surface _seen_descriptors publicly (e.g. keep the current entry point and hide the recursion behind something like _ get_spark_schema)

BrendanJM commented 2 years ago

@crflynn Please let me know what your thoughts are on adding this functionality, or if there's any additional changes you'd like to see.

crflynn commented 2 years ago

We would probably want to change some things here since options was intended to be a dict of the kwargs of protobuf's MessageToDict. We could probably make the args more explicit.

I see you added logic to get_spark_schema but not to the parsers. Is there additional logic required there?

Also as you mention there are no tests. Getting set up should be straightforward if you have asdf and poetry. What issues are you experiencing?