MrPowers / quinn

pyspark methods to enhance developer productivity 📣 👯 🎉
https://mrpowers.github.io/quinn/
Apache License 2.0
614 stars 96 forks

Possibly add a function that's similar to pandas json_normalize #95

Open MrPowers opened 1 year ago

MrPowers commented 1 year ago

Suggestion from this Reddit thread.

huynguyent commented 6 months ago

Kind of cheating, but a naive solution is to use pandas json_normalize to parse the JSON and then convert the resulting pandas DataFrame into a Spark DataFrame. The logic seems a bit too simple to justify a dedicated helper function, though.

MrPowers commented 6 months ago

@huynguyent - it would be nice to create an implementation that's really performant and doesn't depend on pandas!

SemyonSinchenko commented 6 months ago

This is only possible if you know the final schema. Otherwise you need to infer the schema first somehow. And even with a known schema, the simplest solution is still to use UDFs. My first question: do we know the schema in this case? If not, I would suggest starting with a function like infer_json_schema(col).