NowanIlfideme / pydantic-cereal

Advanced serialization for Pydantic v2
MIT License

Add examples and tests for PySpark Dataframes #35

Open NowanIlfideme opened 7 months ago

NowanIlfideme commented 7 months ago

We currently have examples for Pandas but no examples for PySpark, even though we install PySpark as an optional dependency.

We should enable tests for PySpark, at least in single-node (local) mode, since we don't have access to real clusters on GitHub Actions without some custom runners.

Checklist

Possible Issues

⚠️ One pretty big problem... Spark workers can run on different machines, which means they won't necessarily have access to the same physical storage! If you want to save something to a path on a "local" file system, it will only be accessible to the Driver (the "main" process) unless the same volume is somehow mounted on the workers too...

How will we test this? I'm not entirely sure. For now, writing code and testing a single-node cluster implementation will be enough. Later on, we can implement checks (before writing) that all workers can reach the target path...
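One possible shape for such a pre-write check, as a rough sketch only. `probe_path` and `unreachable_hosts` are hypothetical names, and the check is best-effort: Spark does not guarantee that the probe tasks land on every executor, so a clean result is a hint rather than proof.

```python
import socket
import tempfile


def probe_path(path):
    """Return (hostname, ok): can this process create a file under `path`?"""
    host = socket.gethostname()
    try:
        # Creating (and auto-deleting) a temp file checks existence and write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return host, True
    except OSError:
        return host, False


def unreachable_hosts(spark, path, n_tasks=16):
    """Run the probe in `n_tasks` Spark tasks and list hosts where it failed."""
    sc = spark.sparkContext
    results = (
        sc.parallelize(range(n_tasks), n_tasks)  # one partition per probe task
        .map(lambda _: probe_path(path))
        .collect()
    )
    return sorted({host for host, ok in results if not ok})
```

A writer could call `unreachable_hosts(spark, target_dir)` and raise if the list is non-empty. On a real cluster the module defining `probe_path` would also have to be importable on the workers.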