MrPowers / farsante

Fake Pandas / PySpark DataFrame creator

Improve py h2o performance #15

Closed jeffbrennan closed 8 months ago

jeffbrennan commented 8 months ago

Adds parallel invocation of h2o dataset generation from Python. Fixes tests currently broken on main, as mentioned in #14.
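For readers unfamiliar with the pattern, here is a minimal sketch of invoking several dataset generation jobs in parallel from Python. It is not the code from this PR: `generate_dataset`, `generate_all`, the job tuples, and the output file names are hypothetical placeholders, and a thread pool is assumed to be sufficient (a process pool would be the alternative if the underlying call held the GIL).

```python
from concurrent.futures import ThreadPoolExecutor


def generate_dataset(job: tuple[str, int]) -> str:
    """Hypothetical stand-in for one dataset generation call.

    A real implementation would call into the generator here; this
    placeholder only builds the name of the file it would write.
    """
    name, n_rows = job
    return f"{name}_{n_rows}.parquet"


def generate_all(jobs: list[tuple[str, int]], max_workers: int = 4) -> list[str]:
    """Run every job on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs,
        # regardless of which job finishes first.
        return list(pool.map(generate_dataset, jobs))


if __name__ == "__main__":
    # Hypothetical job list: (dataset name, row count).
    print(generate_all([("groupby", 1_000), ("join", 1_000)]))
```

Because `Executor.map` preserves input order, callers can pair each result with the job that produced it without extra bookkeeping.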

Noticed that Python >=3.11 wasn't able to run the PySpark tests (PySpark was previously pinned to a static version, 3.3.1).

Future work should focus on generalizing the Rust interface to create any dataset (e.g., the mvv dataset used in quinn's column performance testing).