coiled / imbalanced-join

BSD 3-Clause "New" or "Revised" License

Testing bigjoin with synthetic data #2

Closed: ncclementi closed this 1 year ago

ncclementi commented 1 year ago

This PR includes a notebook that creates df1 (the synthetic dataset) using Coiled, following @CerebralMastication's code.
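For readers unfamiliar with what makes a join "imbalanced", here is a minimal local sketch of a skewed table, assuming skew is modeled as a single hot key that owns a fixed fraction of rows. The function `make_skewed_df`, its parameters, and the column names `key` and `value` are hypothetical illustrations in plain pandas/NumPy; this is not the generator in the notebook and does not use Coiled.

```python
import numpy as np
import pandas as pd


def make_skewed_df(n_rows: int, n_keys: int, hot_fraction: float, seed: int = 0) -> pd.DataFrame:
    """Hypothetical skewed table: a fraction of rows share one 'hot' key.

    Assumption: key 0 is the hot key; the remaining rows draw keys uniformly
    from 1..n_keys-1. Illustrative only, not the notebook's generator.
    """
    rng = np.random.default_rng(seed)
    n_hot = int(n_rows * hot_fraction)
    hot_keys = np.zeros(n_hot, dtype=np.int64)
    # integers() excludes the upper bound, so cold keys never collide with key 0.
    cold_keys = rng.integers(1, n_keys, size=n_rows - n_hot)
    keys = np.concatenate([hot_keys, cold_keys])
    rng.shuffle(keys)  # spread hot rows throughout the table
    return pd.DataFrame({"key": keys, "value": rng.random(n_rows)})


df1 = make_skewed_df(n_rows=100_000, n_keys=1_000, hot_fraction=0.5)
# Key 0 dominates the row counts, which is what makes a join on "key" imbalanced.
print(df1["key"].value_counts().head(3))
```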

The data is now on S3. It is not public yet, since we might want to keep improving it, but I'd be happy to put it somewhere public if we agree this version is final.

It also includes a notebook in which I run the set_index (step_1) and bigjoin (step_2) operations from the original workflow, along with the corresponding performance reports. (I'll write up a follow-up issue describing the observed behavior and link it here.)
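As a rough illustration of the two steps, here is a small in-memory pandas analogue, assuming the join is an inner join on a shared `key` column between a large skewed table and a small lookup table. The table and column names are hypothetical, and this is not the notebook's Dask/Coiled code. In Dask the corresponding calls would be `DataFrame.set_index` and an index-aligned `join`/`merge`, optionally wrapped in `dask.distributed.performance_report(filename=...)` to produce reports like the ones attached.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical large, skewed table: half of its 10,000 rows share key 0.
keys = np.concatenate([np.zeros(5_000, dtype=np.int64), rng.integers(1, 100, size=5_000)])
big = pd.DataFrame({"key": keys, "x": rng.random(keys.size)})

# Hypothetical small lookup table: exactly one row per key 0..99.
small = pd.DataFrame({"key": np.arange(100), "label": [f"k{i}" for i in range(100)]})

# Step 1 (analogous to set_index): move the join column into a sorted index.
big_idx = big.set_index("key").sort_index()
small_idx = small.set_index("key")

# Step 2 (analogous to the big join): index-aligned inner join.
joined = big_idx.join(small_idx, how="inner")
print(joined.index.value_counts().head(3))
```

In a distributed setting, rows that share an index value typically end up in the same partition after `set_index`, so one hot key can leave a single partition far larger than the rest. That is the kind of imbalance the performance reports are meant to expose.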