SemyonSinchenko closed this 5 months ago.
This is cool. I think we should add these APIs as "experimental". From what I've seen, these plans change arbitrarily over time. This code will likely break as time goes on. I don't think that's an issue if we have the experimental annotation in the docs.
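One way to carry the "experimental" label both in the docs and at runtime is a small decorator. This is a minimal sketch: `experimental` is a hypothetical helper, not an existing quinn or PySpark API. It appends a Sphinx `.. warning::` note to the docstring and emits a `FutureWarning` on every call.

```python
import functools
import warnings


def experimental(func):
    """Hypothetical decorator: flag func as experimental in its docs and at call time."""
    note = "\n\n.. warning:: Experimental API; behavior may change between Spark versions."

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 points the warning at the caller, not at this wrapper
        warnings.warn(
            f"{func.__name__} is experimental and may change or break.",
            FutureWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    # Extend the docstring so the note shows up in generated API docs
    wrapper.__doc__ = (func.__doc__ or "") + note
    return wrapper
```

A `FutureWarning` is shown to end users by default, which fits an API that is expected to break as Spark's plan format changes.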
I'm not sure if `estimate_size_of_df` should return `-1` or `None` if the result is unknown. That's a TBD.
Looks like we need a `humanize_bytes` function here too: https://github.com/MrPowers/mack#humanize-bytes
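For reference, a byte-count formatter can be quite small. This is a minimal sketch assuming binary (1024-based) units with two decimals; it is not a copy of mack's `humanize_bytes`, whose exact output format may differ.

```python
def humanize_bytes(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 1536 -> '1.50 KiB'."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    if num_bytes < 1024:
        return f"{num_bytes} B"
    value = float(num_bytes)
    # Divide by 1024 until the value fits below the next unit
    for unit in ("KiB", "MiB", "GiB", "TiB", "PiB"):
        value /= 1024
        if value < 1024:
            return f"{value:.2f} {unit}"
    # Anything at or above 1024 PiB is reported in EiB
    return f"{value / 1024:.2f} EiB"
```

For example, `humanize_bytes(1024 ** 3)` gives `"1.00 GiB"`.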
Cool work!!!
@MrPowers Kind reminder
@MrPowers Should we close it without merging?
Closed as a very unstable API.
Two new functions:
Try to estimate the size of a DataFrame
On branch feature/plan-utils
Changes to be committed:
new file: quinn/plan_utils.py
The function that returns the plan works like this:

![image](https://github.com/MrPowers/quinn/assets/29755009/592d2933-6a94-46b7-b56c-db9a283637b2)
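A minimal sketch of getting the plan as a string using only public APIs, assuming Spark 3.0+ (where `DataFrame.explain` accepts a `mode` argument) and that PySpark's `explain` writes its output with Python's `print`. `get_plan_string` is a hypothetical name, not necessarily how `quinn/plan_utils.py` implements it.

```python
import contextlib
import io


def get_plan_string(df, mode: str = "cost") -> str:
    """Return the text that df.explain(mode=mode) would print, as a string."""
    buffer = io.StringIO()
    # explain() prints to stdout, so redirect stdout into an in-memory buffer
    with contextlib.redirect_stdout(buffer):
        df.explain(mode=mode)
    return buffer.getvalue()
```

Usage would look like `plan = get_plan_string(spark.range(10))`. Capturing stdout avoids reaching into internal JVM objects, at the cost of depending on the human-readable `explain` format, which is exactly the kind of output that can change between Spark versions.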
The difference from `df.explain` is that our function returns a string that can be parsed. It is a small function, but it may be used, for example, to generate a data lineage graph (when we are trying to get dependencies at the level of each column).

The function that estimates the size in bytes works like this:

![image](https://github.com/MrPowers/quinn/assets/29755009/18cec8ff-1434-433c-9665-0f5411188ab1)
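One way to turn a plan string into a number is to extract its `sizeInBytes` value. This is a minimal sketch under several assumptions: the string comes from Spark 3.x `explain(mode="cost")` output, where plan nodes carry fragments like `Statistics(sizeInBytes=1.5 KiB, rowCount=10)` with binary units, and the first such fragment belongs to the root of the optimized logical plan. `estimate_size_from_plan` is a hypothetical name, not necessarily what this PR does. It returns `None` when no size is found, which is one of the two options raised in the review.

```python
import re
from typing import Optional

# Binary units as Spark prints them (assumed Spark 3.x "%.1f <unit>" format)
_UNITS = {
    "B": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
    "PiB": 1024 ** 5,
    "EiB": 1024 ** 6,
}

_SIZE_RE = re.compile(
    r"sizeInBytes=([0-9]+(?:\.[0-9]+)?)\s*(B|KiB|MiB|GiB|TiB|PiB|EiB)\b"
)


def estimate_size_from_plan(plan: str) -> Optional[int]:
    """Return the first sizeInBytes found in the plan (assumed root node), or None."""
    match = _SIZE_RE.search(plan)
    if match is None:
        return None  # size unknown: report None rather than -1
    value, unit = match.groups()
    # Spark rounds to one decimal, so the result is only approximate
    return int(float(value) * _UNITS[unit])
```

Because the printed value is rounded, the result is an estimate, not an exact byte count. Spark may also print a very large default size (for example `8.0 EiB`) for relations without statistics, so a caller might want to treat huge values as unknown as well.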
This functionality is really tricky; I do not know another way to estimate the size. It is important, for example, when we need to estimate the number of resulting partitions, or to understand where broadcast hints can be applied.
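As an illustration of those two use cases, a size estimate can feed simple arithmetic. This is a minimal sketch: the helper names are hypothetical, and the defaults (128 MiB per partition, a 10 MiB broadcast limit) are assumptions chosen to mirror Spark's stock `spark.sql.files.maxPartitionBytes` and `spark.sql.autoBroadcastJoinThreshold` values, not settings read from a live session.

```python
from typing import Optional

# Assumed targets, mirroring Spark's default config values
TARGET_PARTITION_BYTES = 128 * 1024 * 1024
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def suggest_num_partitions(
    size_bytes: Optional[int], target_bytes: int = TARGET_PARTITION_BYTES
) -> Optional[int]:
    """Ceil(size / target), at least 1; None if the size is unknown."""
    if size_bytes is None:
        return None
    # Integer ceiling division avoids float rounding on very large sizes
    return max(1, (size_bytes + target_bytes - 1) // target_bytes)


def looks_broadcastable(
    size_bytes: Optional[int], threshold: int = BROADCAST_THRESHOLD_BYTES
) -> bool:
    """True if the estimate fits under the threshold; unknown sizes are not broadcast."""
    return size_bytes is not None and 0 <= size_bytes <= threshold
```

For a 300 MiB estimate, `suggest_num_partitions` gives 3 partitions and `looks_broadcastable` gives `False`. Treating an unknown size as "not broadcastable" is a conservative choice, since broadcasting a table that turns out to be large can exhaust driver or executor memory.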
Because it is an absolutely new API, any feedback is welcome!