autogluon / tabrepo

Apache License 2.0
27 stars 7 forks source link

Repository Enhanced + Added BenchmarkSubcontext #1

Closed Innixma closed 1 year ago

Innixma commented 1 year ago

Re-creating PR from original repo: https://github.com/Innixma/autogluon-zeroshot/pull/27

TODOs (can be part of follow-up PRs)

Innixma commented 1 year ago

Regarding the prior comment from @geoalgo:

Regarding subcontext, I am not sure about the proposed approach. This is because our current approach already cache the repository (it currently takes 3s to load the cached repository which is less than 8s) but also because saving cache for all subsets may be quite expensive (in particular if we dont use lazy predictions, even with lazy predictions just caching the repo takes 0.6GB) and also it is difficult to ensure consistency with naming (if we have a different subset, its difficult to know which cache to load). Given this, I am not sure that this approach is better than just caching just the repository and filtering by datasets after.

Could you let me know your thoughts about this one? In particular, what is the biggest benefit of adding the new caching logic compared to the one already there?

I wrote the subcontext logic prior to knowing about your implementation of caching. I think that they can likely be unified for the most part, however the Subcontext logic overall will stay the same regardless of the adopted caching strategy, as they are mostly identical (just saving and loading a pkl file). I think we can merge as is, then later change the inner Subcontext cache/load logic to your implementation without any outer API/logic changes.

I think in general your caching function is more generic and should be re-used, but since we are only caching a single object here, the resulting logic is identical with my implementation.

geoalgo commented 1 year ago

Btw: I hope my comments are constructive, happy to discuss it in person and looking forward to merge the change and do next experiments.