Repository Enhanced + Added BenchmarkSubcontext

Innixma commented 1 year ago

Re-creating PR from original repo: https://github.com/Innixma/autogluon-zeroshot/pull/27

Added BenchmarkSubcontext, an elegant way to save and load an EvaluationRepository, and generate it based on a BenchmarkContext. This substantially speeds up loading times (70s -> 8s) for a Repository, and enhances reproducibility.
- It is now recommended to always work with BenchmarkSubcontext rather than BenchmarkContext, as it will be much simpler and more powerful.
Added various utility methods to EvaluationRepository, allowing to easily run ZS simulations with a single method call.
Fixed some minor bugs

TODOs (can be part of follow-up PRs)

[x] Add additional unit tests for Repository
[x] Add unit tests for BenchmarkSubcontext
[ ] Replace all usages of get_context with get_subcontext
[ ] Add documentation to BenchmarkSubcontext
[x] Add documentation to new Repository methods
[ ] Delete more old scripts, unify script code to a couple calls to EvaluationRepository to simplify codebase.
[ ] Add lazy_format support to BenchmarkSubcontext

Innixma commented 1 year ago

Regarding the prior comment from @geoalgo:

Regarding subcontext, I am not sure about the proposed approach. This is because our current approach already cache the repository (it currently takes 3s to load the cached repository which is less than 8s) but also because saving cache for all subsets may be quite expensive (in particular if we dont use lazy predictions, even with lazy predictions just caching the repo takes 0.6GB) and also it is difficult to ensure consistency with naming (if we have a different subset, its difficult to know which cache to load). Given this, I am not sure that this approach is better than just caching just the repository and filtering by datasets after.

Could you let me know your thoughts about this one? In particular, what is the biggest benefit of adding the new caching logic compared to the one already there?

I wrote the subcontext logic prior to knowing about your implementation of caching. I think that they can likely be unified for the most part, however the Subcontext logic overall will stay the same regardless of the adopted caching strategy, as they are mostly identical (just saving and loading a pkl file). I think we can merge as is, then later change the inner Subcontext cache/load logic to your implementation without any outer API/logic changes.

I think in general your caching function is more generic and should be re-used, but since we are only caching a single object here, the resulting logic is identical with my implementation.

geoalgo commented 1 year ago

Btw: I hope my comments are constructive, happy to discuss it in person and looking forward to merge the change and do next experiments.

autogluon / tabrepo

Repository Enhanced + Added BenchmarkSubcontext #1

TODOs (can be part of follow-up PRs)