Adding the issues I want to work on and some initial doubts:
(Format: each [ ] line is an issue, followed by my doubt/line of thought on it.)
[ ] Add a Hugging Face dataloader to apply IReRa to any multi-label classification task
Add a dataloader and change the loader.py logic to accommodate Hugging Face datasets.
[ ] Add a cli argument parsing class
Move the argparse setup into a class.
[ ] Create a config for the Optimizer class
Use dataclasses to define the config.
[ ] Make the Optimizer class agnostic of the implementation details of the program that gets optimized, so it can define a high-level strategy which gets compiled across many programs
Generalise the methods and abstract away program-specific details so that the strategy logic can be implemented at a high level.
[ ] Track amount of student and teacher calls during optimization so system-cost can be compared.
Simply count the calls made to the student and teacher LLMs.
[ ] Make src/programs/retriever.py more efficient
Work on the logic to make retrieval over the dataset more efficient.
[ ] Track intermediate pipeline steps and log these traces for debugging
Track each step in IReRa and log it.
[ ] Use logging instead of print-statements, and dump logs to experiment file
Define a logger in the loader and experiment code to log the states.
[ ] Control seeds in LM calls and data loaders
Add a seed parameter to the optimiser.
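For the Hugging Face dataloader item, a minimal sketch of the conversion step, assuming loader.py can work with a simple (text, labels) record. `MultiLabelExample` and `rows_to_examples` are hypothetical names, not the current loader.py API. Since iterating a Hugging Face `datasets.Dataset` yields one dict per row, the same function should accept a split returned by `load_dataset(...)` as well as a plain list of dicts:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Sequence


@dataclass
class MultiLabelExample:
    # Hypothetical record type; loader.py may use a different structure.
    text: str
    labels: list[str]


def rows_to_examples(
    rows: Iterable[Mapping[str, Any]],
    text_field: str,
    label_field: str,
    label_names: Sequence[str] | None = None,
) -> list[MultiLabelExample]:
    """Convert dict-like rows (e.g. a Hugging Face split) into multi-label examples."""
    examples = []
    for row in rows:
        raw_labels = row[label_field]
        if label_names is not None:
            # Integer class ids (as stored by a ClassLabel feature) mapped to names.
            labels = [label_names[i] for i in raw_labels]
        else:
            labels = [str(label) for label in raw_labels]
        examples.append(MultiLabelExample(text=row[text_field], labels=labels))
    return examples
```

Keeping the conversion separate from the loading call means any multi-label dataset could be plugged in just by passing different field names.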
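For the CLI item, one way to turn the argparse setup into a class: a dataclass that builds its own parser and is constructed from argv. The flags below are hypothetical placeholders, not the current script's arguments:

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass


@dataclass
class ExperimentArgs:
    # Hypothetical flags; replace with the ones the experiment script actually uses.
    dataset: str
    seed: int = 0
    student_model: str = "student-lm"
    teacher_model: str = "teacher-lm"

    @classmethod
    def build_parser(cls) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(description="Run an IReRa experiment")
        parser.add_argument("--dataset", required=True)
        # Defaults are read from the dataclass so the two cannot drift apart.
        parser.add_argument("--seed", type=int, default=cls.seed)
        parser.add_argument("--student-model", default=cls.student_model)
        parser.add_argument("--teacher-model", default=cls.teacher_model)
        return parser

    @classmethod
    def from_argv(cls, argv: list[str] | None = None) -> ExperimentArgs:
        # argparse maps "--student-model" to the attribute "student_model".
        namespace = cls.build_parser().parse_args(argv)
        return cls(**vars(namespace))
```

In the script itself `ExperimentArgs.from_argv()` would read `sys.argv`; passing an explicit list makes the class easy to test.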
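For the Optimizer config item, a rough sketch of a frozen dataclass config with validation. All field names are hypothetical stand-ins for whatever the Optimizer currently takes as constructor arguments, and an `Optimizer(config=...)` signature would be a change, not the existing API:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, replace
from typing import Any


@dataclass(frozen=True)
class OptimizerConfig:
    # Hypothetical fields standing in for the Optimizer's current arguments.
    max_bootstrapped_demos: int = 4
    max_rounds: int = 1
    num_threads: int = 8
    seed: int = 0

    def __post_init__(self) -> None:
        # Fail fast on invalid settings instead of deep inside optimization.
        if self.max_rounds < 1:
            raise ValueError("max_rounds must be >= 1")
        if self.max_bootstrapped_demos < 0:
            raise ValueError("max_bootstrapped_demos must be >= 0")

    def with_overrides(self, **changes: Any) -> OptimizerConfig:
        # Returns a new (re-validated) config; the frozen original is untouched.
        return replace(self, **changes)

    def to_dict(self) -> dict[str, Any]:
        # Plain dict, e.g. for dumping next to the experiment results.
        return asdict(self)
```

Freezing the config means a run's settings cannot change mid-optimization, and `to_dict()` makes it easy to store them with the results.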
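For making the Optimizer program-agnostic, a sketch of the idea: the optimizer only talks to a small interface (list the stages, set demonstrations for a stage), and a pluggable strategy decides what each stage gets. `OptimizableProgram`, `StageStrategy` and `compile_program` are hypothetical names, and treating "set demos per stage" as the common operation is my assumption about what the current Optimizer does:

```python
from __future__ import annotations

from typing import Any, Protocol, Sequence


class OptimizableProgram(Protocol):
    """Hypothetical minimal interface a program exposes to the optimizer."""

    def stages(self) -> Sequence[str]: ...

    def set_demos(self, stage: str, demos: list[Any]) -> None: ...


class StageStrategy(Protocol):
    """Hypothetical strategy: decides which demos each stage should get."""

    def select_demos(
        self, program: OptimizableProgram, stage: str, trainset: Sequence[Any]
    ) -> list[Any]: ...


def compile_program(
    program: OptimizableProgram, strategy: StageStrategy, trainset: Sequence[Any]
) -> OptimizableProgram:
    # The high-level loop knows nothing about what the stages do internally.
    for stage in program.stages():
        program.set_demos(stage, strategy.select_demos(program, stage, trainset))
    return program
```

Any program exposing these two methods could then be compiled with the same strategy, which is the "high-level strategy compiled across many programs" part of the issue.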
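For tracking student and teacher calls, a minimal sketch that wraps each LM callable and counts invocations per role. It assumes the program calls each LM as a plain callable; if the LMs are invoked through some other method, the wrapping point would have to move. `CallCounter` is a hypothetical helper, and call counts are only a rough proxy for cost:

```python
from __future__ import annotations

import functools
from collections import Counter
from typing import Any, Callable


class CallCounter:
    """Counts LM invocations per role, e.g. "student" and "teacher"."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def wrap(self, role: str, lm: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(lm)
        def counted(*args: Any, **kwargs: Any) -> Any:
            # Counted before the call, so calls that raise are included too.
            self.counts[role] += 1
            return lm(*args, **kwargs)

        return counted
```

After optimization, `counter.counts` can be written to the experiment file so runs can be compared on cost as well as on the metric.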
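For the logging and tracing items, a sketch of a logger that writes everything (including DEBUG-level traces of intermediate steps) to a file in the experiment directory, while only INFO and above reaches the console. The function name, the logger name and the `experiment.log` file name are hypothetical:

```python
from __future__ import annotations

import logging
from pathlib import Path


def setup_experiment_logger(log_dir: str | Path, name: str = "irera") -> logging.Logger:
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate output via the root logger

    # Remove and close handlers from a previous call so lines are not written twice.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    file_handler = logging.FileHandler(log_dir / "experiment.log", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # full trace goes to the file
    file_handler.setFormatter(fmt)

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)  # console stays readable
    console.setFormatter(fmt)

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```

Print statements could then become calls like `logger.debug("retrieve step output: %s", labels)`, so each intermediate step ends up in the experiment's log file.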
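For the seeding item, a sketch of the data-loader side: each component gets its own `random.Random(seed)` instance instead of relying on the global `random` state, so a given seed always produces the same split. `seeded_split` is a hypothetical helper; whether the LM calls themselves can be seeded depends on the provider, so this does not cover that part:

```python
from __future__ import annotations

import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def seeded_split(items: Iterable[T], n_train: int, seed: int) -> tuple[list[T], list[T]]:
    """Shuffle deterministically and split into (train, rest)."""
    # A local RNG neither reads nor changes the global random state.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

The same seed could be read from the optimiser config, so one number controls both data splits and any sampling the optimiser does.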
I hope I am thinking in the right direction. I am intrigued by the IReRa style and want to help make the project better.