[dattri.benchmark] Add MAESTRO benchmark

tingwl0122 commented 4 months ago

Description

This PR will implement the train/eval functions and scripts for MusicTransformer (MT) on the MAESTRO dataset.

1. Motivation and Context

To implement the corresponding benchmark experiment. Note: #52 discusses importing the MusicTransformer models/training/evaluation function into our repo.

2. Summary of the change

migrate the MT repo MusicTransformer-Pytorch
create dattri/benchmark/maestro folder, which handles the MT training, loss calculation, and MAESTRO dataset creation
create test/dattri/benchmark/test_maestro.py to test the functions in dattri/benchmark/maestro.py
(Possibly) modify dattri/scripts/retrain.py to handle this new benchmark experiment

Note:

[ ] TODO: complete the script.
[ ] TODO: write a test file under test/dattri/benchmark that tests only the basic functionality.

3. What tests have been added/updated for the change?

[ ] Unit test: Typically, this should be included if you implemented a new function/fixed a bug.

tingwl0122 commented 4 months ago

It looks like it is not straightforward to skip Darglint for some given folders...

tingwl0122 commented 4 months ago

Ignore the ruff rules [PLR0914, PLR0915] for the functions in train.py.
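For reference, one way to scope such an ignore is an inline `noqa` comment on the function definition. The sketch below is only illustrative: the function name and body are hypothetical stand-ins, not the actual contents of train.py, and PLR0914 may only be active when ruff's preview rules are enabled.

```python
def train_epoch(n_steps: int = 3) -> float:  # noqa: PLR0914, PLR0915
    """Hypothetical stand-in for a long training function in train.py.

    The noqa comment on the def line silences too-many-locals (PLR0914)
    and too-many-statements (PLR0915) for this function only.
    """
    total = 0.0
    for step in range(n_steps):
        # Placeholder "loss" so the sketch runs without any dependencies.
        total += 1.0 / (step + 1)
    return total / n_steps
```

If the ignore should cover the whole file instead, ruff also supports a `per-file-ignores` table in its configuration; which of the two fits better here is a judgment call.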

jiaqima commented 4 months ago

@xingjian-zhang please take a rough look at the structures. Under the benchmark folder, we will have multiple datasets, with some datasets sharing some models (e.g., resnet or GPT). We are thinking about having a benchmark/models folder for the model code and a benchmark/<dataset name> folder for the training and eval code for each dataset. Please comment in the review if you have suggestions.

tingwl0122 commented 4 months ago

Thanks for the review, @xingjian-zhang! For the first comment: that file is more of a script that tests the entire pipeline, including file downloading, preprocessing, etc., so I only tested it locally and did not want it to run under pytest. Following #51's structure, we will put working scripts like this under dattri/scripts/. I will also add a simpler test file under test/dattri/benchmark/ to test the basic functionality of dattri/benchmark/maestro/train.py (possibly splitting some functions out to other places).
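A rough sketch of how the split between a fast basic test and an opt-in pipeline test could look. Everything named here is an assumption for illustration: the `RUN_MAESTRO_E2E` environment variable, the `toy_loss` helper, and the test names are hypothetical and do not exist in the repo. It uses only the standard library's `unittest`, whose test cases pytest also collects.

```python
import os
import unittest

# Hypothetical opt-in flag: the full pipeline downloads and preprocesses
# MAESTRO, so it should not run in the default test session.
RUN_E2E = os.environ.get("RUN_MAESTRO_E2E") == "1"


def toy_loss(pred, target):
    """Mean squared error over two equal-length lists (stand-in for the real loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class TestMaestroBasic(unittest.TestCase):
    def test_loss_is_zero_on_match(self):
        # Fast check with no downloads; always runs.
        self.assertEqual(toy_loss([1.0, 2.0], [1.0, 2.0]), 0.0)

    @unittest.skipUnless(RUN_E2E, "end-to-end run downloads MAESTRO; opt in explicitly")
    def test_full_pipeline(self):
        # Placeholder: the real download/preprocess/train call would go here.
        raise NotImplementedError("hypothetical end-to-end pipeline test")
```

With this layout the default run only exercises the basic test, and the pipeline test runs when the flag is set explicitly.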

For the second comment, I think @TheaperDeng has written something similar to this in #51 (dattri/scripts/retrain.py).

xingjian-zhang commented 4 months ago

Thanks! #51 is pretty similar to what I am thinking.
