TRAIS-Lab / dattri

`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
https://trais-lab.github.io/dattri/
MIT License

[dattri.benchmark] Add MAESTRO benchmark #53

Closed tingwl0122 closed 4 months ago

tingwl0122 commented 4 months ago

Description

This PR implements the train/eval functions and scripts for MusicTransformer (MT) on the MAESTRO dataset.

1. Motivation and Context

To implement the corresponding benchmark experiment. Note: #52 discusses importing the MusicTransformer model, training, and evaluation functions into our repo.

2. Summary of the change

Note:

3. What tests have been added/updated for the change?

tingwl0122 commented 4 months ago

It looks like it is not straightforward to skip Darglint for some given folders...

tingwl0122 commented 4 months ago

Ignore the ruff rules PLR0914 (too many local variables) and PLR0915 (too many statements) for the functions in train.py.
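For readers unfamiliar with ruff suppressions, here is a minimal sketch of one way to silence these two rules for a single function, using an inline `noqa` comment on the `def` line. The function name and body are hypothetical placeholders, not code from this PR, and the PR may have used a different mechanism.

```python
# Hypothetical stand-in for a long training function in train.py.
# The inline comment asks ruff to skip PLR0914 (too many local variables)
# and PLR0915 (too many statements) for this definition only.
def train_sketch(num_epochs: int = 2) -> float:  # noqa: PLR0914, PLR0915
    """Halve a dummy loss once per epoch and return the result."""
    loss = 1.0
    for _ in range(num_epochs):
        loss *= 0.5
    return loss
```

Another option, if the whole file should be exempt, is a `per-file-ignores` entry in the ruff configuration; which approach the PR actually took is not stated in this thread.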

jiaqima commented 4 months ago

@xingjian-zhang please take a quick look at the structure. Under the benchmark folder, we will have multiple datasets, with some datasets sharing models (e.g., ResNet or GPT). We are thinking about having a benchmark/models folder for the model code and a benchmark/<dataset name> folder for the training and eval code of each dataset. Please comment in the review if you have suggestions.
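The proposed layout can be sketched as two small path helpers. This is only an illustration: the helper names and the one-file-per-model convention are assumptions, and only the `benchmark/models` and `benchmark/<dataset name>` folders come from the comment above.

```python
from pathlib import PurePosixPath

# Root of the benchmark package as discussed in this thread.
BENCHMARK_ROOT = PurePosixPath("dattri/benchmark")


def model_path(model_name: str) -> PurePosixPath:
    """Shared model code lives in benchmark/models (one file per model is an assumption)."""
    return BENCHMARK_ROOT / "models" / f"{model_name}.py"


def dataset_dir(dataset_name: str) -> PurePosixPath:
    """Training and eval code for one dataset lives in benchmark/<dataset name>."""
    return BENCHMARK_ROOT / dataset_name


# A model shared across datasets is stored once under models/,
# while each dataset keeps its own train/eval scripts.
print(model_path("resnet"))
print(dataset_dir("maestro") / "train.py")
```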

tingwl0122 commented 4 months ago

Thanks for the review, @xingjian-zhang! Regarding the first comment: that file is more of a script that exercises the entire pipeline, including file downloading and preprocessing, so I only tested it locally and did not want it to run under pytest. Following the structure in #51, we will put such working scripts under dattri/scripts/. I will also add a simpler test file under test/dattri/benchmark/ to test the basic functionality of dattri/benchmark/maestro/train.py (possibly moving some functions elsewhere).

For the second comment, I think @TheaperDeng has written something similar to this in #51 (dattri/scripts/retrain.py).

xingjian-zhang commented 4 months ago

Thanks! #51 is pretty similar to what I had in mind.