eth-easl / orion

An interference-aware scheduler for fine-grained GPU sharing
MIT License
79 stars · 12 forks

set up evaluation wrappers to run inference #10

Closed XianzheMa closed 1 year ago

XianzheMa commented 1 year ago

This PR sets up a way to run inference on each model.

For each model in [vision, bert, transformer, gnmt], I extracted the logic common to training and inference into an independent function called setup, and created a wrapper called eval_wrapper for each model to drive inference.

Both eval_wrapper and train_wrapper now call the setup function to perform the shared initialization.

The core inference logic is encapsulated in utils.measure, which takes the forward pass as a func argument, so the measurement logic doesn't have to be rewritten for each model.
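A minimal sketch of that measurement helper, matching the statistics in the JSON output below (latencies, mean, duration, p90/p95/p99). The signature and percentile method are assumptions, not the actual utils.measure:

```python
import time
import statistics

def measure(func, num_requests, warmup=2):
    # Hypothetical sketch of utils.measure: `func` performs one forward
    # pass; each call is timed individually and then summarized.
    for _ in range(warmup):
        func()                              # warm-up calls, not timed
    latencies = []
    start = time.time()
    for _ in range(num_requests):
        t0 = time.time()
        func()
        latencies.append(time.time() - t0)
    duration = time.time() - start
    latencies_sorted = sorted(latencies)

    def pct(p):
        # Nearest-rank percentile over the sorted latencies.
        idx = min(len(latencies_sorted) - 1,
                  int(p / 100 * len(latencies_sorted)))
        return latencies_sorted[idx]

    return {
        "latencies": latencies,
        "mean_latency": statistics.mean(latencies),
        "duration": duration,
        "p90": pct(90), "p95": pct(95), "p99": pct(99),
    }
```

Because the forward pass is just a callable, the same helper works unchanged for vision, bert, transformer, and gnmt.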

This PR also includes a series of changes to output the experiment data in a structured manner, e.g. as a JSON file, to ease collecting results into the Google Sheet later. This is done exclusively by a DataManager object, which is a member of the SyncInfo classes and writes the data to a file with thread/process-safe control.

After each experiment, the output data looks like the following:

cat eval-gnmt-128-train-gnmt-128-temporal.log.json 
{
    "latencies0": [
        1.5656654834747314,
        1.5766241550445557,
        1.6996073722839355,
        1.5003728866577148,
        1.5379221439361572,
        1.5791761875152588,
        1.5748462677001953,
        1.6204311847686768,
        1.6203172206878662,
        1.6596550941467285
    ],
    "mean_latency0": 1.593461799621582,
    "duration0": 16.749729871749878,
    "p90-0": 1.6636503219604493,
    "p95-0": 1.6816288471221923,
    "p99-0": 1.696011667251587,
    "duration1": 3.712648868560791,
    "duration": 20.46237874031067
}