NVlabs / timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
https://timeloop.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License

Running multiple instances of Timeloop in parallel #272

Closed: Nerotos closed this issue 2 months ago

Nerotos commented 3 months ago

When running multiple instances of Timeloop in parallel, I get this error:

execute:accelergy evaluation/features/features.8/Conv/eyeriss_like.yaml --oprefix timeloop-mapper. -o ./ > timeloop-mapper.accelergy.log 2>&1
ERROR: key not found: ERT, at line: 0

I suspect this is caused by the timeloop-mapper output files being written by multiple instances of Timeloop at the same time. Is there a way to prevent this? Running everything sequentially would be very slow.
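A workaround I am considering is to give every instance its own working directory so the intermediate output files can never collide. Roughly like this (the spec paths and directory layout are just placeholders for my setup):

```python
# Rough sketch: launch each timeloop-mapper run from its own directory so the
# timeloop-mapper.* / accelergy intermediate files of different runs never overlap.
import subprocess
from pathlib import Path

specs = [
    "evaluation/features/features.8/Conv/eyeriss_like.yaml",
    # ... one combined spec per architecture/layer combination
]

procs = []
for i, spec in enumerate(specs):
    run_dir = Path(f"runs/run_{i:03d}")
    run_dir.mkdir(parents=True, exist_ok=True)
    procs.append(subprocess.Popen(
        ["timeloop-mapper", str(Path(spec).resolve())],
        cwd=run_dir,  # all output files land in run_dir
    ))

for p in procs:
    p.wait()
```

Running each instance from its own cwd avoids touching any Timeloop options at all, at the cost of having to collect the stats files from the per-run directories afterwards.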

angshuman-parashar commented 3 months ago

This is plausible. We have not tested multiple instances of Timeloop running in parallel.

That said, Timeloop itself is multi-threaded and the number of mapper threads will expand to fill all available host CPUs. Do you need more parallelism beyond that (e.g., if mapper runs are very short, and/or you are only using timeloop-model)?

Nerotos commented 3 months ago

I want to evaluate multiple design choices, for example the Eyeriss configuration with some changes to the architecture parameters, and I want to evaluate a full network, e.g. ResNet-50. I know that Timeloop doesn't support cross-layer optimizations, but that is good enough for me. So I have two more dimensions of parallelism (architectures and layers), and I want to be able to evaluate multiple architectures in parallel.

angshuman-parashar commented 3 months ago

Understood, but my point is that each Timeloop-mapper invocation already maxes out the parallelism on your host machine in a controlled manner. Adding additional parallel work will only slow things down.

Nerotos commented 3 months ago

Isn't the number of threads limited by a key in the mapper config? If that is the case, it would be interesting to also parallelize in other directions. To be honest, I have no idea whether that would be faster than just giving the mapper more threads, but I would like to explore that option. Unless the mapper actually just fills the host machine with threads by default; then it wouldn't make sense to do so.

angshuman-parashar commented 3 months ago

The mapper's default behavior is indeed to fill the host machine with threads: https://github.com/NVlabs/timeloop/blob/450af7939ce84b6f577ce3566fa915f50dd4bd8b/src/applications/mapper/mapper.cpp#L145

The key in the mapper config is used to override that default behavior (e.g., for debugging, or for limiting host machine utilization).
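For example (assuming the key is spelled num-threads, as in the current mapper sources), something like this caps a single invocation at 4 search threads:

```yaml
mapper:
  # Override the default of one search thread per available hardware thread.
  num-threads: 4
```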