NVlabs / timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
https://timeloop.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License

Plan to support other architecture configuration #5

Closed sj-leo closed 4 years ago

sj-leo commented 5 years ago

Hi! First of all, thank you for your great work. I think Timeloop is a very useful simulator.

In the ISPASS 2019 paper, you compared the performance and energy of several architectures, such as DianNao, NVDLA, and Eyeriss. However, the ‘config/timeloop’ folder contains only an Eyeriss architecture example.

Is there any plan to provide DianNao and NVDLA configuration files? Also, can Timeloop support systolic array architectures like the TPU?

Thank you!

egorchakov commented 5 years ago

I would be very interested in examples of systolic array-based architecture configurations for Timeloop as well. IIUC, Section VI.A (Tile Analysis) seems to imply that it's feasible:

[...] while an empty delta at consecutive time steps between adjacent hardware instances indicates a forwarding opportunity (such as in a systolic array). [...]
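
To make the quoted sentence concrete, here is a toy sketch of the idea as I read it. It is not Timeloop's tile-analysis code: the 1-D row of PEs, the sliding-window tile shape, and the names `tile`, `delta`, and `can_forward` are all assumptions made up for illustration.

```python
# Toy model (hypothetical, not Timeloop's API): a 1-D row of PEs where each PE
# holds a "tile", i.e. a set of input indices, at every time step.

def tile(pe, t, width=3):
    """Input indices held by PE `pe` at time step `t` (assumed sliding window)."""
    start = pe + t
    return set(range(start, start + width))

def delta(new, old):
    """Data present in `new` that is missing from `old`."""
    return new - old

def can_forward(pe, t):
    """True if the tile PE `pe` needs at t+1 was fully held by PE `pe + 1` at t.

    An empty delta between adjacent PEs at consecutive time steps means the
    data could be passed between neighbours instead of being re-fetched.
    """
    return not delta(tile(pe, t + 1), tile(pe + 1, t))

# Delta of PE 0 against its own previous tile: one new element is needed.
print(delta(tile(0, 1), tile(0, 0)))
# Delta of PE 0 against its neighbour's previous tile: empty, so forwarding works.
print(can_forward(0, 0))
```

In this toy setup, each PE needs one new element per step when compared against its own previous tile, but nothing new when compared against its neighbour's previous tile, which is the pattern the paper describes as a forwarding opportunity.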

shreyas1998 commented 5 years ago

@evgorchakov, I think such forwarding opportunities exist in the Eyeriss architecture as well (i.e., for the outputs/partial sums).

angshuman-parashar commented 5 years ago

All - thank you for your interest. We are preparing a couple of configs for release within the next 2-3 days. Systolic array example configs are coming in a later update. Please also walk through the examples in https://github.com/jsemer/timeloop-accelergy-exercises (inside exercises/timeloop). Each exercise has a README.md that will walk you through a series of steps. These should help you build intuition for writing your own architecture configurations. Note that these exercises use a newer YAML-based config format.

sj-leo commented 4 years ago

@evgorchakov Thank you for pointing out that sentence!

@angshuman-parashar Wow! I was really hoping you would reply to my question. Thank you so much for letting me know that you will support other examples. I think they will give me a better understanding of how to use Timeloop!

angshuman-parashar commented 4 years ago

Added Chen ASPLOS '14 (DianNao) and Simba (NVDLA-inspired) example configurations. Also, as a reminder, we strongly encourage going through the examples in https://github.com/jsemer/timeloop-accelergy-exercises. Closing this issue, but we will continue to add more configurations over time.

vmiheer commented 3 years ago

Hi @angshuman-parashar,

  1. The DeepBench workload you mention in the paper is the same as this one, am I right? https://github.com/baidu-research/DeepBench/blob/master/code/kernels
  2. Are there preexisting YAML files with problems/constraints for NVDLA for various problems already uploaded somewhere? For CNNs, the constraints already exist in https://github.com/NVlabs/timeloop/blob/db033564c443c2dfa519427034e7cc2ceece3fda/configs/mapper/simba-chip.cfg#L97, but I was looking for GEMM/RNN.

angshuman-parashar commented 3 years ago

  1. Correct.
  2. To be clear: Simba isn't the same architecture as NVDLA, though there are some similarities. And I apologize, we do not have any existing constraints for GEMM/RNN for those architectures.

vmiheer commented 3 years ago

Thanks @angshuman-parashar. I was trying to find the NVDLA architecture referenced in Section VIII, subsection A of the paper. NVDLA's website lists a few possible configurations (http://nvdla.org/primer.html#example-area-and-performance-with-nvdla). Which of them was used in the case study, or was it a different one?

angshuman-parashar commented 3 years ago

We haven't released any NVDLA architecture configuration with the Timeloop distribution.

vmiheer commented 3 years ago

Thanks @angshuman-parashar. It would have been great to have a Timeloop config for the real silicon, so that researchers could use Timeloop with even more confidence in its architecture/mapping modelling 😊. I wonder if the reason is:

  1. The exact NVDLA configuration is not available in the public domain? (That is okay, and it would mean we can never have config files for NVDLA.)
  2. Or someone just needs to write/upload the config files? (In that case we can expect them at some point.) My current understanding from https://github.com/nvdla/sw/blob/v1.2.0-OC/CompilerFeatures.md#networks-verification-report is that there are three NVDLA configurations, but I don't know for sure which one is used in https://developer.nvidia.com/embedded/jetson-agx-xavier-developer-kit, which I suppose is most likely similar to the one used in the Timeloop paper (because it's real silicon). Maybe I am completely wrong. 😊

Again, thank you very much. If the reason is point 1, I'll focus on the Simba chiplet and stop looking for the exact NVDLA configuration. 😊

angshuman-parashar commented 3 years ago

Since the dataflow is similar to Simba (CK-partitioned), it should be conceivable to tweak the Simba architecture configuration into a desired NVDLA configuration. At this time we don't have any plans to release an official NVDLA configuration as part of the public Timeloop distribution.
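
For readers unfamiliar with the term, below is a minimal sketch of what "CK-partitioned" means as I understand it: the input-channel (C) and output-channel (K) dimensions are spread spatially across the MAC array, while the remaining blocks are iterated over time. This is only a toy loop nest for a 1x1 convolution at a single pixel, not a Timeloop configuration; the function name `ck_partitioned_matvec` and the parameters `pe_c` and `pe_k` are hypothetical.

```python
def ck_partitioned_matvec(weights, inputs, pe_c, pe_k):
    """Compute out[k] = sum over c of weights[k][c] * inputs[c].

    The C and K dimensions are tiled onto an assumed pe_k x pe_c grid of MACs.
    The two outer loops are temporal (one block per step); the two inner loops
    stand in for work that the grid would do in parallel.
    """
    num_k, num_c = len(weights), len(inputs)
    out = [0] * num_k
    for k0 in range(0, num_k, pe_k):          # temporal: next block of output channels
        for c0 in range(0, num_c, pe_c):      # temporal: next block of input channels
            for ks in range(min(pe_k, num_k - k0)):      # spatial across K
                for cs in range(min(pe_c, num_c - c0)):  # spatial across C
                    k, c = k0 + ks, c0 + cs
                    out[k] += weights[k][c] * inputs[c]
    return out

# 2 output channels, 3 input channels, mapped onto an assumed 2x2 MAC grid.
weights = [[1, 2, 3],
           [4, 5, 6]]
inputs = [1, 1, 1]
print(ck_partitioned_matvec(weights, inputs, pe_c=2, pe_k=2))
```

Under this reading, moving from a Simba-like to an NVDLA-like design point would mostly change the sizes of the spatial C and K dimensions and the buffer hierarchy around them, rather than the loop structure itself.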

isai-roman commented 1 year ago

Hello,

As @angshuman-parashar mentioned, systolic array example configs were to come in a later update. Unfortunately, I could not find one in either the repository or the Docker containers of the tutorials. Could you please let me know where to find it? It would be very helpful to have a good reference example to start with.

Thanks in advance!