NVlabs / timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
https://timeloop.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License

Defining operand precisions #213

Open sjoks93 opened 1 year ago

sjoks93 commented 1 year ago

How do we define the precision (data width) and the sparse compressed representation of each operand of a problem, including at different memory levels (e.g., compressed output features at higher levels, operands uncompressed in higher levels)?

Is it possible to define them in a flexible architecture, where memory is completely shared, instead of simply defining the data width of the storage?

angshuman-parashar commented 1 year ago

That's not possible today. It's a very useful feature though, so if you are interested in contributing I can guide you. Should be a ~3-day project.

siddharth-joshi commented 1 year ago

Hi Angshu, actually, I'd like to contribute to this. It's something on the critical path for our own projects, so I'd be interested. Might take me longer than three days though, just to get up to speed!

angshuman-parashar commented 1 year ago

@siddharth-joshi are you thinking of per-operand precision (data-width) or compression formats?

siddharth-joshi commented 1 year ago

Both: first per-operand precision, then per-operand compression. I think we have the latter somewhat hacked in somewhere, but I can let @pooria-taheri comment on that more.


vkb27 commented 1 year ago

Hi, I have to analyse the performance of implementations of different numerical representations, like POSIT. Is this possible in Timeloop by working on the source code? (I'm just starting out with the tool and have gone through the tutorials.) If yes, are there any online resources on it?

angshuman-parashar commented 1 year ago

@vkb27 If you specify the number of bits required by each numerical representation you want to evaluate, Timeloop will model the energy cost of moving those bits across the hardware. This works very well for fixed-width data. For variable bit-widths, you may have to provide an average and convince yourself that the resultant modeling is a reasonably accurate proxy of the actual variable-width behavior. You will also have to provide (via Accelergy) the cost of accessing each arithmetic unit and each SRAM structure on your hardware. If you would like to discuss this further I suggest creating a new issue.
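To make the averaging suggestion above concrete, here is a minimal Python sketch. It is not Timeloop or Accelergy code: the function names, the histogram of bit-widths, and the per-bit energy figure are all hypothetical placeholders. It only shows how an average bit-width could be derived from a distribution of element widths and then used in a simple proxy where cost is linear in bits.

```python
def average_bit_width(width_histogram):
    """Expected bits per element, given a {bit_width: element_count} histogram."""
    total = sum(width_histogram.values())
    if total == 0:
        raise ValueError("histogram is empty")
    return sum(width * count for width, count in width_histogram.items()) / total


def movement_energy_pj(num_accesses, bits_per_element, energy_per_bit_pj):
    """Toy proxy (an assumption, not Timeloop's model): energy linear in bits moved."""
    return num_accesses * bits_per_element * energy_per_bit_pj


# Hypothetical distribution of encoded widths for one operand.
histogram = {4: 500, 8: 300, 16: 200}
ENERGY_PER_BIT_PJ = 0.5  # placeholder value, not a measured cost

avg_bits = average_bit_width(histogram)
num_elements = sum(histogram.values())

# Energy using the single averaged width.
averaged = movement_energy_pj(num_elements, avg_bits, ENERGY_PER_BIT_PJ)

# Energy summed over each width separately, for comparison.
exact = sum(
    movement_energy_pj(count, width, ENERGY_PER_BIT_PJ)
    for width, count in histogram.items()
)

print(f"average bits per element: {avg_bits}")
print(f"averaged energy (pJ): {averaged}, per-width energy (pJ): {exact}")
```

Under a purely linear cost the averaged and per-width totals agree. The "convince yourself" step matters when the real hardware is not linear in bits, for example when accesses are rounded up to a fixed word width.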

angshuman-parashar commented 1 year ago

> Both, first per-operand precision and then per-operand compression. I think we have the latter somewhat hacked in somewhere, but I can let @pooria-taheri comment on that more

@poant Could you confirm if per-operand compression format is already supported?

siddharth-joshi commented 1 year ago

Hi Folks, just following up on this?