cornell-brg / hb-pytorch

Repo to hold HammerBlade PyTorch port. Based on PyTorch v1.4.0

Silicon could support bsg_time() #130

Open vb000 opened 4 years ago

vb000 commented 4 years ago

Update the silicon target to use bsg_time() when it's available. It can probably be supported using the FPGA on the board.

drichmond commented 4 years ago

What specifically do you need from bsg_time? I deprecated that in favor of the more general API that reads a cycle counter from the manycore hardware.

drichmond commented 4 years ago

That API was implemented in this PR https://github.com/bespoke-silicon-group/bsg_replicant/pull/601

vb000 commented 4 years ago

Either bsg_time or hb_mc_manycore_get_cycle... can this be supported using the on-board FPGA?

drichmond commented 4 years ago

bsg_time -> no; *_get_cycle -> yes

Does the FPGA board (for the silicon) use Leonard's interface? I just added it as an MMIO register.

Does the FPGA have access to the Manycore clock?

vb000 commented 4 years ago

I'd have to check with Paul...

(But the manycore's clock is fast, and the FPGA might not close timing at the manycore's clock frequency?)

drichmond commented 4 years ago

Yeah. If we had a proxy for the manycore clock though, that would be useful (like, clk/10).

Otherwise you can just use a cycle counter on the FPGA and multiply by N to get the manycore cycles (where N is the ratio between the core clock and the FPGA clock).
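A minimal sketch of that scaling, assuming the core-to-FPGA clock ratio is constant over the measured interval. The function name and the example frequencies are hypothetical and are not part of bsg_replicant; the result is only an estimate of manycore cycles, not a value read from the manycore itself.

```python
from fractions import Fraction


def estimate_manycore_cycles(fpga_cycles: int, core_hz: int, fpga_hz: int) -> int:
    """Estimate manycore cycles from an FPGA cycle count.

    Hypothetical helper: scales by N = core_hz / fpga_hz, assuming that
    ratio stays fixed for the whole measurement.
    """
    if fpga_cycles < 0:
        raise ValueError("fpga_cycles must be non-negative")
    if core_hz <= 0 or fpga_hz <= 0:
        raise ValueError("clock frequencies must be positive")
    # Use an exact rational so a non-integer ratio does not accumulate
    # floating-point error on large cycle counts.
    ratio = Fraction(core_hz, fpga_hz)
    # int() truncates any fractional cycle.
    return int(fpga_cycles * ratio)


# Example with made-up clocks: 1 GHz core, 250 MHz FPGA, so N = 4.
print(estimate_manycore_cycles(1000, 1_000_000_000, 250_000_000))
```

The estimate is only as good as the assumed ratio: if the core clock changes during the run, a single N no longer describes the interval.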

drichmond commented 4 years ago

This is the implementation: https://github.com/bespoke-silicon-group/bsg_replicant/blob/ff4704ebe789512cdbcb10094efd9be6f78e82a5/libraries/platforms/aws-fpga/hardware/cl_manycore.sv#L1148

vb000 commented 4 years ago

That makes me realize, why can't we just use system time? Silicon time is real, right?
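For illustration, a host-side system-time measurement could look like the sketch below. The helper name is hypothetical, and the sketch assumes the timed call blocks until the work has finished; the measured interval also includes host and transfer overhead, so it is not a cycle count.

```python
import time


def time_wall_clock(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_ns) using the host's monotonic clock.

    Hypothetical helper: assumes fn returns only after the work is complete.
    """
    start_ns = time.perf_counter_ns()
    result = fn(*args, **kwargs)
    elapsed_ns = time.perf_counter_ns() - start_ns
    return result, elapsed_ns


# Stand-in workload; a real use would wrap the kernel launch instead.
result, elapsed_ns = time_wall_clock(sum, range(1000))
print(result, elapsed_ns)
```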

taylor-bsg commented 4 years ago

Regarding what we need, did Cornell want cycle count or just a measure of absolute time for their profiling? We can get the latter in the FPGA off of the FPGA clock. The core clock does run out of the chip, but it's too fast for the FPGA to look at, and it also varies with chip execution.

M

On Fri, Jul 10, 2020 at 10:04 AM Bandhav Veluri notifications@github.com wrote:

That makes me realize, why can't we just use system time? Silicon time is real, right?


drichmond commented 4 years ago

Cornell explicitly asked for cycle count (I can send relevant slack messages).

bsg_time is deprecated and is removed in a PR that is under review. It was a hack, and it was only intended for printing debug messages. In simulation, it would print what $time reports, and it switched to wall-clock time for execution. I added it back when we were writing the runtime libraries so that we would know where to look in the waveforms when the simulations failed.

taylor-bsg commented 4 years ago

Sure, but are they using cycle count to compute kernel execution time?

On Fri, Jul 10, 2020 at 4:16 PM Dustin Richmond notifications@github.com wrote:

Cornell explicitly asked for cycle count (I can send relevant slack messages).

bsg_time is deprecated and is removed in a PR that is under review. It was a hack, and it was only intended for printing debug messages. In simulation, it would print what $time reports, and switched to wallclock time for execution.


yodada commented 4 years ago

Yep. Our hope is to get cycles and calculate execution time by assuming a frequency.
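That calculation can be sketched as follows. The function name and the 1 GHz figure are assumptions for illustration only; the thread does not state a frequency, and the result is only meaningful if the real clock stays close to the assumed one.

```python
def cycles_to_seconds(start_cycle: int, end_cycle: int, assumed_hz: float) -> float:
    """Convert a cycle-count interval to seconds at an assumed clock frequency.

    Hypothetical helper: the two cycle values would come from reading the
    cycle counter before and after the kernel runs.
    """
    if assumed_hz <= 0:
        raise ValueError("assumed_hz must be positive")
    if end_cycle < start_cycle:
        raise ValueError("end_cycle must not precede start_cycle")
    return (end_cycle - start_cycle) / assumed_hz


# Example: 2,000,000 cycles at an assumed 1 GHz clock.
print(cycles_to_seconds(1_000, 2_001_000, 1e9))
```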