Understand execution speed on Intel boards for HLS4ML

fermilab-accelerator-ai / meetings

Meeting notes and information


Understand execution speed on Intel boards for HLS4ML #25

Open gnperdue opened 4 years ago

gnperdue commented 4 years ago

Currently, models run slower on Intel than on Xilinx: a 3-layer model takes 50 clock cycles on Intel vs. 10 on Xilinx. We should eventually understand why.
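To put the cycle counts in wall-clock terms, here is a minimal sketch that converts clock cycles to latency. The 200 MHz clock is an assumption for illustration only (no clock frequency is given in this issue), and the two boards may not run at the same frequency in practice.

```python
def latency_ns(cycles, clock_mhz):
    """Convert a latency in clock cycles to nanoseconds at a given clock."""
    return cycles * 1000 / clock_mhz


# Assumed clock frequency; not a measured value for either board.
CLOCK_MHZ = 200

for board, cycles in (("Intel", 50), ("Xilinx", 10)):
    print(f"{board}: {cycles} cycles = {latency_ns(cycles, CLOCK_MHZ):.0f} ns")

# Ratio of the cycle counts quoted above.
print("Intel / Xilinx:", 50 / 10)
```

At equal clock speeds the gap is a factor of 5 regardless of the frequency chosen.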

gnperdue commented 4 years ago

Status:

Christian: Has been testing how large a model the Intel tools
can handle, using 5-layer fully connected networks with 100,
200, and 500 nodes in the intermediate layers. The 500-node
model choked quickly. The 100-node model took 1 day and then
was OK; the 200-node model also worked OK but took a few days.
The designs seem to be filling up the chip with LUTs.
Follow-ups: This inherited code is not automatically hooked
up to evaluate models from Keras, but we want to run them to
evaluate the effectiveness of the implemented model. Also
looking at the scaling of resource usage for a simple 1-layer
model.
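For a rough sense of why the 500-node case is so much harder than the 100-node case, the sketch below counts the multiplications in a fully connected network as a function of hidden-layer width. It assumes that "5 layer" means five dense layers (input, four equal-width hidden layers, output) and uses hypothetical input and output widths of 16 and 1; neither is taken from the actual models. `dense_macs` is a helper name made up for this example.

```python
def dense_macs(widths):
    """Count multiply-accumulates for a fully connected network.

    widths lists the number of nodes per layer, input first, output last.
    Each pair of adjacent layers contributes n_in * n_out multiplications.
    """
    return sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))


# Hypothetical input/output widths; not taken from the actual models.
N_IN, N_OUT = 16, 1

for hidden in (100, 200, 500):
    # Five dense layers: input -> four hidden layers of equal width -> output.
    widths = [N_IN] + [hidden] * 4 + [N_OUT]
    print(hidden, dense_macs(widths))
```

Under these assumptions the counts are 31700, 123400, and 758500, so the 500-node network needs roughly 24 times as many multiplications as the 100-node one. If every multiplication is laid out in parallel on the chip, resource usage would be expected to grow about quadratically with the hidden width.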

gnperdue commented 4 years ago

Status:

I’ve been working on improving model performance using the Intel tools (reducing
component usage and latency for a given model). I’ve made some improvements (and
done a number of comparisons to results with the Xilinx tools), but the next step is
to talk to some of the Intel experts. I gave a status update in an HLS4ML
'working meeting' on Friday.

In terms of who to talk to from Intel, we have some contacts through CERN that
others have communicated with in the past, but Nhan also just met someone based
in Chicago while he was at SC19, and we were hoping to arrange a face-to-face with
him at FNAL. We’ll email him this coming week to touch base and try to get the ball rolling.
(Nhan and I were also going to meet with Brian on Tuesday to go over results and
discuss setting up a meeting with the Intel contact.)

therwig commented 4 years ago

Update: Made contact with local Intel experts after an introduction through Nhan at SC19. Had a first (virtual) meeting to discuss the Accelerator AI project in broad strokes and to introduce the challenges we've faced so far in effectively porting hls4ml to QuartusHLS. Have followed up since then and shared a code repository with an implementation of a basic model in Quartus (https://github.com/therwig/TestQuartusHLS). The Intel contact will have a look and share tips on reducing the latency and shifting compute to utilize DSPs.

gnperdue commented 4 years ago

Ongoing saga with licenses for the Intel toolkit: the compiler piece seems to be okay, but the validation simulation component, "modelsim", does not have a powerful enough license.
There is some potential to use the Mentor Catapult HLS tools targeting Intel hardware (so, a third-party solution). This looks promising so far.