Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

Parse xbtop/Record FPGA usage #7570

Closed phip123 closed 9 months ago

phip123 commented 1 year ago

Hi,

is there any tool or library that provides the output of xbtop in a machine-readable format? Sadly, I am having trouble with the "build with docker" script; otherwise I would simply change the source code.

I figure there must be some sort of monitoring support that is cloud-native (or similar).

Thanks!

dbenusov commented 1 year ago

Hi @phip123,

We have a script compatible with Nagios queries. The output of the nagios script is JSON plus some Nagios flavor text. Running xbutil examine -d <BDF> -r <Report names> -o <Output file> will create a JSON with the specified reports.

If using Nagios is out of the question you can invoke xbutil directly.

xbtop provides the electrical, memory, and dynamic region reports. You could run xbutil examine -d <BDF> -r memory electrical dynamic-regions -o <Output file> and send the output JSON to where it is needed.
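As a rough sketch of the suggestion above, the snippet below builds that xbutil examine command line and loads the JSON file it writes. Only the flags quoted in this comment are used. The function names, the injectable run parameter, and the example BDF are hypothetical and exist only for illustration, and nothing is assumed about the structure of the JSON beyond it being valid JSON.

```python
import json
import subprocess
from pathlib import Path

# Report names taken from the comment above (the reports xbtop is said to show).
XBTOP_REPORTS = ("memory", "electrical", "dynamic-regions")


def build_examine_cmd(bdf, out_path, reports=XBTOP_REPORTS):
    """Return the argv list for: xbutil examine -d <BDF> -r <reports> -o <file>."""
    return ["xbutil", "examine", "-d", bdf, "-r", *reports, "-o", str(out_path)]


def examine(bdf, out_path, reports=XBTOP_REPORTS, run=subprocess.run):
    """Run xbutil and return the parsed JSON it wrote to out_path.

    `run` is injectable (hypothetical design choice) so the function can be
    exercised on a machine without xbutil or an FPGA installed.
    """
    run(build_examine_cmd(bdf, out_path, reports), check=True)
    return json.loads(Path(out_path).read_text())


# Example call (hypothetical BDF, substitute the one for your device):
#   data = examine("0000:01:00.1", "xbutil_report.json")
#   print(sorted(data))  # inspect the top-level keys before relying on any of them
```

Passing the arguments as a list avoids shell quoting issues, and check=True turns a non-zero exit status into an exception instead of a silently missing file.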

Is this what you are looking for? We are always looking for ways to improve!

phip123 commented 1 year ago

Hi,

thanks for the comment. Sadly, I'm going to stick with Prometheus and I'm currently working on an exporter that uses xbutil2.

I'm using the xbutil command you suggested - is there any way to direct the JSON output to stdout instead of a file? Because currently it seems there is no other way than letting xbutil write the JSON file to disk and then read the JSON file, which causes quite some overhead...

Thanks!

dbenusov commented 1 year ago

Hi @phip123,

No worries. Thanks for mentioning Prometheus, looks interesting.

Unfortunately, we cannot write only the JSON to stdout. The Nagios plugin functions by creating a temporary file and then deleting it. We can get very close by running xbutil examine -d <BDF> -r memory electrical dynamic-regions -o /dev/stdout --force, but the output then also includes the human readable report, which is unfortunate. I'll see if I can add an option to silence the human readable output. Seems like it could be useful.

phip123 commented 1 year ago

Hi @dbenusov-xilinx,

then I'm following the same approach as Nagios, by creating and deleting a temporary file. Thanks, I think that would be nice!
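A minimal sketch of that create-and-delete approach, as it might look inside an exporter. It is not the Nagios plugin's actual code. The function name, the run and tmp_root parameters, and the file name report.json are hypothetical. It assumes the human readable report goes to stdout (as the /dev/stdout experiment above suggests), so stdout is discarded. The output path sits inside a fresh temporary directory, so the file does not exist before xbutil runs and --force should not be needed.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def examine_via_tempfile(bdf, reports, run=subprocess.run, tmp_root=None):
    """Have xbutil write its JSON into a throwaway directory and return it parsed.

    tmp_root (hypothetical parameter) selects where the directory is created;
    pointing it at a RAM-backed location such as /dev/shm, where available,
    is one way to keep the round trip off the physical disk.
    """
    with tempfile.TemporaryDirectory(prefix="xbutil-", dir=tmp_root) as tmp:
        out_path = Path(tmp) / "report.json"
        cmd = ["xbutil", "examine", "-d", bdf, "-r", *reports, "-o", str(out_path)]
        # Assumption: the human readable report is printed to stdout; drop it.
        run(cmd, check=True, stdout=subprocess.DEVNULL)
        # Parse before leaving the with-block, which deletes the directory.
        return json.loads(out_path.read_text())


# Example call (hypothetical BDF):
#   data = examine_via_tempfile("0000:01:00.1", ["memory", "electrical", "dynamic-regions"])
```

The context manager removes the directory and the JSON file even if xbutil fails or the JSON cannot be parsed, so repeated scrapes do not leave files behind.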

Another, somewhat related question, is it possible that the xbutil command locks some resources? For example on instances with multiple FPGAs installed it seems there is some locking going on and even executing the xbutil command in parallel takes longer than individually.

dbenusov commented 1 year ago

Hi @phip123, sorry for the delay. The user space code does not use mutexes for any read operations. It does use mutexes to access the device table, which contains the device handles, although I doubt that would cause much of a delay. The driver also has per-device locks, but if different devices are accessed this should not be an issue. From what I gathered, if the requests are made in parallel, the CPU could be the bottleneck.
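One way to check this on a given machine is to time the same per-device query run one after another and then run concurrently. The sketch below is only a measurement harness: query is a hypothetical callable supplied by the caller (for example, a function that invokes xbutil examine for one BDF), and the result cannot by itself distinguish CPU saturation from a lock. It only shows whether the concurrent run is faster than the sequential one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_sequential(query, bdfs):
    """Seconds taken to call query(bdf) for each device, one after another."""
    start = time.perf_counter()
    for bdf in bdfs:
        query(bdf)
    return time.perf_counter() - start


def time_parallel(query, bdfs):
    """Seconds taken to call query(bdf) for all devices concurrently.

    Threads are sufficient when query spawns a subprocess, because the
    waiting happens outside the Python interpreter.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(bdfs))) as pool:
        list(pool.map(query, bdfs))  # list() forces completion and re-raises errors
    return time.perf_counter() - start


# Example (hypothetical BDFs and query function):
#   bdfs = ["0000:01:00.1", "0000:02:00.1"]
#   print(time_sequential(my_query, bdfs), time_parallel(my_query, bdfs))
```

If the parallel time is close to the sequential time, the calls are effectively being serialized somewhere. Watching CPU utilization during the parallel run is a reasonable next step to see whether the CPU is the limiting factor.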

dbenusov commented 9 months ago

I finally had some time to talk to the team about adding the silence feature, and it seems like it is more trouble than it is worth, considering the use cases. If you really need that feature, please open another issue and I will push it further. Thank you for your feedback!