areaDetector / pvaDriver

An EPICS areaDetector driver for importing an EPICSv4 NTNDArray via pvAccess into an areaDetector IOC.
https://areadetector.github.io/areaDetector/pvaDriver/pvaDriver.html

Poor performance with default request type queue size #12

Open mp49 opened 2 years ago

mp49 commented 2 years ago

Hi,

I've been playing around with pvaDriver today, transporting 0.5MB images between two areaDetector IOCs with and without LZ4 compression.

I saw poor performance with the pvaDriver even at a low frame rate of 100Hz (only 50MB/s): I was dropping arrays every few seconds.

Then I made this change in pvaDriver:

```diff
diff --git a/pvaDriverApp/src/pvaDriver.cpp b/pvaDriverApp/src/pvaDriver.cpp
index 02b5d4a..278c21f 100644
--- a/pvaDriverApp/src/pvaDriver.cpp
+++ b/pvaDriverApp/src/pvaDriver.cpp
@@ -23,8 +23,8 @@
 #include
 #include "pvaDriver.h"

-//#define DEFAULT_REQUEST "record[queueSize=100]field()"
-#define DEFAULT_REQUEST "field()"
+#define DEFAULT_REQUEST "record[queueSize=100]field()"
+//#define DEFAULT_REQUEST "field()"
```

And that worked wonders. I was able to reliably run at 100Hz, 800Hz and even 1500Hz without dropping frames.

It seems like the driver was run with queueSize=100 at some point, but that line was later commented out.
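For anyone not familiar with the request syntax, here is a small standalone sketch (my own illustration, not driver code) that parses such a request string with pvDataCPP's CreateRequest; the queueSize option asks the monitor to queue up to that many elements for the subscription instead of the small library default:

```cpp
// Standalone sketch (not pvaDriver code): parse a pvRequest string with
// pvDataCPP's CreateRequest to see what the queueSize option expands to.
#include <iostream>
#include <pv/createRequest.h>

int main()
{
    epics::pvData::CreateRequest::shared_pointer creator =
        epics::pvData::CreateRequest::create();
    epics::pvData::PVStructure::shared_pointer pvRequest =
        creator->createRequest("record[queueSize=100]field()");
    if (!pvRequest) {
        std::cerr << "Bad request: " << creator->getMessage() << std::endl;
        return 1;
    }
    std::cout << *pvRequest << std::endl;   // dump the parsed request structure
    return 0;
}
```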

I think this parameter could be made configurable as an argument to pvaDriverConfig(). Does that sound reasonable? If so, I can make a pull request and test it.
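As a rough sketch of the idea (illustrative names only, not necessarily how the PR would implement it), the request string could be built from a queue size passed through pvaDriverConfig():

```cpp
// Hypothetical helper: build the pvRequest string from a configurable
// queue size. buildPvRequest() is an illustrative name, not existing code.
#include <sstream>
#include <string>

static std::string buildPvRequest(int queueSize)
{
    std::ostringstream os;
    if (queueSize > 0)
        os << "record[queueSize=" << queueSize << "]field()";
    else
        os << "field()";   // fall back to the pvAccess default (reportedly 2)
    return os.str();
}
```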

Matt

MarkRivers commented 2 years ago

Hi @mp49, that is interesting and your proposal for a PR sounds good.

I am puzzled, however, by some benchmarks I did back in 2017, shown in the slide below from the EPICS meeting. It seems like I was getting 1.2 GB/s when using 1 pvaDriver. The images were large, so it was only about 100 frames/s. Is this consistent with what you were seeing? My tests were done on a single machine, so the PVA traffic was not going across a physical wire.

[attached image: benchmark slide from the EPICS meeting]

mp49 commented 2 years ago

Thanks, I'll work on that PR.

I'll also do more testing next week, with different image sizes and frame rates. I was testing on a RHEL8 VM, which only has 2 cores, and the sim detector IOC and the pvaDriver IOC were running on the same VM.

I couldn't find any information on the 'queueSize' parameter for the request in the PVA documentation, so I'm not fully sure what the default size is, but grepping the source code leads me to think it is only 1 or 2.

MarkRivers commented 2 years ago

I did find a document here that says the default is 2. But this does not look like the location of official documentation.

https://mrkraimer.github.io/website/developerGuide/pvRequest/pvRequest.html (search for the word "queue").

mp49 commented 2 years ago

I'm still running tests and I'll post some results here.

However, I think we can't rely on this code snippet in pvaDriver to tell us how many images we lost:

```cpp
if (!update->overrunBitSet->isEmpty())
{
    int overrunCounter;
    getIntegerParam(PVAOverrunCounter, &overrunCounter);
    setIntegerParam(PVAOverrunCounter, overrunCounter + 1);
    callParamCallbacks();
}
```

It assumes we lost exactly 1 image whenever the overrunBitSet is not empty, but I think a non-empty bit set only tells us that we lost 1 or more images.

We could use the NTNDArray uniqueId field instead?

MarkRivers commented 2 years ago

I don't think you can use the uniqueId field, since there is no guarantee that the source of the NTNDArrays is sending you all of them, or that they will arrive in the correct order.

Mark

mp49 commented 2 years ago

That makes sense. In the NDPluginScatter case that you pasted above, each pvaDriver would only get a subset of the images. I think I'll just compare the total sent and the total received, and not rely on the overrun counter.

There may be another way. Perhaps the NDPluginPva could make use of the userTag in the timestamp:

```
time_t dataTimeStamp 2022-06-21 18:19:54.828
    long secondsPastEpoch 1655849994
    int nanoseconds 827677249
    int userTag 0
```

And then the pvaDriver would always expect that number to increment by 1.
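As a hypothetical sketch of that check (illustrative names, not pvaDriver code), assuming the sender really does increment userTag by exactly 1 per frame:

```cpp
// Hypothetical gap counter: if the sender increments timeStamp.userTag by 1
// per frame, any jump in the sequence can be counted as dropped frames.
#include <cstdint>

struct FrameGapCounter {
    int64_t lastTag = -1;   // userTag of the previous frame, -1 until the first frame
    int64_t dropped = 0;    // running total of missing frames

    void onFrame(int64_t userTag) {
        if (lastTag >= 0 && userTag > lastTag + 1)
            dropped += userTag - (lastTag + 1);   // gap in the sequence => lost frames
        lastTag = userTag;
    }
};
```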

mp49 commented 2 years ago

I've made a PR here: https://github.com/areaDetector/pvaDriver/pull/13

The default is the same as before. This just deals with setting a different queueSize for the PVA request, which improves performance at high frame rates or on machines that are heavily loaded.

For example, with the standard queueSize, on an underpowered VM (2 cores) I run into problems at a 100Hz frame rate even for tiny 128x128 UInt8 images, but with queueSize=100 I can safely run at 700Hz (which is the maximum rate at which I can generate images on the same VM).

On a more powerful machine (8 cores), using the default queueSize, I only saw a few dropped images (out of several thousand) when running at 1700Hz (the maximum rate the simulation driver could sustain). However, when using queueSize=100, I did not see any dropped images.