areaDetector / ADAravis

areaDetector driver for GenICam cameras using the Aravis library on Linux.
https://areadetector.github.io/master/ADAravis/ADAravis.html

readEnumChoices leak #18

Closed daykin closed 1 year ago

daykin commented 1 year ago

On a couple of our servers (about 10-15 camera instances each), we noticed memory usage climbing until the machines started swapping, which caused high CPU usage.

[Image: diag-cam-s3-memory-utilization (002)]

These servers are currently running the legacy areadetector-aravis driver 3.0, with libaravis 0.6. Valgrind shows the following leak which grows with time:

==3187580== 13,176 bytes in 549 blocks are definitely lost in loss record 12,970 of 13,239
==3187580==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==3187580==    by 0x924AD48: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6600.8)
==3187580==    by 0x5F81A45: arv_gc_enumeration_get_available_int_values (in /usr/lib/x86_64-linux-gnu/libaravis-0.6.so.0.0.0)
==3187580==    by 0x5F8C87B: arv_device_get_available_enumeration_feature_values (in /usr/lib/x86_64-linux-gnu/libaravis-0.6.so.0.0.0)
==3187580==    by 0x48A8320: aravisCamera::runScanner() (aravisCamera.cpp:961)
==3187580==    by 0x48AE981: aravisCamera::FeatureScanner::run() (aravisCamera.cpp:254)
==3187580==    by 0x4DBCBA9: epicsThreadCallEntryPoint (epicsThread.cpp:83)
==3187580==    by 0x4DC5679: start_routine (osdThread.c:403)
==3187580==    by 0x6031EA6: start_thread (pthread_create.c:477)
==3187580==    by 0x5261DEE: clone (clone.S:95)

I just pulled in the latest master versions of this and ADGenICam, and upgraded libaravis to 0.8.6. Now I get a similar leak, which appears to grow with writeInt32 calls (in my brief test, 5,200 bytes were lost just from starting an IOC, and 6,240 after changing some int parameters):

==2232427== 6,240 bytes in 336 blocks are definitely lost in loss record 5,447 of 5,625
==2232427==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==2232427==    by 0x60A0D48: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6600.8)
==2232427==    by 0x5F9E01D: arv_gc_enumeration_dup_available_int_values (in /usr/lib/x86_64-linux-gnu/libaravis-0.8.so.0.8.6)
==2232427==    by 0x4890068: arvFeature::readEnumChoices(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<int, std::allocator<int> >&) (arvFeature.cpp:137)
==2232427==    by 0x5F56EA4: GenICamFeature::read(void*, bool) (GenICamFeature.cpp:418)
==2232427==    by 0x5F57CD6: GenICamFeatureSet::readAll() (GenICamFeature.cpp:616)
==2232427==    by 0x5F5E6F4: ADGenICam::writeInt32(asynUser*, int) (ADGenICam.cpp:137)
==2232427==    by 0x4893B84: ADAravis::writeInt32(asynUser*, int) (ADAravis.cpp:584)
==2232427==    by 0x4B8342C: writeInt32 (asynPortDriver.cpp:1996)
==2232427==    by 0x4B98744: processCallbackOutput (devAsynInt32.c:528)
==2232427==    by 0x4B63F6C: portThread (asynManager.c:913)
==2232427==    by 0x4D96679: start_routine (osdThread.c:403)

I am not sure whether this is the primary or only cause of the dwindling memory. In any case, at arvFeature.cpp:137 we create the array gint64 *values with arv_gc_enumeration_dup_available_int_values (the call shown in the trace above). Its elements are then pushed into a vector. A similar array, strings, is g_free()'d afterwards, but values is not. This is where I get lost: for reasons not clear to me, if I add g_free(values) and then try to acquire images from the camera, the driver just passes along empty buffers:

2023/03/21 14:04:38.743 ADAravis:processBuffer: w: 1920, h: 1200, size: 0, expected_size: 4608000

MarkRivers commented 1 year ago

I don't understand why g_free(values) would cause the buffers to be empty.

If you monitor the memory leak does it scale with the number of images acquired? If you acquire 10 images, look at the memory use, acquire another 10, etc. does it grow at a constant rate?
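One way to take those readings on Linux is to sample the resident set size of the IOC process after each batch of images. The following is only a sketch: `parseVmRssKb` and `residentKb` are hypothetical helper names, not part of ADAravis or ADGenICam, and it assumes the standard `VmRSS:` line in `/proc/<pid>/status`.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scan a /proc/<pid>/status style stream for the
// "VmRSS:" line and return its value in kB, or -1 if it is missing/invalid.
long parseVmRssKb(std::istream &in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;              // skips leading whitespace/tabs
            return fields ? kb : -1;   // -1 if no number followed the label
        }
    }
    return -1;
}

// Hypothetical helper: resident memory (kB) of a running process, Linux only.
long residentKb(int pid) {
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    return status ? parseVmRssKb(status) : -1;
}
```

Calling `residentKb(pid)` after every 10 images and comparing successive differences would show whether the growth per batch is roughly constant.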

Note that there was a bug fixed recently in ADGenICam and ADAravis that caused a memory leak if the camera was told to acquire when it was already acquiring. This is fixed in these commits:

https://github.com/areaDetector/ADGenICam/commit/adae668603912b008447499309a3a7f48b3f8951 https://github.com/areaDetector/ADAravis/commit/d9140060b8c42eaaabd2b45a5e6b4e0549044a8f

Are you using a master branch with those commits?

daykin commented 1 year ago

I am using master as of March 17th.

I have just found out that the zero-size thing has nothing to do with this. Rather, it is zero size when I set capture mode to 'multiple' rather than 'continuous'. Not sure what that's about. So I guess that's worth its own issue.

g_free()'ing values fixes all instances of the leak, as I'd expect. values is only used in the scope of readEnumChoices and can safely be freed after use.
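The ownership pattern behind that fix can be sketched as follows. This is not the actual arvFeature.cpp code: `dupIntValues`, `dupStringValues`, `release`, and `liveAllocations` are hypothetical stand-ins so the example builds without GLib or Aravis. In the real driver the arrays come from the Aravis `dup` calls and are released with `g_free()`.

```cpp
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <string>
#include <vector>

static int liveAllocations = 0;  // demonstration-only leak counter

// Stand-in for a "dup" call returning a caller-owned array of integer values.
static std::int64_t *dupIntValues(unsigned *n) {
    static const std::int64_t src[] = {0, 1, 2};
    auto *p = static_cast<std::int64_t *>(std::malloc(sizeof(src)));
    if (!p) { *n = 0; return nullptr; }
    *n = 3;
    for (unsigned i = 0; i < *n; ++i) p[i] = src[i];
    ++liveAllocations;
    return p;
}

// Stand-in for a "dup" call returning a caller-owned array of string pointers.
static const char **dupStringValues(unsigned *n) {
    static const char *src[] = {"Off", "Once", "Continuous"};
    auto **p = static_cast<const char **>(std::malloc(sizeof(src)));
    if (!p) { *n = 0; return nullptr; }
    *n = 3;
    for (unsigned i = 0; i < *n; ++i) p[i] = src[i];
    ++liveAllocations;
    return p;
}

// Stand-in for g_free(): releases the array and updates the counter.
static void release(void *p) {
    if (p) { std::free(p); --liveAllocations; }
}

struct Releaser {
    void operator()(void *p) const { release(p); }
};

// Both arrays are owned by smart pointers, so neither can be forgotten:
// they are released when the function returns, after the copies are made.
void readEnumChoices(std::vector<std::string> &enumStrings,
                     std::vector<int> &enumValues) {
    unsigned nValues = 0, nStrings = 0;
    std::unique_ptr<std::int64_t[], Releaser> values(dupIntValues(&nValues));
    std::unique_ptr<const char *[], Releaser> strings(dupStringValues(&nStrings));
    const unsigned n = nValues < nStrings ? nValues : nStrings;
    for (unsigned i = 0; i < n; ++i) {
        enumStrings.push_back(strings[i]);                  // copies the text
        enumValues.push_back(static_cast<int>(values[i]));  // copies the value
    }
}
```

Because the vectors hold copies, freeing the temporary arrays does not affect the caller, and repeated calls leave `liveAllocations` at zero.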

MarkRivers commented 1 year ago

Closed via #20