memory and run count limits in darshan-util

ashwinidr23 commented 11 months ago

Hello,

I am trying to resolve a warning in the HTML report generated by PyDarshan. The report shows the warning "Module data incomplete due to runtime memory or record count limits". I created a Darshan config file to raise the max records, even up to 800000, and modmem up to 8000. That did increase the file access counts from 1024 to around 5000, and the results are updated, but the warning still remains. Is there anything else I could do to resolve this?

MAX_RECORDS 8000000 POSIX
MODMEM 80000

Thank you in advance

shanedsnyder commented 10 months ago

Sure, I have a couple of initial comments:

The settings you have above are on the right track, in that they instruct Darshan to use a larger memory allocation and to allocate more POSIX records from this memory. There's still one more tunable that might need to be increased though: NAMEMEM. This controls how much memory Darshan reserves for storing the names associated with each record (e.g., file names). If there is no more "name" memory, Darshan will also stop instrumenting, even if there is "module" memory remaining. These two memory pools are independent, if that makes sense. I think the default is 1 MiB of name memory, but you could see if 2 or 4 MiB works better as a start?
Note that MODMEM and NAMEMEM are expressed in MiB. So, the above configuration would require Darshan to allocate roughly 78 GiB of memory (80000 MiB), which is probably way more than it needs. I realize you were just trying to find a configuration that would avoid the memory issue, but assuming my suggestion above works, I think you could dial these values down considerably. 8-16 MiB of MODMEM should be plenty the vast majority of the time.
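
Putting the two comments above together, a dialed-down config file might look something like the sketch below. The specific numbers are illustrative assumptions, not tested values: MAX_RECORDS is picked only to sit above the roughly 5000 file accesses mentioned earlier in the thread, and MODMEM and NAMEMEM are taken from the ranges suggested above. Tune them until the warning goes away.

```
# Sketch of a reduced Darshan config; all values below are assumptions to tune.

# Allow more POSIX file records than the 1024 seen before the change
# (8192 is an arbitrary value above the ~5000 accesses reported).
MAX_RECORDS 8192 POSIX

# Module memory, in MiB (8-16 MiB was suggested as plenty in most cases).
MODMEM 16

# Record-name memory, in MiB (the default is believed to be 1 MiB; 2 or 4 was suggested).
NAMEMEM 4
```

As far as I know, the runtime library picks this file up through the DARSHAN_CONFIG_PATH environment variable, so that needs to point at the file when the application runs.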

ashwinidr23 commented 10 months ago

That worked for me. Thank you :)

darshan-hpc / darshan

memory and run count limits in darshan-util #970