Open rhaas80 opened 3 years ago
@rhaas80, I haven't reproduced this yet, but from a quick scan of the code, I think this was the abort I was trying to avoid back when I made the change:
https://github.com/LLNL/scr/blob/ed7b53214123615a07cb4e961957977b46fad0fe/src/scr_param.c#L191
Basically, if scr_param_get is called before the scr_env_hash has been initialized and if the parameter being queried also happens to be set in the environment, then this code path with the abort gets triggered. Maybe I was trying to query the value, or query a two-level value? I don't remember offhand.
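To make that condition concrete, here is a minimal sketch of the failure mode. It is not the SCR code: toy_env_cache, toy_status and toy_param_get are hypothetical names, the caller is assumed to pass the result of getenv() as env_value, and the sketch returns a status where the real code path aborts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; the real code path aborts instead of returning. */
typedef enum { TOY_OK, TOY_NOT_SET, TOY_NOT_INITIALIZED } toy_status;

/* Hypothetical stand-in for a cache of environment settings. */
typedef struct {
  int initialized;    /* nonzero once a (hypothetical) init step has run */
  const char* value;  /* cached copy of the environment value */
} toy_env_cache;

/* env_value is what getenv() returned for the parameter, or NULL if unset. */
toy_status toy_param_get(const toy_env_cache* cache, const char* env_value,
                         const char** out)
{
  assert(cache != NULL && out != NULL);
  *out = NULL;

  /* Not set in the environment: the cache is never consulted. */
  if (env_value == NULL) {
    return TOY_NOT_SET;
  }

  /* Set in the environment, but the cache was never built:
   * this is the case described above that ends in an abort. */
  if (!cache->initialized) {
    return TOY_NOT_INITIALIZED;
  }

  *out = cache->value;
  return TOY_OK;
}
```

The point of the sketch is only that the problem needs both conditions at once: a query before initialization is harmless unless the parameter is also set in the environment.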
As for reading/writing the file multiple times, I decided that was an acceptable tradeoff, since people tend to set just a few params and that only happens once in the run. It's inefficient but infrequent. We can definitely look for a better solution.
Perhaps using a kvtree to cache the environment settings is overkill anyway. Or we can find another way to initialize those kvtrees.
I think the issue came up with a situation where a program does a query:
>>: prog.c
<snip>
const char* val = SCR_Config("SCR_DEBUG");
<snip>
And then at runtime, one has set that parameter via environment variable:
export SCR_DEBUG=1
./prog
At the time, the call to scr_param_get that was leading to the abort might have come from setting a parameter rather than a query, since SCR_Config used to call scr_param_get after setting a parameter around line 2471:
So then a program and run like the following would trigger the abort.
>>: prog.c
<snip>
SCR_Config("SCR_DEBUG=1");
<snip>
export SCR_DEBUG=1
./prog
I mention in the PR that I changed things to always return NULL when setting a parameter, since a strdup was now required in order to return a value. That's true since calling scr_param_finalize before returning from SCR_Config deletes the kvtree that holds the string that we're returning. I decided to return NULL instead so users would not have to free returned strings when setting params. However, if we can avoid calling scr_param_finalize, we could restore the behavior to return the pointer when setting params again.
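A small sketch of the ownership problem, under stated assumptions: toy_store stands in for the kvtree, and toy_copy, toy_set_return_null and toy_set_return_copy are hypothetical helpers, not SCR functions. Once the store is torn down before returning, the only options are to return NULL or to hand the caller a copy that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Portable duplicate of a string; the caller owns the result. */
static char* toy_copy(const char* s)
{
  size_t n = strlen(s) + 1;
  char* p = malloc(n);
  if (p != NULL) {
    memcpy(p, s, n);
  }
  return p;
}

/* Hypothetical stand-in for the structure that owns parameter strings. */
typedef struct {
  char* value;
} toy_store;

/* Variant A: tear the store down and return NULL, so there is nothing to free. */
const char* toy_set_return_null(const char* value)
{
  assert(value != NULL);
  toy_store st;
  st.value = toy_copy(value);
  free(st.value);  /* "finalize": the stored string is gone after this */
  return NULL;
}

/* Variant B: copy the value out before teardown; the caller must free it. */
char* toy_set_return_copy(const char* value)
{
  assert(value != NULL);
  toy_store st;
  st.value = toy_copy(value);
  char* result = (st.value != NULL) ? toy_copy(st.value) : NULL;
  free(st.value);  /* "finalize" */
  return result;
}
```

Returning st.value directly after the free would be a dangling pointer, which is why a copy is needed whenever the teardown happens inside the call.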
This pair causes the various parameter hashes and data structures to be created and torn down for each call to SCR_Config. Worse, it also triggers the file .scr/app.conf to be read / written for each SCR_Config call.
E.g., in ed7b53214123615a07cb4e961957977b46fad0fe with an extra printf, executing test_config results in 37 file creations. Files are created / read only on rank 0, but this is still not good behaviour.
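To illustrate the cost, here is a toy model that only counts how often the file would be read and written; it does no real I/O. All names (toy_params, toy_config_per_call, toy_config_lazy, toy_shutdown) are hypothetical, and the lazy variant is just one possible alternative, not a proposal for the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for reads and writes of the config file. */
typedef struct {
  int loads;   /* times the file would be read */
  int stores;  /* times the file would be written */
  int live;    /* nonzero while the parameter structures exist */
} toy_params;

static void toy_init(toy_params* p)
{
  p->loads++;
  p->live = 1;
}

static void toy_finalize(toy_params* p)
{
  p->stores++;
  p->live = 0;
}

/* Init and finalize around every call: one read and one write per call. */
void toy_config_per_call(toy_params* p)
{
  assert(p != NULL);
  toy_init(p);
  /* ... set or query a parameter here ... */
  toy_finalize(p);
}

/* Initialize on first use only and leave teardown to a later step. */
void toy_config_lazy(toy_params* p)
{
  assert(p != NULL);
  if (!p->live) {
    toy_init(p);
  }
  /* ... set or query a parameter here ... */
}

/* Single teardown at the end of the run. */
void toy_shutdown(toy_params* p)
{
  assert(p != NULL);
  if (p->live) {
    toy_finalize(p);
  }
}
```

With N calls, the per-call variant does N reads and N writes, while the lazy variant does one read and one write in total.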
This seems to have been introduced in 561b4ea5035ef874197b4a25ae5385152320cb43, part of https://github.com/LLNL/scr/pull/216 , which also seems to remove the possibility for SCR_Config to return a failure to set a parameter (since it always returns NULL when setting values).