ECP-VeloC / AXL

Asynchronous Transfer Library
MIT License
2 stars 8 forks source link

add configuration call #82

Closed rhaas80 closed 3 years ago

adammoody commented 3 years ago

Looks good. Thanks, @rhaas80

tonyhutter commented 3 years ago

I feel that making the user pass a kvtree to AXL_Config() to set config values is needlessly complex. It also forces the user to include KVTree in their app's dependencies.

Why not use simple config strings (AXL_Config(id, "key1=value1 key2=value2" ...)) as discussed in https://github.com/ECP-VeloC/AXL/issues/38? This would allow you to set per-ID config values, and have them persist in the state_file if specified. Also, any AXL_Config() code should allow the user to set the config options as many times as they want. I bring this up because this PR only lets them set it once:

    /* only accept configuration options once */
    configured = 1;

Also, please add a description to your PR and all your commits, and please document what the config values are and what they do.

rhaas80 commented 3 years ago

SCR does offer a string based configuration interface to allow for easy passing of user option. I saw the VeloC libraries as lower level building blocks that will be called from a higher level library that will to the option handling.

All user codes already depend (implicitly) on kvtree since AXL does.

Having that library then construct a string to pass options seems error prone since it looses any hope of type checking (not that the fact that kvtree currently stores only strings helps there).

Having said all of this: duplicating the string parsing code from SCR in each VeloC component is certainly easy enough if that is desired.

Right now the check is in place to reduce the chance of setting parameters, which AXL currently can assume to be constant during a run, to change in the middle of a run (at least to the amount that this is possible given that the config calls happen after AXL_Init). Can obviously be changed, and if option strings are used pretty much must be changed otherwise one has to construct one gigantic option string.

rhaas80 commented 3 years ago

Please let me know what the preferred option is and i will be happy to implement it.

rhaas80 commented 3 years ago

Yes, documentation of these options in the user docs is indeed required. Thanks for pointing that out. In the code right now the various #defines correspond 1:1 to existing AXL global variables that have comments (identical in .c and .h files explaining what they do). I am certainly fine triplicating these comments at he location of the #define.

tonyhutter commented 3 years ago

I saw the VeloC libraries as lower level building blocks that will be called from a higher level library that will to the option handling.

I think we should develop AXL under the assumption that it's a standalone library that anyone could use, rather than just assuming that FILO and VeloC would use it. I could see UnifyFS using it for file stage-in, for example.

All user codes already depend (implicitly) on kvtree since AXL does.

What I mean is that this PR would require users to #include <kvtree.h> and link against libkvtree if they wanted to call AXL_Config(). They currently don't have to do this, nor would they have to do it with a string-based AXL_Config().

Having that library then construct a string to pass options seems error prone since it looses any hope of type checking (not that the fact that kvtree currently stores only strings helps there).

AXL would know the value type in a key=value string, so it would be able to sanity check it. Like it would check that file_buf_size is passed a numeric value, for example.

Right now the check is in place to reduce the chance of setting parameters, which AXL currently can assume to be constant during a run, to change in the middle of a run (at least to the amount that this is possible given that the config calls happen after AXL_Init). Can obviously be changed, and if option strings are used pretty much must be changed otherwise one has to construct one gigantic option string.

You wouldn't need to construct a giant string. You could set one or more values per AXL_Config() call. For example, if you wanted to set the debug file_buf_size and mkdir config values, you could do this:

AXL_Config(id, "debug=1");
AXL_Config(id, "file_buf_size=65536 mkdir=1")

or this:

AXL_Config(id, "debug=1");
AXL_Config(id, "file_buf_size=65536");
AXL_Config(id, "mkdir=1");

or this:

AXL_Config(id, "debug=1 file_buf_size=65536 mkdir=1")

Or any other combination you want. The result would be the same.

A string-based config is also nice because it reduces the amount of boilerplate code you have to write (which also makes the code easier to read). For example, you could reduce your test_config.c code from:

#include <kvtree.h>
...

     kvtree* axl_config_values = kvtree_new(); 
...

    /* check AXL configuration settings */
    rc = kvtree_util_set_bytecount(axl_config_values,
                                   AXL_KEY_CONFIG_FILE_BUF_SIZE,
                                   old_axl_file_buf_size + 1);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_bytecount failed (error %d)\n", rc);
        return rc;
    }
    rc = kvtree_util_set_int(axl_config_values, AXL_KEY_CONFIG_DEBUG,
                             !old_axl_debug);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_int failed (error %d)\n", rc);
        return rc;
    }
    rc = kvtree_util_set_int(axl_config_values, AXL_KEY_CONFIG_MKDIR,
                             !old_axl_make_directories);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_int failed (error %d)\n", rc);
        return rc;
    }
    rc = kvtree_util_set_int(axl_config_values, AXL_KEY_CONFIG_COPY_METADATA,
                             !old_axl_copy_metadata);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_int failed (error %d)\n", rc);
        return rc;
    }

    rc = AXL_Config(axl_config_values);

    ...
    kvtree_delete(&axl_config_values);

To:

    char *config;
    asprintf(&config, "file_buf_size=%d debug=%d mkdir=%d copy_metadata=%d",
        old_axl_file_buf_size + 1, !old_axl_debug, !old_axl_make_directories, !old_axl_copy_metadata);
    AXL_Config(config);

I would also propose that AXL_Config() be able to set and retrieve both global and per AXL ID specific values. There are a couple of ways to do this:

  1. Pass an AXL ID value to AXL_Config(). If it's a global value, then pass -1 for the AXL ID:
int id = 42;
AXL_Config(id, "file_buf_size=65535");
AXL_Config(-1, "debug=1");
  1. Have an 'id' key value pair for setting ID specific values:
AXL_Config("id=42 file_buf_size=65535");
AXL_Config("debug=1");

And to retrieve:

char *str = AXL_Config("debug");
printf("debug=%s", str);   // prints "debug=1"

char *str = AXL_Config("id=42 file_buf_size");
printf("file_buf_size=%s", str);   // prints "file_buf_size=65535"
rhaas80 commented 3 years ago

Uff that was a lengthy comment. I guess I will have to redo my implementation.

I think we should develop AXL under the assumption that it's a standalone library that anyone could use, rather than just assuming that FILO and VeloC would use it. I could see UnifyFS using it for file stage-in, for example.

From my point of view (as a science code developer) UnifyFS would still be a layer between AXL and "me" and would (and should do so I would think) hide the fact that it internally uses AXL and translate its own options into whatever AXL expects.

What I mean is that this PR would require users to #include <kvtree.h> and link against libkvtree if they wanted to call AXL_Config(). They currently don't have to do this, nor would they have to do it with a string-based AXL_Config().

Right now one already has to link against kvtree since AXL links against it. One has to explicitly do so since the current CMake file for AXL is not reporting AXL's own dependent libraries as libraries one needs to link against (at least not that I can see and in a similar situation in er with rankstr I certainly had to manually add it). On a sufficiently old system with dynamic linking the linker will automatically link in the dynamic librraries that libAXL.so uses when one links against libAXL.so but for static linking and newer linkers (that default to not setting --copy-dt-needed-entries see https://stackoverflow.com/questions/8725572/gcc-4-5-vs-4-4-linking-with-dependencies) one does need to list them even for a dynamic link.

#include <kvtree.h> is not needed on the user side since axl.h already includes that file. However this requires that include paths are set up so that the code using AXL actually finds kvtree.h which is a downside and could be avoided by forward declaring kvtree ie typedef struct kvtree_struct kvtree;.

Having that library then construct a string to pass options seems error prone since it looses any hope of type checking (not that the fact that kvtree currently stores only strings helps there).

AXL would know the value type in a key=value string, so it would be able to sanity check it. Like it would check that file_buf_size is passed a numeric value, for example.

Let me be more explicit: passing strings pushes type checks to runtime, while using kvtree_set_int gives a warning at runtime. One can get warnings for strings by using something like AXL_Configf that has printf like syntax and using compiler specific attributes to tell the compiler that ACL_Confgf is printf-like.

You wouldn't need to construct a giant string. You could set one or more values per AXL_Config() call. For example, if you wanted to set the debug file_buf_size and mkdir config values, you could do this:

If I look at the discussion in #38 (eg https://github.com/ECP-VeloC/AXL/issues/38#issuecomment-502798719) then this does not seem to be (sanely) possible since constructs like "CKPT=0 SCHEME=XOR" are not expected to set both CKPT to 0 and SCHEME to XOR but instead set SCHEME to XOR in the 0 sub-option of CHKPT. Admittedly none of the VeloC compoments actually uses anything like this (only SCR does).

A string-based config is also nice because it reduces the amount of boilerplate code you have to write (which also makes the code easier to read). For example, you could reduce your test_config.c code from:

#include <kvtree.h>
...

     kvtree* axl_config_values = kvtree_new(); 
...

    /* check AXL configuration settings */
    rc = kvtree_util_set_bytecount(axl_config_values,
                                   AXL_KEY_CONFIG_FILE_BUF_SIZE,
                                   old_axl_file_buf_size + 1);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_bytecount failed (error %d)\n", rc);
        return rc;
    }
    rc = kvtree_util_set_int(axl_config_values, AXL_KEY_CONFIG_DEBUG,
                             !old_axl_debug);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_int failed (error %d)\n", rc);
        return rc;
    }
    rc = kvtree_util_set_int(axl_config_values, AXL_KEY_CONFIG_MKDIR,
                             !old_axl_make_directories);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_int failed (error %d)\n", rc);
        return rc;
    }
    rc = kvtree_util_set_int(axl_config_values, AXL_KEY_CONFIG_COPY_METADATA,
                             !old_axl_copy_metadata);
    if (rc != KVTREE_SUCCESS) {
        printf("kvtree_util_set_int failed (error %d)\n", rc);
        return rc;
    }

    rc = AXL_Config(axl_config_values);

    ...
    kvtree_delete(&axl_config_values);

To:


  char *config;
  asprintf(&config, "file_buf_size=%d debug=%d mkdir=%d copy_metadata=%d",
      old_axl_file_buf_size + 1, !old_axl_debug, !old_axl_make_directories, !old_axl_copy_metadata);
  AXL_Config(config);
I don't know. This is lacking a `free` on `config`. It does not check asprintf for failure. This seems to mostly safe on boilerplate by removing error checks. With full error checks with strings the code and trying to avoid typo errors in the string names of the variables would look like:
char *config;

rc = asprintf(&config, AXL_CONFIG_KEY_FILE_BUF_SIZE"=%lu "AXL_CONFIG_KEY_DEBUG"=%d "AXL_CONFIG_KEY_MKDIR"=%d "AXL_CONFIG_KEY_COPY_METADATA"=%d",
  (unsigned long) old_axl_file_buf_size + 1, !old_axl_debug, !old_axl_make_directories, !old_axl_copy_metadata);
if(rc == -1) {
  printf("asptrinf failed (error %d)\n", rc);
  return 1;
}
rc = AXL_Config(config);
if(rc != AXL_SUCCESS) {
  return 1;
}
free(config);

which is still quite a bit shorter since it combines multiple kvtree_set_XXX calls into a single asprintf.

I would also propose that AXL_Config() be able to set and retrieve both global and per AXL ID specific values. There are a couple of ways to do this:

sure.

I will implement a demo setup and let you have a look.

tonyhutter commented 3 years ago

Thanks for the info. Since we decided to go with the kvtree-approach in today's meeting, you can ignore my previous comments, and we should probably update https://github.com/ECP-VeloC/AXL/issues/38 to say we're using a KVTree.

As far as per-AXL ID vs global values go, may I suggest we mix both globals and ID specific values in one tree, with the ID specific ones in a subtree, like:

// psudocode: set global debug and some ID specific values
kvtree config = {
    debug = 1
    kvtree {
        id = 42
        file_buf_size = 65536
    }
    kvtree {
        id = 43
        file_buf_size = 2097152
        compression = 1
    }
}

AXL_Config(config)

I think there's even a way you could index the subtree by id if you wanted, but I can't remember off the top of my head.

And for reading config values back, its probably easiest to have a way to return all the config values (global and ID-specific), and not worry about querying individual ones. You could extend AXL_Config() to do that:

/*
 * Get/set AXL configuration values.
 *
 * config:  The new configuration.  Global variables are in top level of
 *              the tree, and per-ID values are subtrees.  If config=NULL,
 *              then return a kvtree with all the configuration values (globals
 *              and all per-ID trees).
 *
 * Return value:    If config != NULL, then return config on success.  If
 *                      config=NULL (you're querying the config) then return
 *                      a new kvtree on success.  Return NULL on any failures.
 */
kvtree * AXL_Config(kvtree *config)

For example:

kvtree *old_config;

old_config = AXL_Config(NULL);
if (!old_config)
    printf("error\n");
// read back our config values into old_config

In fact, you could have return not only the config values, but all kvtree values for a particular ID too (like AXL_KEY_STATUS, AXL_KEY_STATE, ... etc).

rhaas80 commented 3 years ago

Keeping track of things, @tonyhutter pointed out https://github.com/ECP-VeloC/AXL/pull/82#issuecomment-689742336 with the choice of what to return for setting (the input config) and getting (a full tree of all settings).

tonyhutter commented 3 years ago

Please check that all indents are four spaces, and also add per-ID config options.

tonyhutter commented 3 years ago

Please also squash your commits

tonyhutter commented 3 years ago

Double check that your indents are all four spaces.

rhaas80 commented 3 years ago

Double check that your indents are all four spaces.

Will do.

I believe i asked this before: is there an actual document (or even better, a clang format file and an offical clang version to use) for the desired coding style? Right now SCR uses 2 spaces, AXL spaces 4 and there may well be subtle other differences.

tonyhutter commented 3 years ago

Yea, the inconsistency is annoying, and no, there's no style docs. We've talked a little about enforcing style in Travis: https://github.com/ECP-VeloC/AXL/pull/77#issuecomment-678553793 but haven't done it.

rhaas80 commented 3 years ago

Based on the discussion in today's call I verified that with AXL at rhaas/keyvalue (ie this pull request) and all other VeloC components on master and SCR on develop, no more tests fail on my workstation than with AXL on master (some tests fail due to they assuming SLURM and / or multiple nodes).