ECP-VeloC / AXL

Asynchronous Transfer Library
MIT License

AXL_Config #38

Open gonsie opened 5 years ago

gonsie commented 5 years ago

We should have a config function to set some options, including the number of pthread threads.

tonyhutter commented 5 years ago

Agreed. An AXL_Config(id, key, value) function could write directly into a 'config' KVtree that hangs off our main kvtree.

tonyhutter commented 5 years ago

So I'm currently writing some "cancel a transfer" test cases, and I need a way to set config values in AXL. Specifically, I need a 'file_delay' config value that means "in AXL_Dispatch, wait N milliseconds before copying the next file". The idea is that I can deterministically control how long my total transfer takes, so that I can call AXL_Cancel at the right time in the transfer. I'm running into problems where my transfer is finishing before I have time to cancel it.
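To make the 'file_delay' idea concrete, here is a minimal sketch of a dispatch loop that waits between files. It is not AXL code: `dispatch_with_delay`, `delay_ms`, `fake_copy` and `files_copied` are hypothetical names, and the real AXL_Dispatch may be structured differently. It assumes a POSIX system for `nanosleep`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Sleep for the given number of milliseconds (POSIX nanosleep). */
void delay_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for the per-file copy; it only counts calls. */
int files_copied = 0;
int fake_copy(const char *src)
{
    (void)src;
    files_copied++;
    return 0;
}

/*
 * Copy each file in turn, waiting file_delay_ms before every file after
 * the first. Returns the number of files copied, or -1 on the first error.
 */
int dispatch_with_delay(const char **files, size_t n, long file_delay_ms,
                        int (*copy)(const char *src))
{
    assert(file_delay_ms >= 0);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && file_delay_ms > 0) {
            delay_ms(file_delay_ms);
        }
        if (copy(files[i]) != 0) {
            return -1;
        }
    }
    return (int)n;
}
```

With N files and a delay of D milliseconds, the transfer takes at least (N - 1) * D milliseconds, which gives a test a predictable window in which to call AXL_Cancel.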

Here's a strawman prototype for AXL_Config:

/* 
 * Get/set an AXL configuration key
 *
 * This function allows you to get and set AXL configuration key values.
 *
 * key:         The configuration key you're looking up
 * set_value:   If you're setting a value, put your value here.  If you're
 *              getting a value, leave this NULL.
 *
 * On success, this will return the value you're getting or setting.  On
 * failure it will return NULL.  All values are strings.
 */
char *
AXL_Config (int id, char *key, char *set_value);

This will treat all configuration values as strings, which simplifies things a lot. Yes, you'll need to convert numerical values to strings first, but I don't think that will be a big deal, as the number may already be a string anyway (like if you pass in a 'file_delay' value in argv[] to axl_cp).
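Since all values are strings, the library side would have to turn something like "250" back into a number. A minimal sketch of that conversion, assuming non-negative integer values; `parse_config_number` is a hypothetical helper, not an existing AXL function:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a config value such as "250" into a non-negative long.
 * Returns 0 on success and stores the result in *out. Returns -1 if the
 * string is NULL, empty, not entirely numeric, negative, or out of range.
 */
int parse_config_number(const char *value, long *out)
{
    assert(out != NULL);
    if (value == NULL || *value == '\0') {
        return -1;
    }
    errno = 0;
    char *end = NULL;
    long n = strtol(value, &end, 10);
    /* end == value: no digits were read; *end != '\0': trailing junk */
    if (errno == ERANGE || end == value || *end != '\0' || n < 0) {
        return -1;
    }
    *out = n;
    return 0;
}
```

Checking the end pointer is what catches values like "25ms", which `atoi` would silently accept as 25.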

Some config keys we might want:

file_delay [milliseconds] - Delay N milliseconds between file transfers.

num_threads [number of threads] - Number of threads to use for a transfer (applicable to pthreads, and whatever other xfer types we make multithreaded).

compression [on|off] - Tar up all the files before transfer and decompress them at the destination. This could be useful for node-to-node transfers, since scp'ing individual files can be slow.

Thoughts?

tonyhutter commented 5 years ago

Also, should the AXL_Config values be saved to the statefile?

gonsie commented 5 years ago

Yes, I would save these settings to the state file.

The AXL_Config prototype looks great to me. And I think that all strings are fine. I would make the return code AXL_SUCCESS or FAILURE, not null or the value (I think you mean this, just the comment wording isn't clear).

tonyhutter commented 5 years ago

I'd prefer it to return NULL or the value. That way you can call the function directly, like: printf("file_delay=%s", AXL_Config(id, "file_delay", NULL));

If you return AXL_SUCCESS/FAILURE, you'll need to provide an additional char** to store the key's value into, like:

int AXL_Config (int id, char *key, char *set_value, char **get_value)

It's just a little more awkward to use since you'll need an additional get_value variable to store the result.

gonsie commented 5 years ago

Okay, that makes sense. So, to get the value, the set_value param is set to NULL? Would we ever want to actually set a variable to NULL?

tonyhutter commented 5 years ago

That's a good point. We could say that setting it to "" clears it:

AXL_Config(id, "file_delay", "");

In that case set_value would be non-NULL, and set_value[0] would be '\0'. If you tried to get a value that had been set to "", it would be a special case, and return NULL. I'm open to other ideas though.
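The get/set/clear semantics discussed so far can be sketched as follows. This is an illustration only: `demo_config` is a hypothetical name, it drops the `id` argument, and a small fixed-size table stands in for the 'config' kvtree.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 16
#define MAX_LEN  64

/* Fixed-size key/value table standing in for the 'config' kvtree. */
static char keys[MAX_KEYS][MAX_LEN];
static char vals[MAX_KEYS][MAX_LEN];
static int  nkeys = 0;

/* Return the slot holding key, or -1 if the key has never been set. */
static int find_key(const char *key)
{
    for (int i = 0; i < nkeys; i++) {
        if (strcmp(keys[i], key) == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * set_value == NULL -> get: returns the value, or NULL if unset or cleared
 * set_value == ""   -> clear: returns "", and later gets return NULL
 * otherwise         -> set: returns the stored value
 * Returns NULL on failure (NULL key, table full, key or value too long).
 */
const char *demo_config(const char *key, const char *set_value)
{
    if (key == NULL || strlen(key) >= MAX_LEN) {
        return NULL;
    }
    int i = find_key(key);
    if (set_value == NULL) {
        /* get: a value cleared with "" is reported as NULL */
        return (i >= 0 && vals[i][0] != '\0') ? vals[i] : NULL;
    }
    if (strlen(set_value) >= MAX_LEN) {
        return NULL;
    }
    if (i < 0) {
        /* first time this key is seen: claim a new slot */
        if (nkeys == MAX_KEYS) {
            return NULL;
        }
        i = nkeys++;
        strcpy(keys[i], key);
    }
    assert(i >= 0 && i < MAX_KEYS);
    strcpy(vals[i], set_value);
    return vals[i];
}
```

One consequence of this convention is that a caller cannot tell "never set" apart from "cleared", since both come back as NULL on a get.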

gonsie commented 5 years ago

As a thought exercise, I'd like to see if the SCR configs could be handled by a similar interface. There may be good reasons why SCR might need a different interface, but maybe not.

The SCR repo includes some examples of configurations in its system config and user config templates.

Could these be set via this interface? My initial thought is yes. Some configs come in groups (such as the settings of a cache directory). But, using multiple calls to _Config, that could be achieved.

@tonyhutter what other settings have we thought about making configurable? I know you've mentioned a few in other issues and PRs. I want to think through having those work with this interface as well.

tonyhutter commented 5 years ago

You could have all the file attribute flags as configurables. So like xattrs=on preserve_timestamps=on.

Most of the things we'd want to configure would be analogues of the cp options (http://man7.org/linux/man-pages/man1/cp.1.html). In fact, it would probably be a good idea to name the configs exactly the same as the cp command line options. That would give users a good idea what to expect. It would also be easy for us to write test cases for, since we could verify the AXL behaviour against the equivalent cp option.

gonsie commented 5 years ago

ping @adammoody

gonsie commented 5 years ago

Another thought I had: do we care about versioning? What if a user tries to set a configuration that doesn't exist in this version of the library? But maybe that only matters if we are reading a particular version of a config file, not when calling config at runtime.

adammoody commented 5 years ago

This all sounds good to me.

To answer @gonsie's question about how this relates to SCR configuration: as you mention, in SCR we do have cases of nested configurations, where a given config item has multiple child key/value pairs associated with it. In our SCR config files, we list this nesting on a single line, with the first key/value on the line serving as a parent to the remaining key/value entries, which are separated by spaces. We could get away with that on a single line since we only have two layers of nesting.

CKPT=0  SCHEME=XOR  STORE=/dev/shm  SIZE=8  INTERVAL=1
CKPT=1  SCHEME=PARTNER  STORE=/ssd   INTERVAL=10

A more traditional way might be to use indentation to indicate nesting, and if we did that in SCR, one might have something like the following to define multiple redundancy schemes for a single run:

CKPT=0
  SCHEME=XOR
  STORE=/dev/shm
  SIZE=8
  INTERVAL=1

CKPT=1
  SCHEME=PARTNER
  STORE=/ssd
  INTERVAL=10

I don't know if we have this nesting issue showing up in AXL yet, so it might be overkill to worry about it at this point. I also don't know of a good API to cleanly express the nesting -- we could let the user pass in a kvtree I suppose.

tonyhutter commented 5 years ago

You could do it like this:

AXL_Config(int id, char *config_string)

Set:

AXL_Config(id, "num_threads=4")
AXL_Config(id, "CKPT=0 SCHEME=XOR")
AXL_Config(id, "CKPT=1 SCHEME=PARTNER STORE=/ssd INTERVAL=10")

Get:

AXL_Config(id, "num_threads")       // returns "4"
AXL_Config(id, "CKPT=1 SCHEME")     // returns "PARTNER"
AXL_Config(id, "CKPT=1 STORE")      // returns "/ssd"

adammoody commented 5 years ago

Ah, yes. Good idea. Works for me.
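A rough sketch of how such a single-string interface could be parsed, assuming two layers of nesting, whitespace-separated tokens, and a bare final key meaning "get". `demo_config_string` and the fixed-size table are hypothetical stand-ins (the `id` argument is dropped, and no kvtree is used); `strtok` makes it non-reentrant.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 32
#define MAX_LEN     64
#define MAX_TOKENS  8

/* One setting: parent is "" for top-level keys, else e.g. "CKPT=1". */
struct entry { char parent[MAX_LEN], key[MAX_LEN], val[MAX_LEN]; };
static struct entry table[MAX_ENTRIES];
static int nentries = 0;

static struct entry *lookup(const char *parent, const char *key)
{
    for (int i = 0; i < nentries; i++) {
        if (strcmp(table[i].parent, parent) == 0 &&
            strcmp(table[i].key, key) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

static const char *store(const char *parent, const char *key, const char *val)
{
    assert(strlen(parent) < MAX_LEN && strlen(key) < MAX_LEN);
    struct entry *e = lookup(parent, key);
    if (e == NULL) {
        if (nentries == MAX_ENTRIES) return NULL;
        e = &table[nentries++];
        strcpy(e->parent, parent);
        strcpy(e->key, key);
    }
    strcpy(e->val, val);
    return e->val;
}

/* Returns the value read or last written, or NULL on error or missing key. */
const char *demo_config_string(const char *config_string)
{
    char buf[MAX_LEN];
    if (config_string == NULL || strlen(config_string) >= sizeof(buf)) return NULL;
    strcpy(buf, config_string);

    /* Split on whitespace; every token is shorter than MAX_LEN. */
    char *tok[MAX_TOKENS];
    int ntok = 0;
    for (char *t = strtok(buf, " \t"); t != NULL; t = strtok(NULL, " \t")) {
        if (ntok == MAX_TOKENS) return NULL;
        tok[ntok++] = t;
    }
    if (ntok == 0) return NULL;

    /* With several tokens, the first key=value pair names the parent. */
    const char *parent = "";
    int first = 0;
    if (ntok > 1) {
        if (strchr(tok[0], '=') == NULL) return NULL;
        parent = tok[0];
        first = 1;
    }

    /* Get: the last token is a bare key, e.g. "CKPT=1 SCHEME". */
    if (strchr(tok[ntok - 1], '=') == NULL) {
        if (ntok > 2) return NULL;
        struct entry *e = lookup(parent, tok[ntok - 1]);
        return e ? e->val : NULL;
    }

    /* Set: validate every child token before storing anything. */
    for (int i = first; i < ntok; i++) {
        char *eq = strchr(tok[i], '=');
        if (eq == NULL || eq == tok[i]) return NULL;
    }
    const char *result = NULL;
    for (int i = first; i < ntok; i++) {
        char *eq = strchr(tok[i], '=');
        *eq = '\0';                      /* split "KEY=VAL" in place */
        result = store(parent, tok[i], eq + 1);
        if (result == NULL) return NULL;
    }
    return result;
}
```

In this sketch a get such as "CKPT=0 STORE" returns NULL when that key was never set under that parent, so a caller cannot distinguish a malformed query from a missing key without a separate error channel.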

tonyhutter commented 5 years ago

@adammoody how does SCR config work with respect to config options it doesn't recognize? Does it just ignore them? Store them in the KVTREE but do nothing? Error out?

adammoody commented 5 years ago

It should be storing them in the tree, but nothing processes them. We don't have any error checking in there right now looking for valid key names. Having said that, adding some checks would help users find/fix typos they may have made.
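As an illustration of the kind of check that would catch typos, here is a minimal sketch that validates a key against a fixed list. The list contains only the keys floated earlier in this thread, and `config_key_is_known` is a hypothetical helper, not something SCR or AXL provides today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keys proposed in this thread; a real list would live with the library. */
static const char *const known_keys[] = {
    "file_delay",
    "num_threads",
    "compression",
};

/* Return 1 if key is in the known list, 0 otherwise (including NULL). */
int config_key_is_known(const char *key)
{
    size_t n = sizeof(known_keys) / sizeof(known_keys[0]);
    assert(n > 0);
    if (key == NULL) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(key, known_keys[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

A setter could call this first and either reject unknown keys or store them with a warning, depending on how strict we want to be.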

rhaas80 commented 5 years ago

This may be a bit beyond the scope of the original question on how to implement an access API for the settings: Is the format of the files and the tree already set in stone? Or would something like YAML be an option? Nesting would look something like this:

checkpoints:
  - CHKPT: 0
    SCHEME:  XOR
  - CHKPT: 1
    SCHEME: PARTNER
    STORE: /ssd
    INTERVAL: 10

Parsing (and tree population) could be done via libyaml (https://pyyaml.org/wiki/LibYAML), but the access API (in C at least) ends up a bit more cumbersome, since one either has to return partial keys for users to search through manually or implement a query API à la "get me the 'checkpoints' sequence member where the 'CHKPT' scalar has value '0'", which is similar to the currently proposed API.

Even better would be something like libconfig (https://hyperrealm.github.io/libconfig/) with a file syntax very similar to YAML but somewhat higher level interface.

gonsie commented 5 years ago

There is no file format now, all the settings are set explicitly (either during config time or by the caller requesting a transfer).

I think we'd have to go through the code to figure out what configuration options can be set. Similar to what we have in SCR, it would be useful to configure different STORE types that have certain properties. Maybe a config file would look like:

STORE_SOURCE=/ssd  STORE_DEST=/gpfs NATIVE=BB_API
NUM_THREADS=4

Keeping a line of text in the config file that has a group of information (similar to SCR) would be my vote. We'll have to figure out later what is valid for each line. 😐

adammoody commented 4 years ago

You could do it like this:

AXL_Config(int id, char *config_string)

Set:

AXL_Config(id, "num_threads=4")
AXL_Config(id, "CKPT=0 SCHEME=XOR")
AXL_Config(id, "CKPT=1 SCHEME=PARTNER STORE=/ssd INTERVAL=10")

Get:

AXL_Config(id, "num_threads")       // returns "4"
AXL_Config(id, "CKPT=1 SCHEME")     // returns "PARTNER"
AXL_Config(id, "CKPT=1 STORE")      // returns "/ssd"

@rhaas80 implemented this type of set/query interface in SCR_Config. We can start from that code if we later decide we want to use strings to configure the components.