Closed JohannMartyn closed 1 year ago
Alright, this is a good start, and thanks for validating that it works, but I think I have a couple of comments :sweat_smile: :bow:
To give initial feedback on the summary slide in your linked talk:
As for the code:
Makefile:
GIT_VERSION := $(git rev-parse HEAD)
CFLAGS += -DVERSION=\"$(GIT_VERSION)\"
and then in your code the compile-time defined constant VERSION
will hold the git hash at the time of build:
SaveConfigInfo.cpp:36
buffer << VERSION;
no system call needed. :)
Regarding the collection of config files, i have a couple of niggles here too....
firstly, the set of config files that get saved are manually defined in your SaveConfigInfo Tool's configuration file. That means if someone adds a new Tool to the ToolChain, they need to remember to also add its configuration to your list of configuration files to save. Or if a Tool is removed from the ToolChain, you could end up saving a configuration for a Tool which wasn't actually run; a potentially misleading situation. Finally, someone might just change the ToolsConfig file, so that a Tool uses a different config file; but unless they also update the SaveConfigInfo config file appropriately, you'll end up saving one configuration but using another... not good.
Of course, the executable needs to know which configuration files to read, and those are defined in the ToolsConfig file, which is defined in the ToolChainConfig file, which is passed to the executable when you run it.
So: perhaps a better way would be to
-- get the ToolChainConfig file (available as argv[1]
, as standard for the first argument to a c++ executable), and parse it for the line defining the ToolsConfig
file
-- get the ToolsConfig file and parse it for the set of Tools being run (skipping blank lines or comment lines leading with #
).
-- each line in the ToolsConfig has the format <unique_name> <Tool Name> <config file>
- so parse that for the configuration files being used by each Tool
-- you can then save the contents of those files, which will always by definition be the correct and complete set
the second niggle occurred to me looking at your example printout on slide 5. The first configuration file is:
verbosity 0
TriggerMaskFile ./configfiles/DataDecoder/DefaultTriggerMask.txt
...
This immediately poses the question: what was in the DefaultTriggerMask.txt
file...?
Actually it becomes a real problem when configuration files reference other files, as there's no longer a standard way to identify and record the full configuration used. You might think of looking for configuration file lines ending in .txt
,... but what if someone uses .csv
? or .dat
? or what if they don't give an extension? or what if it's a ROOT file?
This gets real sticky, and I'm afraid I'm not sure what to suggest. For the time being I guess just recording the Tool configuration files is sufficient, but perhaps it's something the software/analysis group can discuss at large....
I have implemented the comments: 1) The hash is now read from the Makefile. 2) The SaveConfigInfo now takes only a single file as an argument: the ToolChainConfig file. From there the Tools_File file and subsequently all ConfigFiles are read. 3) ConfigFiles can themselves have access to .txt, .csv files and so on. The option FullDepth 1 means that all these files are also saved into the ConfigInfo string in the ANNIEEvent header. The only additional condition to save these files is that they can be opened with an ifstream ( if( infile.is_open() ) ). This works also recursively for indefinite depths, should these secondary files also link to other .txt, .csv files and so on.
hi johann. Looking good, thanks for the updates.
I think we're nearly there - the main thing is the requirement to pass the ToolChainConfig file to this tool. As I mentioned in my last comment, the application already has this in argv[1]
, so this redundancy just makes the Tool higher maintenance and opens the possibility of user error if the config file variable given isn't incorrect.
Other remarks: firstly apologies for our ignorance, but is it possible to stick to english in your comments? :sweat_smile: You'll have to forgive us lowly monolingual colleagues.
i'm not sure exactly what is meant by this comment (google translate isn't great), but it does look like you're not ignoring commented out Tools, which I think is a problem. It'd be easy enough to add some safety checks at line 72
if(line.empty()) continue;
char firstchar='a';
for(char& achar : line){
if(std::isspace(achar)) continue;
firstchar=achar;
break;
}
if(firstchar=='#') continue;
which will skip blank lines and any lines where the first non-whitespace character is #
(i.e. comment lines).
In fact your parsing of files generally doesn't handle cases of users using tabs or multiple delimeters; as someone with an awareness of whitespace correctness, i can tell you most people have no respect at all for the type or amount of whitespace they insert. This could result in munged names and the Tool failing to find them (ifstream::open(" file.txt")
will not find file.txt
due to the leading whitespace). A more robust parser would use a stringstream:
std::string uniquename, toolname, configfile;
while(getline(file, line)){
std::stringstream ss(line);
if(!(ss >> uniquename >> toolname >> configfile)) continue;
if(uniquename[0]!='#') vec_configfiles.push_back(configfile);
}
which will be insensitive to the style (space or tab) and number of whitespace separators, and as a bonus rolls in a check for empty and commented out lines (which can safely check uniquename[0]
as the preceding ss
return value check ensures uniquename
is not empty, and any leading whitespace gets stripped by the streamer).
sorry to be bouncing it back again, but i think these should be pretty quick fixes.
I have implemented the comments:
1) argv[1] can only be accessed in the main.cpp. Used instead the vars Store to get the path of the Tools_File (m_data->vars.Get("Tools_File", line);) to avoid reading in the file path manually in the configfile.
2) Included the use of stringstream to avoid the potential whitespace issues. Has been tested and works.
3) The german comment was just left over and not of importance; I simply forgot to delete it. The comments in the config files are taken correctly into account. Has also been tested.
SaveConfigInfo saves the git hash of the ToolAnalysis folder and a list of configuration files in the ANNIEEvent->Header. It is saved as a std::string with the key value "ConfigInfo", for each partfile of a run. Some slides can be found here: https://annie-docdb.fnal.gov/cgi-bin/sso/ShowDocument?docid=5175 The ToolChain configfiles/DataDecoderwConfigInfo includes the SaveConfigInfo Tool in the DataDecoder.