ANNIEsoft / ToolAnalysis

Other
10 stars 53 forks source link

SaveConfigInfo and ReadConfigInfo #236

Closed JohannMartyn closed 1 year ago

JohannMartyn commented 1 year ago

SaveConfigInfo saves the git hash of the ToolAnalysis folder and a list of configuration files in the ANNIEEvent->Header. It is saved as a std::string with the key value "ConfigInfo", for each partfile of a run. Some slides can be found here: https://annie-docdb.fnal.gov/cgi-bin/sso/ShowDocument?docid=5175 The ToolChain configfiles/DataDecoderwConfigInfo includes the SaveConfigInfo Tool in the DataDecoder.

marc1uk commented 1 year ago

Alright, this is a good start, and thanks for validating that it works, but I think I have a couple of comments :sweat_smile: :bow:

To give initial feedback on the summary slide in your linked talk:

As for the code:

Regarding the collection of config files, i have a couple of niggles here too....

JohannMartyn commented 1 year ago

I have implemented the comments: 1) The hash is now read from the Makefile. 2) The SaveConfigInfo now takes only a single file as an argument: the ToolChainConfig file. From there the Tools_File file and subsequently all ConfigFiles are read. 3) ConfigFiles can themselves have access to .txt, .csv files and so on. The option FullDepth 1 means that all these files are also saved into the ConfigInfo string in the ANNIEEvent header. The only additional condition to save these files is that they can be opened with an ifstream ( if( infile.is_open() ) ). This works also recursively for indefinite depths, should these secondary files also link to other .txt, .csv files and so on.

marc1uk commented 1 year ago

hi johann. Looking good, thanks for the updates.

I think we're nearly there - the main thing is the requirement to pass the ToolChainConfig file to this tool. As I mentioned in my last comment, the application already has this in argv[1], so this redundancy just makes the Tool higher maintenance and opens the possibility of user error if the config file variable given isn't incorrect.

Other remarks: firstly apologies for our ignorance, but is it possible to stick to english in your comments? :sweat_smile: You'll have to forgive us lowly monolingual colleagues.

i'm not sure exactly what is meant by this comment (google translate isn't great), but it does look like you're not ignoring commented out Tools, which I think is a problem. It'd be easy enough to add some safety checks at line 72

if(line.empty()) continue;
char firstchar='a';
for(char& achar : line){
  if(std::isspace(achar)) continue;
  firstchar=achar;
  break;
}
if(firstchar=='#') continue;

which will skip blank lines and any lines where the first non-whitespace character is # (i.e. comment lines). In fact your parsing of files generally doesn't handle cases of users using tabs or multiple delimeters; as someone with an awareness of whitespace correctness, i can tell you most people have no respect at all for the type or amount of whitespace they insert. This could result in munged names and the Tool failing to find them (ifstream::open(" file.txt") will not find file.txt due to the leading whitespace). A more robust parser would use a stringstream:

std::string uniquename, toolname, configfile;
while(getline(file, line)){
  std::stringstream ss(line);
  if(!(ss >> uniquename >> toolname >> configfile)) continue;
  if(uniquename[0]!='#') vec_configfiles.push_back(configfile);
}

which will be insensitive to the style (space or tab) and number of whitespace separators, and as a bonus rolls in a check for empty and commented out lines (which can safely check uniquename[0] as the preceding ss return value check ensures uniquename is not empty, and any leading whitespace gets stripped by the streamer).

sorry to be bouncing it back again, but i think these should be pretty quick fixes.

JohannMartyn commented 1 year ago

I have implemented the comments:

1) argv[1] can only be accessed in the main.cpp. Used instead the vars Store to get the path of the Tools_File (m_data->vars.Get("Tools_File", line);) to avoid reading in the file path manually in the configfile.

2) Included the use of stringstream to avoid the potential whitespace issues. Has been tested and works.

3) The german comment was just left over and not of importance; I simply forgot to delete it. The comments in the config files are taken correctly into account. Has also been tested.

marc1uk commented 1 year ago

Would you mind adding a check to skip commented out tools by checking uniquename[0]!='#' here? And if you wouldn't mind rebasing so it can run the CI, thanks.