JeffersonLab / analyzer

HallA C++ Analyzer
BSD 3-Clause "New" or "Revised" License
7 stars 54 forks source link

Coda file search #140

Closed sawatzky closed 6 years ago

sawatzky commented 6 years ago

Add new THaRun constructor that can find a file in a path list

hansenjo commented 6 years ago

Hi Brad,

in principle having this constructor is a good idea as it saves quite a bit of boilerplate code in scripts. However, I'll be a bit picky. In this day and age, I would strongly prefer std::strings over C-strings. The list of paths should be a std::vector. In C++11, the vector can be initialized just like the array of C-strings except for the trailing NULL, e.g.

vector<string> paths [=] { "path1", "path2", "path3" };

(the "=" is optional).

Secondly, I'm not sure if passing a printf-style format string into the constructor is a good idea. Aside from not being safe, the required format is basically fixed by the code in the constructor, so there is no flexibility to handle more complex file name patterns. Therefore, I would simply pass a complete file name (not a pattern/run number pair) to the function (as a std::string). The calling code would then need one extra line like

string filename( Form("pattern-prefix-%d-suffix",runno) );

with the advantage that now everyone knows exactly what's going on (and how to change it) without having to look at the constructor source code.

If you want to give users even more convenience, you could provice a static member function that builds a typical file name string for you, e.g.

string MakeFileName( const string& prefix, const string& suffix, int runno, int ndigits = 0 );

and the call to the new THaRun constructor could then look like

auto run = new THaRun( paths, THaRun::MakeFileName("shms_all_",".dat",1255,5) );

where, for example, ndigits = 0 could mean no padding with leading zeros, and ndigits > 0 will pad the run number with leading zeros if necessary to ensure that the run number has at least given number of digits.

This is how I would implement this feature. Suggestions welcome.

Ole

sawatzky commented 6 years ago

On Fri, 15 Sep 2017, Ole Hansen wrote:

in principle having this constructor is a good idea as it saves quite a bit of boilerplate code in scripts. However, I'll be a bit picky. In this day and age, I would strongly prefer std::strings over C-strings. The list of paths should be a std::vector. In C++11, the vector can be initialized just like the array of C-strings except for the trailing NULL, e.g.

vector<string> paths [=] { "path1", "path2", "path3" };

Yeah, I totally agree in principle. That was my initial implementation, but I found it doesn't work for pre-C++11 compilers and the various workarounds seemed to be pretty kludgey (even worse than the c-string option)...

Is there a clean technique that will work with older compilers?

string filename( Form("pattern-prefix-%d-suffix",runno) );

That is definitely better. I'll update the pull request with that change.

The MakeFileName option could be a future enhancement, but I think a calling pattern like this is pretty transparent:

THaRun* run = new THaRun(
      pathList,
      Form("pattern-prefix-%d-suffix",runno)
    );
hansenjo commented 6 years ago

vector<string> paths [=] { "path1", "path2", "path3" };

Yeah, I totally agree in principle. That was my initial implementation, but I found it doesn't work for pre-C++11 compilers and the various workarounds seemed to be pretty kludgey (even worse than the c-string option)...

Is there a clean technique that will work with older compilers?

In C++98/03, you can do

const char* cpaths[] = { "path1", "path2", "path3" }; vector paths(cpaths, cpaths+3);

or, a little more verbose, but more maintainable

vector paths(cpaths, cpaths+sizeof(cpaths)/sizeof(char*));

Less elegant, but not too kludgey either, I think. Haven't tested this with ROOT5's CINT, though.

Ole

sawatzky commented 6 years ago

I updated the branch so you just pass a pre-formed filename rather than a template + run number. Much better. Updated calling convention in a replay script would be something like this:

  const char* RunFileNamePattern = "shms_all_%05d.dat";
  THaRun* run = new THaRun(
      pathList,
      Form(RunFileNamePattern, runno)
    );

Not sold on the vector vs. c-string array though... I debated the vector options you mentioned and concluded that the c-string array was still a readability win over the pre-C++11 vector hacks. If someone drops the null from the array, and the file doesn't exist in any of the preceding paths, then the worst that is likely to happen is that the code hits the 100 loop max safety hatch and dies in the usual way. Meh. :-)

My vote would be to go stick with the c-string arrays until we/JLab are in a position to drop the older compiler dependencies altogether. (I did wince when I made the c-string array though -- I sympathize.)

hansenjo commented 6 years ago

By "older compiler dependencies", do you mean ROOT 5/CINT?

I just checked. CINT does not support the vector(iterator first, iterator last) constructor. So yes, if we need to support ROOT5, we should have a C-string-based constructor.

Actually, why not provide both? Anyone versed in C++11 and using ROOT6 will prefer the vector version.

BTW, I think you are too optimistic regarding the "100 loop max safety hatch." Forgetting the terminating NULL element will cause "undefined behavior." I.e. anything could happen, and could happen unpredictably and intermittently: normal exit, segfault, infinite loop, etc. Not something I'd trust our users with if avoidable.

Ole

sawatzky commented 6 years ago

On Sat, 16 Sep 2017, Ole Hansen wrote:

By "older compiler dependencies", do you mean ROOT 5/CINT?

We still have gcc 4.4.x compilers kicking around -- I didn't think they supported C++11. But ROOT5/CINT is still in the mix too.

Anyway, you talked me into it, I stripped the c-string option and will just do this in the replay scripts for 'legacy' setups: vector pathList; pathList.push_back("."); pathList.push_back("./raw"); pathList.push_back("./cache");

A little verbose, but still readable and is probably better overall.

BTW, I think you are too optimistic regarding the "100 loop max safety

Not optimistic -- just jaded :-) Anyway, no longer an issue.

-- Brad

-- Brad Sawatzky, PhD brads@jlab.org -<>- Jefferson Lab / Hall C / C111 Ph: 757-269-5947 -<>- Fax: 757-269-5235 -<>- Pager: brads-page@jlab.org The most exciting phrase to hear in science, the one that heralds new discoveries, is not "Eureka!" but "That's funny..." -- Isaac Asimov