LDMX-Software / ldmx-sw

The Light Dark Matter eXperiment simulation and reconstruction framework.
https://ldmx-software.github.io
GNU General Public License v3.0
21 stars 19 forks source link

Friendliness of Histogram pooling #1381

Open tomeichlersmith opened 1 month ago

tomeichlersmith commented 1 month ago

Since filling histograms is a common task especially for new developers of ldmx-sw, I think focusing on improving its friendliness is helpful. First and foremost, the error being thrown from get is hard to parse for people new to ldmx-sw and ROOT files, so I'd like to reword this to help more.

https://github.com/LDMX-Software/ldmx-sw/blob/613a9d10b87235eb4ad3a6d3fc8229752692ad98/Framework/src/Framework/Histograms.cxx#L46-L53

Necessity of Singleton Pool

I can't remember why we need a singleton pool. Can we get away with a pool for each processor? This would allow us to avoid adding the processor's name as a prefix to the histogram name (I think) which is another source of confusion/annoyance. The goal here would get to a point where the histogram file could look like

hist.root:
  analyzer1/
    histogram
  analyzer2/
    histogram

so both analyzer1 and analyzer2 could share histogram names but the two histograms are distinct both in memory and in the output hist.root file.

Creation Helpers for Categorical Axes

I've written some helpers that create categorical axes that I think we could use within our own DQM analyzers. Maybe they don't get used immediately within the DQM in order to avoid breaking anything, but having them around for writing new categorical histograms may be helpful.

  std::tuple<std::size_t,double,double> category_bins(
      const std::vector<std::string>& categories,
      int offset = 0
  ) {
    std::size_t n_categories = categories.size();
    double min = offset - 0.5;
    double max = offset + n_categories + 1.5;
    return std::make_tuple(n_categories, min, max);
  }
  void label_axis(TAxis* axis, const std::vector<std::string>& categories) {
    for (std::size_t ibin{1}; ibin <= categories.size(); ibin++) {
      axis->SetBinLabel(ibin, categories[ibin-1].c_str());
    }
  }
  void create_categorical_histogram(
      const std::string& name,
      const std::vector<std::string>& categories
  ) {
    auto [nbin, xmin, xmax] = category_bins(categories);
    histograms_.create(name, "", nbin, xmin, xmax);
    auto h{histograms_.get(name)};
    label_axis(h->GetXaxis(), categories);
  }
  void create_categorical_histogram(
      const std::string& name,
      const std::vector<std::string>& categories,
      const std::string& ylabel, int nybin, double ymin, double ymax
  ) {
    auto [nxbin, xmin, xmax] = category_bins(categories);
    histograms_.create(name, "", nxbin, xmin, xmax, ylabel, nybin, ymin, ymax);
    label_axis(histograms_.get(name)->GetXaxis(), categories);
  }
  void create_categorical_histogram(
      const std::string& name,
      const std::vector<std::string>& xcategories,
      const std::vector<std::string>& ycategories
  ) {
    auto [nxbin, xmin, xmax] = category_bins(xcategories);
    auto [nybin, ymin, ymax] = category_bins(ycategories);
    histograms_.create(name, "", nxbin, xmin, xmax, "", nybin, ymin, ymax);
    label_axis(histograms_.get(name)->GetXaxis(), xcategories);
    label_axis(histograms_.get(name)->GetYaxis(), ycategories);
  }
tvami commented 1 month ago

processor's name as a prefix to the histogram name (I think) which is another source of confusion/annoyance.

I second that point.

The other two make sense too!