CoffeaTeam / coffea-casa

Repository with configuration setup of a prototype of analysis facility - "coffea-casa"
BSD 3-Clause "New" or "Revised" License

Running user notebook (RDF + correctionlib) breaks with error `gzip-compressed JSON files are only supported if ZLIB is found when the package is built` #401

Closed: oshadura closed this issue 9 months ago

oshadura commented 9 months ago

Running the user notebook https://github.com/mariadalfonso/Hrare/blob/main/analysis/testDistCluster.ipynb on coffea-casa, with ROOT and correctionlib installed from conda, fails with the following traceback:

--------------------------------------------------------------------------
runtime_error                             Traceback (most recent call last)
Cell In[1], line 92
     90 ROOT.ROOT.EnableImplicitMT()
     91 RDataFrame = ROOT.RDataFrame
---> 92 myinit()
     94 dfINI = RDataFrame("Events", files)
     95 sumW = 1. # temporarily set to 1.

Cell In[1], line 19, in myinit()
     17 def myinit():
     18     loadUserCode()
---> 19     loadCorrectionSet(2018)

File ~/Hrare/analysis/utilsAna.py:30, in loadCorrectionSet(year)
     27     ROOT.gInterpreter.Declare('#include "config/sfCorrLib.h"')
     28 #    ROOT.gInterpreter.Declare('#include "config/mysf.h"')
     29 #    ROOT.gInterpreter.Load("config/mysf.so")
---> 30     ROOT.gInterpreter.ProcessLine('auto corr_sf = MyCorrections(%d);' % (year))
     31     ROOT.gInterpreter.Declare('''
     32         #ifndef MYFUN
     33         #define MYFUN
   (...)
     41         '''
     42     )

runtime_error: long TInterpreter::ProcessLine(const char* line, TInterpreter::EErrorCode* error = nullptr) =>
    runtime_error: Gzip-compressed JSON files are only supported if ZLIB is found when the package is built

Dependencies:

cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~/Hrare$ conda list correctionlib
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
correctionlib             2.3.3            py39h7633fee_1    conda-forge
cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~/Hrare$ conda list root
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
root                      6.30.2           py39hddac248_1    conda-forge
cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~/Hrare$ conda list python
# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
python                    3.9.18          h0755675_0_cpython    conda-forge

kondratyevd commented 9 months ago

Hi @oshadura,

I ran into exactly the same error when running this notebook at the Purdue Analysis Facility. I have python=3.10.10, root=6.28.0, correctionlib=2.3.3.

oshadura commented 9 months ago

@kondratyevd I believe there is an issue with correctionlib from conda when used together with PyROOT (ROOT from conda). @nsmith- have you seen anything like this before?

nsmith- commented 9 months ago

I'm having trouble reproducing the issue. If I install

mamba create -n clibgzip python=3.10.10 root=6.28.0 correctionlib=2.3.3

I see that zlib is installed

  + zlib                                    1.2.13  h8a1eda9_5             conda-forge/osx-64     Cached

and a quick test, `correction summary mycorrections.json.gz`, seems to work.

But indeed zlib isn't a dependency of correctionlib, though it should be:

$ mamba repoquery depends correctionlib
...
Executing the query correctionlib

 Name          Version Build              Channel
──────────────────────────────────────────────────────
 correctionlib 2.3.3   py310h688a63d_1    conda-forge
 libcxx        15.0.7  h71dddab_0         conda-forge
 numpy         1.26.3  py310h4bfa8fc_0    conda-forge
 packaging     23.2    pyhd8ed1ab_0       conda-forge
 pydantic      1.10.13 py310hb372a2b_1    conda-forge
 python        3.10.10 he7542f4_0_cpython conda-forge
 python_abi    3.10    4_cp310            conda-forge
 rich          13.7.0  pyhd8ed1ab_0       conda-forge

I've opened https://github.com/conda-forge/correctionlib-feedstock/pull/19 to see if this fixes the issue.

nsmith- commented 9 months ago

Ok, I realized the Python bindings secretly unzip on the Python side. So a more thorough test would be to make a test program tmp.cxx:

#include <iostream>
#include <correction.h>

int main() {
    auto cset = correction::CorrectionSet::from_file("binder/mycorrections.json.gz");
    std::cout << cset->description() << std::endl;
    return 0;
}

and then run

$CXX $(correction config --cflags --ldflags --rpath) tmp.cxx -o tmp
./tmp

(The --rpath is only needed for an issue on OS X.) For me this still works, but maybe in your environment it doesn't?
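
The Python-side decompression mentioned above would explain why the Python bindings can read a `.json.gz` file even if the C++ library was built without zlib. A minimal standard-library sketch of that idea follows. The helper name `read_json_text` and the demo file are hypothetical, and this is not correctionlib's actual implementation.

```python
import gzip
import json
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_json_text(path):
    """Return the JSON text stored at path, decompressing it if it is gzipped."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Self-contained demo: a throwaway file stands in for a real correction set.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo.json.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as f:
        json.dump({"schema_version": 2, "corrections": []}, f)
    demo_text = read_json_text(demo_path)

demo = json.loads(demo_text)
```

The returned string is plain JSON, so a parser with no zlib support can consume it. A test that goes through Python therefore never exercises the gzip code path of the C++ library.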

oshadura commented 9 months ago

Hi @nsmith-, this is what I see at coffea-casa:

cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ g++ $(correction config --cflags --ldflags --rpath) tmp.cxx -o tmp
cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ ./tmp 
terminate called after throwing an instance of 'std::runtime_error'
  what():  Gzip-compressed JSON files are only supported if ZLIB is found when the package is built
Aborted (core dumped)
cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ correction config --cflags --ldflags --rpath
-std=c++17 -I/opt/conda/lib/python3.9/site-packages/correctionlib/include -L/opt/conda/lib/python3.9/site-packages/correctionlib/lib -lcorrectionlib -Wl,-rpath,/opt/conda/lib/python3.9/site-packages/correctionlib/lib
cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ cat tmp.cxx 
#include <iostream>
#include <correction.h>

int main() {
        auto cset = correction::CorrectionSet::from_file("/cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration/POG/BTV/2017_UL/btagging.json.gz");
        std::cout << cset->description() << std::endl;
        return 0;
}
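
Since the error is raised only for gzip-compressed input, one possible stopgap on an affected installation is to decompress the file once and point the C++ code at the plain `.json` copy. The sketch below assumes that the C++ loader reads uncompressed JSON without needing zlib. The helper `decompress_to_json` and the file names are illustrative only.

```python
import gzip
import json
import os
import shutil
import tempfile


def decompress_to_json(src_gz, dest_dir):
    """Write an uncompressed copy of src_gz into dest_dir and return its path."""
    name = os.path.basename(src_gz)
    if name.endswith(".gz"):
        name = name[:-3]  # "x.json.gz" -> "x.json"
    dest = os.path.join(dest_dir, name)
    with gzip.open(src_gz, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Demo on a throwaway file. On a real system src would be the .json.gz path
# that tmp.cxx passes to from_file (assumption: plain JSON loads without zlib).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example.json.gz")
with gzip.open(src, "wt", encoding="utf-8") as f:
    json.dump({"description": "example"}, f)

plain_path = decompress_to_json(src, workdir)
with open(plain_path, encoding="utf-8") as f:
    plain = json.load(f)
```

The decompressed copy has to be written somewhere writable, such as the home directory, because a read-only location like /cvmfs cannot hold it.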

oshadura commented 9 months ago

To confirm: it looks like your fix resolved the issue (I had to reinstall correctionlib):

cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ g++ $(correction config --cflags --ldflags --rpath) tmp.cxx -o tmp
cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ ./tmp 
This json file contains the corrections for deepJet and deepCSV AK4 taggers. Corrections are supplied for b-tag discriminator shape corrections (shape) and working point corrections (comb/mujets/incl). For the working point corrections the SFs in 'mujets' and 'comb' are for b/c jets. The 'mujets' SFs contain only corrections derived in QCD-enriched regions. The 'comb' SFs contain corrections derived in QCD and ttbar-enriched regions. Hence, 'comb' SFs can be used everywhere, except for ttbar-dileptonic enriched analysis regions. For the ttbar-dileptonic regions the 'mujets' SFs should be used. The 'incl' correction is for light-flavoured jets.

nsmith- commented 9 months ago

Thanks for confirming (I didn't realize I auto-closed this).