NWUHEP / ntupleProducer

Northwestern ntuplizer tools for use with CMSSW.
https://twiki.cern.ch/twiki/bin/view/CMS/UserCodeNWUntupleProducer

The Ntuple Producer code

This is CMSSW code for creating small ROOT ntuples from CMS data/MC samples. See the twiki for more details: CMS/UserCodeNWUntupleProducer

Instructions for Users

Once the code is compiled, it is ready to run.

Running the code

  cd NWU/ntupleProducer/test
  cmsRun ntupleProducer_cfg.py

By default, this assumes you are running over an MC sample. To run on data instead, do:

  cmsRun ntupleProducer_cfg.py isRealData=1

This sets up the appropriate global tag, among other settings.
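To illustrate how a `key=value` argument such as `isRealData=1` can switch the job setup, here is a minimal standalone sketch in plain Python. It is not the actual option handling in ntupleProducer_cfg.py; the function names `parse_options` and `choose_global_tag` and the tag strings are hypothetical placeholders.

```python
# Sketch only: mimics how a "key=value" argument such as isRealData=1 could
# select the job setup. Function names and tag strings are placeholders.

def parse_options(args):
    """Turn a list like ['isRealData=1'] into a dict of integer flags."""
    options = {"isRealData": 0}  # MC is the default, as stated in the text
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key not in options:
            raise ValueError(f"unknown option {key!r}")
        options[key] = int(value)
    return options


def choose_global_tag(options):
    """Pick a (placeholder) global tag depending on the data/MC switch."""
    if options["isRealData"]:
        return "DATA_GLOBAL_TAG_PLACEHOLDER"
    return "MC_GLOBAL_TAG_PLACEHOLDER"


if __name__ == "__main__":
    for argv in ([], ["isRealData=1"]):
        print(argv, "->", choose_global_tag(parse_options(argv)))
```

The real global tag names depend on the CMSSW release and data-taking period, so they are deliberately left as placeholders here.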

NB: By default, the ntuples require at least one muon (electron) with pT > 3 (5) GeV for an event to be saved. If this is not desired (for instance, in jet- or photon-based studies), switch off the skimLeptons option in ntupleProducer_cfg.py.

In addition, the configuration file, ntupleProducer_cfg.py, has various flags that control whether certain objects (muons, jets, etc.) are saved. All are saved by default.
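The lepton skim requirement described above can be sketched as a small standalone function. This is an illustration of the selection logic only, assuming the requirement means "at least one muon above 3 GeV or at least one electron above 5 GeV"; the function name `passes_lepton_skim` and its arguments are hypothetical and do not come from the ntupleProducer code.

```python
# Sketch of the default lepton skim: keep the event if it has at least one
# muon with pT > 3 GeV or at least one electron with pT > 5 GeV.
# Names are illustrative, not taken from the ntupleProducer source.

MUON_PT_MIN = 3.0      # GeV, threshold quoted in the text
ELECTRON_PT_MIN = 5.0  # GeV, threshold quoted in the text


def passes_lepton_skim(muon_pts, electron_pts, skim_leptons=True):
    """Return True if the event should be saved."""
    if not skim_leptons:
        # Skim switched off: every event is kept (e.g. jet or photon studies).
        return True
    has_muon = any(pt > MUON_PT_MIN for pt in muon_pts)
    has_electron = any(pt > ELECTRON_PT_MIN for pt in electron_pts)
    return has_muon or has_electron


if __name__ == "__main__":
    print(passes_lepton_skim([3.5], []))                        # one good muon
    print(passes_lepton_skim([2.0], [4.0]))                     # no lepton passes
    print(passes_lepton_skim([2.0], [4.0], skim_leptons=False))
```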

Running with CRAB

For running over individual datasets, it's best to use standard crab. The configuration files for MC and data are crabNtuples_MC.cfg and crabNtuples_Data.cfg. Submission goes as follows,

  crab -create -cfg crabNtuples_<type>.cfg
  crab -submit -c <ui_working_dir>

To check the status of your jobs,

  crab -status -c <ui_working_dir>

and to get the log files,

  crab -get -c <ui_working_dir>

More information can be found in the CMS SW guide chapter on CRAB.

For submitting ntuple production over multiple datasets at once, the multicrab framework can be used. It is described very briefly here. The important feature is that most, though possibly not all, of the standard crab commands for submitting jobs and checking their status also work when the crab command is replaced with multicrab. For instance, to create and submit jobs with multicrab, you can do the following,

  multicrab -create -cfg <cfg_file> -submit

and to check their status

  multicrab -status -c <ui_working_dir>

which also works for crab. You can also check the status of individual datasets using standard crab commands. The main difference is in the format of the configuration files. For multicrab, there is a crab.cfg file with a set of global configuration parameters and a multicrab.cfg file in which each dataset is given its own specific configuration. As with normal crab, two configuration files have been prepared for data and MC: multicrab_data.cfg and multicrab_mc.cfg.
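As a rough illustration of the two-file layout, the sketch below generates the text of a multicrab-style configuration with one section per dataset. The `[MULTICRAB]` and `[COMMON]` section names and the `CMSSW.datasetpath` key are assumptions based on the usual CRAB2 multicrab layout, and the dataset names and paths are invented; compare against multicrab_data.cfg and multicrab_mc.cfg in the repository before relying on any of it.

```python
# Sketch: build the text of a multicrab-style config with one section per
# dataset. Section and key names are assumptions about the CRAB2 layout;
# the dataset names and paths are made up for illustration.
import configparser
import io


def build_multicrab_cfg(datasets, common=None, crab_cfg="crab.cfg"):
    """Return config text: global settings plus one section per dataset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "CMSSW.datasetpath"
    parser["MULTICRAB"] = {"cfg": crab_cfg}   # points at the global crab.cfg
    parser["COMMON"] = dict(common or {})     # settings shared by all datasets
    for name, path in datasets.items():
        parser[name] = {"CMSSW.datasetpath": path}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    example = {
        "HypotheticalSampleA": "/Hypothetical/SampleA/AODSIM",
        "HypotheticalSampleB": "/Hypothetical/SampleB/AODSIM",
    }
    print(build_multicrab_cfg(example))
```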

Checking Output

After CRAB reports that your jobs have finished with exit codes 0 0, you will want to double-check, because the report is not always reliable: large jobs tend to have a few extra or missing files.

Run the following command:

  ./find_goodfiles.py -c Path/To/CrabDir -q

This checks that all the jobs listed in the crab xml files are actually in your output area, and that your output area contains no extra or duplicate files. If it finds a problem, the script will tell you what needs to be rerun or what needs to be deleted.
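The kind of bookkeeping described above can be sketched as follows. This is not the code of find_goodfiles.py; it is a minimal illustration that assumes output files are named `<prefix>_<job>_<retry>_<random>.root`, and the function name `check_output` and the example file names are hypothetical.

```python
# Sketch of an output check: compare the expected job numbers with the files
# found in the output area. The file naming pattern is an assumption.
import re
from collections import defaultdict

FILE_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<job>\d+)_(?P<retry>\d+)_(?P<tag>[A-Za-z0-9]+)\.root$"
)


def check_output(expected_jobs, filenames):
    """Return missing job numbers, extra files, and duplicated jobs."""
    by_job = defaultdict(list)
    unparsed = []
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if match:
            by_job[int(match.group("job"))].append(name)
        else:
            unparsed.append(name)  # does not look like a job output file

    expected = set(expected_jobs)
    missing = sorted(expected - set(by_job))  # jobs to rerun
    extra = sorted(
        name for job, names in by_job.items() if job not in expected
        for name in names
    ) + sorted(unparsed)                      # files to inspect or delete
    duplicates = {
        job: sorted(names) for job, names in by_job.items()
        if job in expected and len(names) > 1
    }                                         # keep one file per job
    return {"missing": missing, "extra": extra, "duplicates": duplicates}


if __name__ == "__main__":
    files = ["nuTuple_1_1_abc.root", "nuTuple_2_1_def.root",
             "nuTuple_2_2_ghi.root", "nuTuple_7_1_xyz.root"]
    print(check_output([1, 2, 3], files))
```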

Instructions for Developers

If the changes do not conflict, you are done. If there are conflicts, markers will be left in the problematic files showing the conflict; git diff will show this. Once you have edited the files to resolve the conflicts, run git commit -a.

Tagging policy

At any time you can tag your code, and push your tags to remote:

  git tag -a test1 -m "my tag"
  git push origin --tags

You can use any tags you want; they can be deleted later.

For the global production, though, we should stick to a tagging convention. Tags should be of the form vX.Y, and I am starting them with v6.1, so that the tag corresponds to the nutuple_v6 name of the ntuple production. If new code significantly changes the format of the ntuples (substantial changes to class definitions, etc.), then the first number of the tag should be incremented (to v7.1, etc.) and the ntuple production path name should be changed correspondingly. Otherwise, incremental changes should be reflected in changes to the second number.
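The vX.Y convention can be expressed as a small helper that validates a tag and computes the next one. This is only a sketch of the rule as stated above; the function name `next_tag` is hypothetical and no such helper exists in the repository.

```python
# Sketch of the vX.Y tagging rule: bump the first number (and restart the
# second at 1) when the ntuple format changes, otherwise bump the second.
import re

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)$")


def next_tag(current, format_changed):
    """Return the tag that should follow `current` under the convention."""
    match = TAG_PATTERN.match(current)
    if match is None:
        raise ValueError(f"tag {current!r} does not follow the vX.Y convention")
    major, minor = int(match.group(1)), int(match.group(2))
    if format_changed:
        # Substantial format change, e.g. new class definitions.
        return f"v{major + 1}.1"
    # Incremental change only.
    return f"v{major}.{minor + 1}"


if __name__ == "__main__":
    print(next_tag("v6.1", format_changed=False))
    print(next_tag("v6.3", format_changed=True))
```

Tags that do not match the pattern, such as the throwaway test1 tag above, are rejected, which keeps personal tags separate from production tags.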