QuarkNet-HEP / cima-wzh

WZH version of CMS Instrument for Masterclass Analysis (CIMA) (2019)

Standardize vocabulary #7

Open JG-QuarkNet opened 3 years ago

JG-QuarkNet commented 3 years ago

In 2019, the CMS data used by iSpy was reorganized from a "flat" set of 10,000 events into a set of "tranches" of data tailored to student groups of different sizes.

The alterations Joel made to CIMA to accommodate this did not keep a consistent vocabulary of labels and variable names in reference to this new data organization scheme. In addition, the original program contained several unclear or misleading variable names.

Labels and variable names should be standardized. This was Joel's working template for syntax at the time he made the changes:

datablock: 5, 10, 25, 50, 100
dataset: 5.1, 10.6, etc.
dataset id: dataset -> [1,190]
dataset number: 1, ..., 100
dataset index: 5.1-4, 10.6-55, etc.
unique id: (int)[(string)(dataset id) + (string,3)(dataset number)] (replaces "flat" event_id: [1,10000])

He did not consistently apply these, though, and we've fallen into different usages since this change. Tom suggested "group" instead of "datablock", but "group" was already used by CIMA's previous data system and Joel wanted to avoid confusion. Ken has also started using "data file" for what's labeled here as "dataset," which seems fine.
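
To make the "unique id" rule in the template above concrete, here is a minimal Python sketch of one way to read it. It assumes that "(string,3)" means the dataset number is zero-padded to three characters, and that the ranges [1,190] and 1, ..., 100 are inclusive. The function names `make_unique_id` and `split_unique_id` are hypothetical and do not come from the CIMA code.

```python
def make_unique_id(dataset_id: int, dataset_number: int) -> int:
    """Build a unique id from a dataset id and a dataset number.

    Assumption: "(string,3)" in the template means the dataset number
    is zero-padded to three characters before concatenation.
    """
    if not 1 <= dataset_id <= 190:
        raise ValueError("dataset id must be in [1, 190]")
    if not 1 <= dataset_number <= 100:
        raise ValueError("dataset number must be in [1, 100]")
    # Concatenate as strings, then convert back to an integer.
    return int(f"{dataset_id}{dataset_number:03d}")


def split_unique_id(unique_id: int) -> tuple[int, int]:
    """Recover (dataset id, dataset number) from a unique id."""
    # The last three digits are the dataset number; the rest is the dataset id.
    return divmod(unique_id, 1000)


print(make_unique_id(42, 7))
print(split_unique_id(42007))
```

Under this reading, fixing the padding width at three digits keeps the id unambiguous: the dataset number always occupies the last three digits, so the two parts can be recovered with a single division.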