The following is a to-do list of work that needs to be done for the pre-processing and histogramming codes to interface nicely and provide a consistent UX:
For sklimming:
[ ] Sklimming should replace the Branch objects with Variable or Observable.
[ ] The idea of a Functor sub-class called Builder for a Variable is much neater than the new_branches dictionary.
[ ] Remove the arg_types argument for Variable/Branch objects and replace it with an optional list_str_arg, where the default assumption when a str is provided as an argument becomes that it is another Branch
[ ] Sklimming back-end is too messy -- especially the organisation of the different types of branches (new, tmp, on, off)
[ ] Dask distribution for the reading in the backend in a similar way to the histogramming code.
[ ] Consistent way of outputting yields using a given branch
[ ] Config to use a schema in a similar way to histogramming
[ ] reader code should be a processor code
[ ] Argparser to select branches and samples in steering script
[ ] Look into a coffea backend
[ ] Make variables/branches optional in Sample constructor -- in case the sample is only needed at histogramming
[ ] Remove or actually use job name in general settings
[ ] indirs in general_settings should be used as the default for samples if no where is given
[ ] Branch/Variable aware of tree rather than using a dict? This way the user can just specify which trees to use when looking for this variable, without repetition
[ ] Better multiple-tree support
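The Builder and list_str_arg items above could look roughly like the sketch below. It is not the current implementation: the class bodies, the `required_branches` and `evaluate` methods, and the reading of list_str_arg as "positions of arguments that are literal strings" are all assumptions made for illustration.

```python
from typing import Any, Callable, Mapping, Sequence


class Functor:
    """Hypothetical wrapper around a function and its arguments.

    Assumption: any str argument names another branch, unless its position
    is listed in list_str_arg, in which case it is passed as a literal string.
    """

    def __init__(self, func: Callable[..., Any], args: Sequence[Any],
                 list_str_arg: Sequence[int] = ()):
        self.func = func
        self.args = list(args)
        self.list_str_arg = set(list_str_arg)

    def _is_branch(self, position: int, arg: Any) -> bool:
        return isinstance(arg, str) and position not in self.list_str_arg

    def required_branches(self) -> list:
        """Names of the branches that must be read before evaluation."""
        return [a for i, a in enumerate(self.args) if self._is_branch(i, a)]

    def evaluate(self, data: Mapping[str, Any]) -> Any:
        """Swap branch names for their data, then call the wrapped function."""
        resolved = [data[a] if self._is_branch(i, a) else a
                    for i, a in enumerate(self.args)]
        return self.func(*resolved)


class Builder(Functor):
    """Hypothetical Functor sub-class that builds a new Variable's values."""


def to_gev(pt_mev, mev_per_gev):
    # Convert a list of momenta from MeV to GeV.
    return [pt / mev_per_gev for pt in pt_mev]


# "jet_pt" is a str, so it is treated as a branch; 1000.0 is passed as-is.
builder = Builder(to_gev, args=["jet_pt", 1000.0])
events = {"jet_pt": [50000.0, 30000.0]}
print(builder.required_branches())
print(builder.evaluate(events))
```

With this layout the back-end can ask each Builder which branches it needs, so a separate new_branches dictionary and an arg_types list would no longer be required.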
For histogramming:
[ ] Provide an easy way for the user to change sample settings -- does the user really need anything other than to set the regex (i.e. the tag, and specifying whether a sample isdata)?
[ ] The input_paths finder in InputManager should be able to handle user-provided methods of finding paths, possibly via a decorator in the config? This is for users who have NTuples that they want to histogram without running them through pre-processing
[ ] Data rendering before passing awkward arrays to boost_histograms. This probably means handling masked awkward arrays, since boost_histograms does not seem to deal well with None in the awkward/numpy arrays (they get dumped in the underflow); see the boost-histogram issue
[ ] Regions should not be required -- only Observables should be (inclusive sample can be selected by a dummy function)
[ ] Overall Systematics
[ ] Observable.fromFunc(): does it really need args? Can we not just replace args with var?
[ ] Access to samples if a sklimconfig is specified in settings, or not?
[ ] Auto binning (uniform between min and max if the user has not provided binning -- safeguard for the user? warn them? flag to tell us it's acceptable not to have binning?)
[ ] Do we need a general.from_hists flag to allow retrieving histograms from histogram files?
[ ] Histo name should use :: or __ between different XP components
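As a starting point for the auto-binning item above, here is a minimal pure-Python sketch of uniform bin edges between the minimum and maximum, with a warning unless the user has opted in. The function name `uniform_edges` and the `allow_auto` flag are hypothetical and not part of the existing code. Dropping None entries first is one possible way to keep them out of the underflow, not a decision that has been made.

```python
import warnings


def uniform_edges(values, nbins=10, allow_auto=False):
    """Return nbins + 1 uniform bin edges spanning [min, max] of values.

    Hypothetical helper: None entries are dropped before computing the range,
    and a warning is raised unless the caller sets allow_auto=True.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot derive a binning from an empty variable")
    if not allow_auto:
        warnings.warn("no binning provided; using uniform auto-binning",
                      UserWarning)
    low, high = min(present), max(present)
    if low == high:
        # Avoid zero-width bins when all values are identical.
        high = low + 1.0
    width = (high - low) / nbins
    # Append the upper edge explicitly so it is exactly the maximum.
    return [low + i * width for i in range(nbins)] + [high]


edges = uniform_edges([0.0, 2.0, None, 4.0], nbins=4, allow_auto=True)
print(edges)
```

The opt-in flag covers the "flag to tell us it's acceptable" option, and the warning covers the "warn them" option, so the two can be compared in practice.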
For both sklimming and histogramming:
[ ] Remote distribution of jobs with dask (e.g. HTCondorCluster)
[ ] Can Functor fromStr, as it stands, parse slicing syntax?
[ ] Test functions for all features
[ ] Variable and Observable could perhaps inherit from a common parent class -- what benefit does a user get if they have to specify the binning variable by variable? Unless we support having no binning and just apply a uniform binning on the user's behalf, in which case they can simply import their variables from sklimming into histogramming.
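On the slicing-syntax question above: one option, sketched here, is to let Python's own `ast` module parse the expression string, which accepts slicing such as `jet_pt[:, 0]` without any custom grammar. This is not how fromStr currently works, and the helper names `branches_in_expression` and `has_slicing` are made up for illustration.

```python
import ast


def branches_in_expression(expr):
    """Return the sorted names used in expr, treated here as branch names."""
    tree = ast.parse(expr, mode="eval")
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})


def has_slicing(expr):
    """Return True if expr contains any subscript such as x[0] or x[:, 0]."""
    tree = ast.parse(expr, mode="eval")
    return any(isinstance(node, ast.Subscript) for node in ast.walk(tree))


expression = "jet_pt[:, 0] / 1000 + met"
print(branches_in_expression(expression))
print(has_slicing(expression))
```

The same parsed tree could later be compiled and evaluated against the loaded arrays, but that step is left out here because it depends on how the back-end stores its data.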