Open nsmith- opened 1 month ago
The output of the above example (with rich.print) is
New columns:
{
    'Electron_genId': ColumnData(
        form=ListOffsetForm('i64', IndexedOptionForm('i64', NumpyForm('int32'), parameters={'__doc__': 'PDG id'}), parameters={'__doc__': 'PDG id'}),
        parent_columns=frozenset({'GenPart_pdgId', 'Electron_genPartIdx'}),
        constructor=<function <lambda> at 0x1044937e0>
    ),
    'Electron_genPt': ColumnData(
        form=ListOffsetForm('i64', IndexedOptionForm('i64', NumpyForm('float32'), parameters={'__doc__': 'pt'}), parameters={'__doc__': 'pt'}),
        parent_columns=frozenset({'GenPart_pt', 'Electron_genPartIdx'}),
        constructor=<function <lambda> at 0x104493f60>
    )
}
Necessary columns for flat_dptrel:
{'from-uproot-a24b37c9c0d42135bcbf2dd760ac48e3': frozenset({'GenPart_pt', 'GenPart_pdgId', 'Electron_pt', 'Electron_charge', 'Electron_genPartIdx'})}
[0.983, 0.998, 0.307, 0.98, 0.999, ..., 0.993, 0.363, 0.972, 0.997, 0.943]
This is a sketch of how to add new columns to a `uproot.dask` call in a lazy way. The example here resolves cross-references, which we would rather do at typetracer time using the mixin class, since otherwise we have to resolve all of them ahead of time; it is only meant to demonstrate that it can be done.

In the long term this is not the best solution. Rather, we would want to introduce a "non-touching zip" to dask-awkward. I had proposed `dak.bundle` as the verb, where users can zip dask-awkward arrays together without forcing all fields to be materialized, by specifying exactly what broadcasting assumptions are expected to hold between the inputs.