dask-contrib / dask-awkward

Native Dask collection for awkward arrays, and the library to use it.
https://dask-awkward.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Concatenation on `axis=1` with `ak.combinations` introduces overtouching #526

Open ikrommyd opened 1 month ago

ikrommyd commented 1 month ago

To reproduce:

import dask_awkward as dak
from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()

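tnp = dak.combinations(events.Electron, 2, fields=["tag", "probe"])  # electron pairs labelled (tag, probe)
pnt = dak.combinations(events.Electron, 2, fields=["probe", "tag"])  # same pairs with the roles swapped
zcands = dak.concatenate([tnp, pnt], axis=1)  # join both candidate lists within each event
dak.necessary_columns(zcands.tag.pt)  # report which input columns the graph would load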

returns the following, even though only `nElectron` and `Electron_pt` should be needed:

{'from-uproot-df174c21f639e55bab46ab7dd4a720b9': frozenset({'Electron_charge',
            'Electron_cleanmask',
            'Electron_convVeto',
            'Electron_cutBased',
            'Electron_cutBased_Fall17_V1',
            'Electron_cutBased_HEEP',
            'Electron_deltaEtaSC',
            'Electron_dr03EcalRecHitSumEt',
            'Electron_dr03HcalDepth1TowerSumEt',
            'Electron_dr03TkSumPt',
            'Electron_dr03TkSumPtHEEP',
            'Electron_dxy',
            'Electron_dxyErr',
            'Electron_dz',
            'Electron_dzErr',
            'Electron_eCorr',
            'Electron_eInvMinusPInv',
            'Electron_energyErr',
            'Electron_eta',
            'Electron_genPartFlav',
            'Electron_genPartIdx',
            'Electron_hoe',
            'Electron_ip3d',
            'Electron_isPFcand',
            'Electron_jetIdx',
            'Electron_jetPtRelv2',
            'Electron_jetRelIso',
            'Electron_lostHits',
            'Electron_mass',
            'Electron_miniPFRelIso_all',
            'Electron_miniPFRelIso_chg',
            'Electron_mvaFall17V1Iso',
            'Electron_mvaFall17V1Iso_WP80',
            'Electron_mvaFall17V1Iso_WP90',
            'Electron_mvaFall17V1Iso_WPL',
            'Electron_mvaFall17V1noIso',
            'Electron_mvaFall17V1noIso_WP80',
            'Electron_mvaFall17V1noIso_WP90',
            'Electron_mvaFall17V1noIso_WPL',
            'Electron_mvaFall17V2Iso',
            'Electron_mvaFall17V2Iso_WP80',
            'Electron_mvaFall17V2Iso_WP90',
            'Electron_mvaFall17V2Iso_WPL',
            'Electron_mvaFall17V2noIso',
            'Electron_mvaFall17V2noIso_WP80',
            'Electron_mvaFall17V2noIso_WP90',
            'Electron_mvaFall17V2noIso_WPL',
            'Electron_mvaTTH',
            'Electron_pdgId',
            'Electron_pfRelIso03_all',
            'Electron_pfRelIso03_chg',
            'Electron_phi',
            'Electron_photonIdx',
            'Electron_pt',
            'Electron_r9',
            'Electron_seedGain',
            'Electron_sieie',
            'Electron_sip3d',
            'Electron_tightCharge',
            'Electron_vidNestedWPBitmap',
            'Electron_vidNestedWPBitmapHEEP',
            'nElectron',
            'nGenPart',
            'nJet',
            'nPhoton'})}
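For reference, the expected column usage can be modelled in plain Python. The sketch below is a hypothetical toy (no dask, awkward or coffea; `TrackedRecord`, `combinations_with_fields`, `zcand_tag_pt` and the sample events are invented for illustration) that evaluates the equivalent of `concatenate([tnp, pnt], axis=1).tag.pt` on lists of dicts and records which electron fields are read. It is only an analogy for column tracking, not how dask-awkward actually determines necessary columns.

```python
from itertools import combinations

# Hypothetical toy events: each event is a list of electrons with a few fields.
RAW_EVENTS = [
    [{"pt": 10.0, "eta": 0.1, "charge": -1},
     {"pt": 20.0, "eta": 0.2, "charge": 1},
     {"pt": 30.0, "eta": 0.3, "charge": -1}],
    [],
    [{"pt": 5.0, "eta": 1.0, "charge": 1},
     {"pt": 6.0, "eta": 1.1, "charge": -1}],
]


class TrackedRecord(dict):
    """dict that records every field name read through record[field]."""

    def __init__(self, data, touched):
        super().__init__(data)
        self.touched = touched  # shared set of field names read so far

    def __getitem__(self, field):
        self.touched.add(field)
        return super().__getitem__(field)


def combinations_with_fields(event, fields):
    """Mimic combinations(..., 2, fields=...) for a single event."""
    return [dict(zip(fields, pair)) for pair in combinations(event, 2)]


def zcand_tag_pt(raw_events):
    """Compute the toy zcands.tag.pt and return it with the set of fields read."""
    touched = set()
    events = [[TrackedRecord(e, touched) for e in ev] for ev in raw_events]
    result = []
    for event in events:
        tnp = combinations_with_fields(event, ["tag", "probe"])
        pnt = combinations_with_fields(event, ["probe", "tag"])
        zcands = tnp + pnt  # "axis=1": join the two candidate lists per event
        result.append([cand["tag"]["pt"] for cand in zcands])
    return result, touched


values, touched = zcand_tag_pt(RAW_EVENTS)
print(values)   # [[10.0, 10.0, 20.0, 20.0, 30.0, 30.0], [], [5.0, 6.0]]
print(touched)  # {'pt'}
```

Apart from `pt`, the toy only relies on each event's list length (the analogue of `nElectron`). Swapping the field order in `pnt` and concatenating does not widen that set, which is why the two columns named in the report are the expected answer.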
martindurant commented 1 month ago

Are you saying that you need both `combinations` and `concatenate` to trigger this?

ikrommyd commented 1 month ago

You can call `combinations` once and concatenate the result with itself:

In [1]: import dask_awkward as dak
   ...: from coffea.nanoevents import NanoEventsFactory
   ...:
   ...: events = NanoEventsFactory.from_root({"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"}).events()
   ...:
   ...: tnp = dak.combinations(events.Electron, 2, fields=["tag", "probe"])

In [2]: zcands = dak.concatenate([tnp, tnp], axis=1)
   ...: dak.necessary_columns(zcands.tag.pt)

But yes, you do need `concatenate`; `tnp` on its own is fine:

In [3]: dak.necessary_columns(tnp.tag.pt)
Out[3]:
{'from-uproot-b0c009586b4553e84b096eeaee2d1795': frozenset({'Electron_pt',
            'nElectron'})}