go-hep / hep

hep is the mono repository holding all of go-hep.org/x/hep packages and tools
https://go-hep.org
BSD 3-Clause "New" or "Revised" License

groot: problem with slices in large file #821

Closed rmadar closed 3 years ago

rmadar commented 4 years ago

Following a discussion in the ROOT forum, it turns out that reading back slices fails when the TTree is large.

sbinet commented 4 years ago

I am trying to reproduce the failure with the following program:

package main

import (
    "log"
    "math/rand"

    "go-hep.org/x/hep/groot"
    "go-hep.org/x/hep/groot/riofs"
    "go-hep.org/x/hep/groot/rtree"
)

func main() {
    f, err := groot.Create("o.root", riofs.WithoutCompression())
    if err != nil {
        log.Fatalf("error: %+v", err)
    }
    defer f.Close()

    var evt struct {
        Run    int64     `groot:"runNbr"`
        Evt    int32     `groot:"evtNbr"`
        NLep0  int32     `groot:"nlep0"`
        LepPt0 []float64 `groot:"lep_pt0[nlep0]"`
        NLep1  int32     `groot:"nlep1"`
        LepPt1 []float64 `groot:"lep_pt1[nlep1]"`
        NLep2  int32     `groot:"nlep2"`
        LepPt2 []float64 `groot:"lep_pt2[nlep2]"`
        NLep3  int32     `groot:"nlep3"`
        LepPt3 []float64 `groot:"lep_pt3[nlep3]"`
        NLep4  int32     `groot:"nlep4"`
        LepPt4 []float64 `groot:"lep_pt4[nlep4]"`
    }
    evt.LepPt0 = make([]float64, 0, 20)
    evt.LepPt1 = make([]float64, 0, 20)
    evt.LepPt2 = make([]float64, 0, 20)
    evt.LepPt3 = make([]float64, 0, 20)
    evt.LepPt4 = make([]float64, 0, 20)

    wvars := rtree.WriteVarsFromStruct(&evt)
    w, err := rtree.NewWriter(f, "truth", wvars)
    if err != nil {
        log.Fatalf("error: %+v", err)
    }
    defer w.Close()

    rnd := rand.New(rand.NewSource(1234))

    const N = 1e7
    for i := 0; i < N; i++ {
        if i%(N/100) == 0 {
            log.Printf("evt: %d...", i)
        }
        evt.Run = int64(i)
        evt.Evt = int32(i)
        evt.NLep0 = int32(rnd.Intn(10) + 1)
        evt.NLep1 = evt.NLep0
        evt.NLep2 = evt.NLep0
        evt.NLep3 = evt.NLep0
        evt.NLep4 = evt.NLep0
        evt.LepPt0 = evt.LepPt0[:0]
        evt.LepPt1 = evt.LepPt1[:0]
        evt.LepPt2 = evt.LepPt2[:0]
        evt.LepPt3 = evt.LepPt3[:0]
        evt.LepPt4 = evt.LepPt4[:0]
        for j := 0; j < int(evt.NLep0); j++ {

            evt.LepPt0 = append(evt.LepPt0, 5*rnd.NormFloat64()+30)
            evt.LepPt1 = append(evt.LepPt1, 5*rnd.NormFloat64()+30)
            evt.LepPt2 = append(evt.LepPt2, 5*rnd.NormFloat64()+30)
            evt.LepPt3 = append(evt.LepPt3, 5*rnd.NormFloat64()+30)
            evt.LepPt4 = append(evt.LepPt4, 5*rnd.NormFloat64()+30)
        }

        _, err = w.Write()
        if err != nil {
            log.Fatalf("could not write evt=%d: %+v", i, err)
        }
    }

    err = w.Close()
    if err != nil {
        log.Fatalf("error: %+v", err)
    }

    err = f.Close()
    if err != nil {
        log.Fatalf("error: %+v", err)
    }
}

to no avail, even though:

$> ls ./o.root
-rw-r--r-- 1 binet binet 2.6G Nov 17 11:46 o.root

do you have (or could you share) the program that generated the truth_ttbar.root file?
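
As a rough cross-check of the file size above, the uncompressed payload of that program can be estimated from the struct layout alone. This is only a back-of-envelope sketch: payloadBytes is a hypothetical helper, not part of groot, and it ignores every header and bookkeeping structure that the file format adds.

```go
package main

import "fmt"

// payloadBytes estimates the uncompressed payload written per entry by the
// program above: one int64, one int32, then nBranches pairs of
// (int32 counter, []float64 holding meanLen elements on average).
// Hypothetical helper for illustration only; it is not a groot API.
func payloadBytes(nBranches int, meanLen float64) float64 {
	const (
		sizeInt64   = 8
		sizeInt32   = 4
		sizeFloat64 = 8
	)
	fixed := float64(sizeInt64 + sizeInt32 + nBranches*sizeInt32)
	slices := float64(nBranches) * meanLen * sizeFloat64
	return fixed + slices
}

func main() {
	const entries = 1e7
	// rnd.Intn(10)+1 is uniform over 1..10, so the mean slice length is 5.5.
	perEntry := payloadBytes(5, 5.5)
	total := perEntry * entries
	fmt.Printf("per entry: %.0f bytes\n", perEntry)
	fmt.Printf("total:     %.3g bytes (%.2f GiB)\n", total, total/(1<<30))
}
```

The estimate comes out at 252 bytes per entry, i.e. about 2.5e9 bytes in total. Since the file is written with riofs.WithoutCompression(), that is a lower bound on the file size, and it already sits above the 2 GB mark. The 2.6G reported by ls is larger, presumably because of the bookkeeping the estimate leaves out.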

rmadar commented 4 years ago

Yes, it's here (sorry in advance for the mess, I did try to keep it clean ...): https://gitlab.cern.ch/rmadar/general-spincorr-studies/-/tree/master/ana-go

To produce the tree, you will have to do:

  1. cd ana-run and go build
  2. run the command
    ./run-ana -itree truth -ofile ttbar13.root -normJSON ../spl-norm/Normalization.json -type ATtruth -flist /AtlasDisk/user/madar/data/spincorr/ntuples/v01/filelists/split/ttbar13.list

where you should directly have access to /AtlasDisk/user/madar/data/spincorr/ntuples/v01/filelists/split/ttbar13.list and the file listed in it (let me know if that's not the case).

Let me know if you run into troubles ...

sbinet commented 4 years ago

ok, I managed to reproduce with a simpler test-case. it happens with, e.g.:

var evt struct {
    Evt int32     `groot:"evtNbr"`
    N   int32     `groot:"n"`
    Pts []float32 `groot:"pt[n]"`
}

and with N==0 throughout (and once the file goes over the 2 GB limit).

rmadar commented 4 years ago

Haaa indeed, this is exactly what happens in my case since I keep all branches to process both reco and truth events (and for the truth events, reco branches are empty, indeed). I admit it's not really common (or smart?) to do such a thing ...

sbinet commented 4 years ago

common or smart, it's something that groot ought to at least not explode on :)

sbinet commented 4 years ago

could you give https://github.com/go-hep/hep/pull/822 a try?

(worked for me (TM))

rmadar commented 4 years ago

OK I'll try it now!

rmadar commented 4 years ago

OK, I can't manage to check out your branch ... I used my previous notes but it fails. I might wait for the merge to master ... Or I might try again tomorrow at the end of the day with your branch (when I have a bit more time)

sbinet commented 4 years ago

done.

as usual, feel free to reopen if that failed to fix it.

rmadar commented 4 years ago

Hum, I have just tried and it seems I still see the issue. More precisely, I have updated my go-hep version to go-hep.org/x/hep v0.28.3-0.20201117153122-f56aeeb706e3, re-built, and re-run. But the following code:

TFile *f = TFile::Open("/AtlasDisk/user/madar/data/spincorr/ntuples/v01/ana_v0_truth/ttbar13bis.root");
TTree *t = (TTree*) f->Get("truth");
t->BuildIndex("runNumber", "eventNumber");

crashes, exactly as before. I am pretty sure I did the update properly because the size of ttbar13bis.root is now 3.5 GB while it was 3.4 GB before.

Could you try it yourself and re-open the issue if it doesn't work for you either? If it works, it seems I did something wrong when I re-ran.

sbinet commented 4 years ago

hum... reading with this program:

#include "TTree.h"
#include "TFile.h"

#include <iostream>

void issue_42318() {
    auto f = TFile::Open("./ttbar13bis.root");
    auto t = (TTree*)f->Get("truth");

    Long64_t evtNbr = 0;
    Int_t  runNbr = 0;

    Int_t d_nlep = 0;
    float *d_lep_pt = nullptr;

    Int_t d_njet = 0;
    float *d_jet_pt = nullptr;

    double truth_kvec[3];

    t->SetBranchAddress("runNumber", &runNbr);
    t->SetBranchAddress("eventNumber", &evtNbr);

    t->SetBranchAddress("d_nlep", &d_nlep);
    t->SetBranchAddress("d_lep_pt", &d_lep_pt);

    t->SetBranchAddress("d_njet", &d_njet);
    t->SetBranchAddress("d_jet_pt", &d_jet_pt);

    t->SetBranchAddress("truth_kvec", &truth_kvec);

    std::cout << "entries: " << t->GetEntries() << "\n";
    for (int i = 0; i < t->GetEntries(); i++) {
        t->GetEntry(i);
        std::cout << "entry: " << i 
                << " run=" << runNbr 
                << ", evt=" << evtNbr
                << ", nlep=" << d_nlep
                << ", njet=" << d_njet
                << ", truth_kvec=[" << truth_kvec[0] <<", " << truth_kvec[1] <<", " << truth_kvec[2] << "]"
                << "\n";
    }
}

I get:

$> root -b -q ./issue-42318.C
[...]
entry: 7647997 run=310000, evt=649757898, nlep=0, njet=0, truth_kvec=[-0.398049, -0.653809, 0.643498]
entry: 7647998 run=310000, evt=649822391, nlep=0, njet=0, truth_kvec=[-0.132782, 0.132049, 0.98231]
entry: 7647999 run=310000, evt=649822698, nlep=0, njet=0, truth_kvec=[-0.573527, 0.0362466, 0.818384]
Warning in <TBasket::ReadBasketBuffers>: basket:d_lep_pt has fNevBuf=5917704 but fEntryOffset=0, pos=2701564065, len=128076, fNbytes=231, fObjlen=128008, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket:d_lep_e has fNevBuf=5917704 but fEntryOffset=0, pos=2701564296, len=128075, fNbytes=230, fObjlen=128008, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket:d_lep_eta has fNevBuf=5917704 but fEntryOffset=0, pos=2701564526, len=128077, fNbytes=232, fObjlen=128008, trying to repair

rmadar commented 3 years ago

Just to make sure: you don't expect other test or debug from my side, right?

sbinet commented 3 years ago

nope. I've found a couple of bugs related to the >2Gb files handling. still no cigar, though.
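
For context on why the 2 GB mark matters at all: the basket positions in the TBasket warnings above (pos=2701564065 and neighbours) are beyond what a signed 32-bit integer can hold. The thread does not say which bugs were found, so the following is only a minimal sketch of the arithmetic, not a description of groot's or ROOT's code. fitsInt32 is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// fitsInt32 reports whether a file offset can be stored in a signed
// 32-bit integer without overflowing.
// Hypothetical helper for illustration only; it is not a groot API.
func fitsInt32(pos int64) bool {
	return pos >= math.MinInt32 && pos <= math.MaxInt32
}

func main() {
	// Basket positions copied from the TBasket warnings above.
	positions := []int64{2701564065, 2701564296, 2701564526}
	for _, pos := range positions {
		// Converting to int32 keeps only the low 32 bits, so an offset
		// past 2^31-1 wraps around to a negative value.
		fmt.Printf("pos=%d fitsInt32=%v truncated=%d\n",
			pos, fitsInt32(pos), int32(pos))
	}
}
```

None of the three positions fits, so any code path that stored such an offset in a 32-bit field would end up seeking to the wrong place. That would only show up in files larger than about 2 GB, which matches the pattern seen in this issue.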