Closed m-fila closed 4 months ago
If I understand correctly this would effectively give an "opt-in" solution for generated EDMs, as the EDM has to explicitly load these pythonizations. For EDM4hep this would not really be a problem, as we have some python glue code already in any case.
For me this looks like a good solution, but we should add a bit of documentation on how to use this in generated EDMs.
> If I understand correctly this would effectively give an "opt-in" solution for generated EDMs, as the EDM has to explicitly load these pythonizations. For EDM4hep this would not really be a problem, as we have some python glue code already in any case.
Yes, it's opt-in. `load_pythonizations` could be added to EDM4hep's `__init__.py` to load on import.

> For me this looks like a good solution, but we should add a bit of documentation on how to use this in generated EDMs.
Thanks, I will add it
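For illustration, the opt-in pattern can be sketched in plain Python. The registry, `register_pythonization`, and `load_pythonizations` below are stand-ins, not the actual podio API: the point is that nothing happens to a generated EDM's classes until its `__init__.py` explicitly asks for the pythonizations.

```python
# Self-contained mock of the opt-in mechanism; names are illustrative only.
_registry = []  # pythonizations are applied in registration order


def register_pythonization(func):
    """Collect a pythonization callback."""
    _registry.append(func)
    return func


@register_pythonization
def add_len(klass):
    # Example pythonization: expose a C++-style size() as Python's len().
    klass.__len__ = lambda self: self.size()


class MCParticleCollection:
    """Stand-in for a generated collection class."""

    def size(self):
        return 0


def load_pythonizations(classes):
    """What a generated EDM would call from its __init__.py (the opt-in step)."""
    for pythonize in _registry:
        for klass in classes:
            pythonize(klass)


load_pythonizations([MCParticleCollection])
print(len(MCParticleCollection()))  # the class now supports len()
```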
Note that this will make indexing significantly slower. With this PR and the one in EDM4hep:

```python
In [2]: import edm4hep

In [3]: coll = edm4hep.MCParticleCollection()

In [4]: coll.create()
Out[4]: <cppyy.gbl.edm4hep.MutableMCParticle object at 0x55f984f08120>

In [5]: %timeit coll[0]
3.45 µs ± 72.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
and without this PR:

```
2.46 µs ± 26.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
It should be said somewhere that the current behaviour is that you get an object when indexing out of bounds, and it then crashes when you try to do something with that object. But I think indexing out of bounds should be rare, since typically you either run a for loop without indices, or, if it uses indices, they are bounded by `len(coll)`.
> Note that this will make indexing significantly slower
Thanks for pointing this out, I'll check if this can be improved by mapping `__getitem__` to `at` and raising `IndexError`.
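As a plain-Python sketch (mock classes, not the actual podio or cppyy code), mapping `__getitem__` to an `at`-style accessor and translating its exception into `IndexError` could look like this:

```python
class Collection:
    """Stand-in for a bindings class whose at() throws on out-of-range access."""

    def __init__(self, items):
        self._items = list(items)

    def at(self, i):
        # mimics std::vector::at: throws instead of returning an invalid object
        if not 0 <= i < len(self._items):
            raise RuntimeError("out_of_range")  # stand-in for the C++ exception
        return self._items[i]


def pythonize_getitem(klass):
    """Install a bound-checked __getitem__ that raises a pythonic IndexError."""

    def getitem(self, i):
        try:
            return self.at(i)
        except RuntimeError as exc:
            raise IndexError(i) from exc

    klass.__getitem__ = getitem


pythonize_getitem(Collection)

coll = Collection([1, 2, 3])
print(coll[0])  # → 1
# coll[5] would now raise IndexError instead of returning an invalid object
```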
I tried benchmarking the following function:

```python
def bench():
    try:
        coll[0]
    except:
        pass

coll = MCParticleCollection()
bench()  # give cppyy time to load lazily
%timeit bench()
coll.create()
bench()  # give cppyy time to load lazily
%timeit bench()
```
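As an aside, `%timeit` is an IPython magic; outside IPython the stdlib `timeit` module gives comparable per-call numbers (a plain empty list stands in for the collection here):

```python
import timeit

coll = []  # empty stand-in collection: indexing it raises IndexError


def bench():
    try:
        coll[0]
    except IndexError:
        pass


# timeit.timeit returns the total time in seconds for `number` calls
n = 100_000
per_call_ns = timeit.timeit(bench, number=n) / n * 1e9
print(f"{per_call_ns:.0f} ns per call")
```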
| | Bound check | `__getitem__` implementation | Average time, no-exception branch | Average time, exception branch |
|---|---|---|---|---|
| 1. | ❌ | Default cppyy mapping to `operator[]` | 683 ns | - |
| 2. | ✔️ | Originally proposed in this PR, bound-check in python | 1.2 µs | 779 ns |
| 3. | ✔️ | Mapped to `at` | 730 ns | 30.8 µs |
| 4. | ✔️ | `at` wrapped to raise `IndexError` | 826 ns | 32.1 µs |
The implementations using `at` seem to be more performant in the usual case. Would it be acceptable to go with option 3 or 4?
A C++ callback added to the Collection template could potentially achieve the same as `at` (option 3) on the no-exception branch and do slightly better than the python-side bound check (option 2) on the exception branch. But extending this with other C++ callbacks would be very ugly.
The performance of the `at`-based implementations is poor when there is an exception because, by design, cppyy tries to evaluate all eligible overloads if there was a throw (in this case `at(size_t)` and `at(size_t) const`). Since `cppyy-1.7.0` it's possible to explicitly select between the const and non-const overload, but that is newer than the version we have.
I see there has been quite some discussion (and progress?) here. A few comments / questions from my side in no particular order:
Did you by any chance check how the times compare to pure C++? Mainly for comparison; I don't think we learn too much from them other than that a 30 % increase on the python side is pretty negligible compared to pure C++.
Pure C++ google benchmark of a single element access with `operator[]` vs `at`:
```
----------------------------------------------------------------
Benchmark               Time             CPU         Iterations
----------------------------------------------------------------
squareBracketsNoError   4.59 ns         4.59 ns      152241304
atNoError               5.26 ns         5.25 ns      131108856
atError                 1552 ns         1550 ns         452227
```
where

```cpp
static void squareBracketsNoError(benchmark::State& state) {
  edm4hep::MCParticleCollection coll;
  coll.create();
  for (auto _ : state) {
    try {
      benchmark::DoNotOptimize(coll[0]);
      benchmark::ClobberMemory();
    } catch (const std::out_of_range&) {
    }
  }
}

static void atNoError(benchmark::State& state) {
  edm4hep::MCParticleCollection coll;
  coll.create();
  for (auto _ : state) {
    try {
      benchmark::DoNotOptimize(coll.at(0));
      benchmark::ClobberMemory();
    } catch (const std::out_of_range&) {
    }
  }
}

static void atError(benchmark::State& state) {
  edm4hep::MCParticleCollection coll;
  coll.create();
  for (auto _ : state) {
    try {
      benchmark::DoNotOptimize(coll.at(1));
      benchmark::ClobberMemory();
    } catch (const std::out_of_range&) {
    }
  }
}
```
Both this and the python benchmark measure an isolated single access; looping over the collection will probably give different results.
> IIUC all of the presented methods are still possible with python callbacks, right?
Yes, all the alternatives from that table were python callbacks
I think the exceptional case should not be the (main) driver for performance considerations. After all you are dealing with an error at that point and not with something "usual".
In the last commit I changed the implementation proposed in this PR to option 4 from the table. It seems to me a reasonable compromise.
Can you rebase this onto the latest master to pick up the RNTuple API fixes? That should also make CI pass again.
Sorry for the delay. Rebased and added documentation. I also split the callback into two functions:
I think previously it was not clear that the callback should start with a conditional statement to select what should be modified.
Looks like all the necessary fixes have landed in upstream root (master and v6.32). IIRC this can be merged now, right?
> Looks like all the necessary fixes have landed in upstream root (master and v6.32). IIRC this can be merged now, right?
Awesome 😎 Yes, it's ready to go
and off we go...
BEGINRELEASENOTES
ENDRELEASENOTES
#458 and key4hep/EDM4hep#288 show there is some interest in tuning the cppyy-generated bindings. cppyy has a feature for this called pythonization: a callback can be registered for all the classes that fulfill some predicate (e.g. derived from `podio::CollectionBase`).

This PR contains:

- a mechanism for loading pythonizations (`podio.pythonizations.utils.Pythonizer`)
- a bound-checked `__getitem__` for classes derived from `podio::CollectionBase`
The mechanism for loading the pythonizations is inspired by the ROOT pythonizations, with the change that the order in which the pythonizations are loaded is defined.
To load the pythonizations in a downstream project, for instance a project utilizing an "edm4hep" namespace:

Alternatively, a selection of pythonizations can be loaded by importing and applying them one by one.
The alternative mechanism I thought about (`Python::Module`) would require knowing the Python C API.