CastXML / pygccxml

pygccxml is a specialized XML reader that reads the output from CastXML or GCCXML. It provides a simple framework to navigate C++ declarations, using Python classes.
Boost Software License 1.0

Parsing can be 20x slower w/ pygccxml vs. in-memory solutions? (e.g. clang.cindex) #129

Status: Open. EricCousineau-TRI opened this issue 3 years ago.

EricCousineau-TRI commented 3 years ago

This might be closable as "Not a Problem", but I figured I'd post it here anyway.

WARNING: These benchmarks are still relatively shallow. More work would be needed to draw meaningful conclusions about general usage or scalability.

New Setup: pygccxml vs. clang.cindex

Tinkering more: if I point this at a more complex project, such as CastXML itself, and want to see CastXML's own symbols, pygccxml takes ~70s to load a parsed file (from scratch), vs. ~3.5s for clang.cindex.
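For reference, here is a rough sketch of what "loading CastXML's own symbols" could look like on the clang.cindex side. It assumes the libclang Python bindings (`clang.cindex`) are installed and that "CastXML's own symbols" means cursors whose source location lies inside the CastXML source tree. `project_symbols` and `is_project_file` are hypothetical helpers for illustration, not the code from the linked notebook.

```python
import os


def is_project_file(filename, project_root):
    """Return True if `filename` lies under `project_root`.

    Used to keep only the project's own declarations (e.g. CastXML's) and
    skip anything pulled in from system or LLVM/Clang headers.
    """
    if not filename:
        return False
    root = os.path.realpath(project_root)
    path = os.path.realpath(filename)
    try:
        # commonpath (unlike startswith) does not treat "/a/proj2" as inside "/a/proj".
        return os.path.commonpath([root, path]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def project_symbols(source, project_root, args=()):
    """Parse `source` with clang.cindex; return (kind, spelling) for project cursors."""
    from clang import cindex  # imported lazily; requires the libclang bindings

    index = cindex.Index.create()
    tu = index.parse(source, args=list(args))
    symbols = []
    for cursor in tu.cursor.walk_preorder():
        loc_file = cursor.location.file  # None for cursors without a file location
        if loc_file is not None and is_project_file(loc_file.name, project_root):
            symbols.append((cursor.kind.name, cursor.spelling))
    return symbols
```

Usage would be something like `project_symbols("path/to/castxml/src/SomeFile.cxx", "path/to/castxml", args=["-std=c++14"])`, where the paths and flags are placeholders. Note that the whole translation unit is still parsed; the filter only trims what gets collected.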

Example: https://github.com/EricCousineau-TRI/repro/blob/3c2fbae3cb0afd623a2d7909e3f77f14fd67da52/python/bindings/pygccxml_sandbox/test_castxml_scan.ipynb

Uses:

Speculations for newer setup:


Old Setup: pygccxml vs. cppyy

With some simple code like this:

```cpp
#include <vector>

#include <Eigen/Dense>

namespace ns {

template <typename T, typename U = int>
class ExampleClass {
public:
    std::vector<T> make_std_vector() const;
    Eigen::Matrix<U, 3, 3> make_matrix3();
};

// Analyze concrete instantiations of the given class.
extern template class ExampleClass<int>;
extern template class ExampleClass<float, float>;

}  // namespace ns
```

On my machine, cppyy takes about 0.60s to parse this and let me print out a namespace object, whereas pygccxml (with castxml == 0.3.4) takes about 4.3s. (Measured across 10 trials, timing only the parsing + retrieval routine.)
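A minimal stdlib sketch of this kind of measurement: call a parse-and-retrieve routine a fixed number of times and report the mean wall-clock time. `mean_time` is a hypothetical helper, not the linked benchmark script; the callable stands in for whatever routine is being timed (e.g. parsing the header above and printing `ns`).

```python
import statistics
import time


def mean_time(fn, trials=10):
    """Call `fn` `trials` times and return the mean wall-clock seconds per call."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()  # only the parsing + retrieval routine goes inside the timed region
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

Any setup that should not be measured (imports, locating the castxml binary, etc.) belongs outside `fn`. If the goal is a from-scratch comparison, caching between trials should also be disabled.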

Will post benchmark shortly.

Speculations:

EricCousineau-TRI commented 3 years ago

Per the FAQ here: https://pygccxml.readthedocs.io/en/develop/faq.html#performance and the discussion in #56:

If I add start_with_declarations=["ns"] to the config, the time drops from 4.3s to 1.08s. That is still about 2x slower than cppyy, but much better than the ~7x slowdown mentioned above.
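A minimal sketch of where `start_with_declarations` goes, assuming pygccxml's `utils.find_xml_generator()` and `parser.xml_generator_configuration_t`. `make_config`, `load_namespace`, and the header path are hypothetical names for illustration. As I understand it, pygccxml forwards this option to CastXML (as `--castxml-start`), so only declarations reachable from `ns` end up in the XML that pygccxml then has to read.

```python
def make_config(namespace="ns", include_paths=None):
    """Build a pygccxml config that starts CastXML's traversal at `namespace`."""
    from pygccxml import parser, utils  # imported lazily; requires pygccxml + castxml

    generator_path, generator_name = utils.find_xml_generator()
    return parser.xml_generator_configuration_t(
        xml_generator_path=generator_path,
        xml_generator=generator_name,
        include_paths=include_paths or [],  # e.g. the Eigen include directory
        start_with_declarations=[namespace],  # restrict output to this namespace
    )


def load_namespace(header, namespace="ns", include_paths=None):
    """Parse `header` and return the pygccxml namespace declaration `namespace`."""
    from pygccxml import declarations, parser

    config = make_config(namespace, include_paths)
    decls = parser.parse([header], config)
    global_ns = declarations.get_global_namespace(decls)
    return global_ns.namespace(namespace)
```

The trade-off is that anything outside `ns` (e.g. `std::vector` internals) is no longer fully described in the output, which is presumably where most of the saved time comes from.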

EricCousineau-TRI commented 3 years ago

From the cppyy docs, it looks like cppyy loads lazily, and I'm not sure how to query all symbols. For now, I'll just instantiate the templates manually.

EricCousineau-TRI commented 3 years ago

Added benchmark file: https://github.com/EricCousineau-TRI/repro/blob/c7cdaa3a4eb9f5661ca58f5c95b3519146cd04f6/python/bindings/pygccxml_sandbox/compare_pygccxml_cppyy.py

Latest run:

| Tool | Mean time (s) |
|---|---|
| clang.cindex | 0.6574852466583252 |
| cppyy | 0.5956002950668335 |
| pygccxml | 0.976616621017456 |

Closing for now, as I think a ~2x slowdown is acceptable given the difference in approach (pygccxml reads CastXML's XML output, whereas the others work in memory).
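To put the latest numbers side by side, the slowdown of each tool can be expressed relative to the fastest one. The mean times below are copied from the latest run (seconds); `relative_slowdown` is just an illustrative helper.

```python
def relative_slowdown(mean_times):
    """Map each tool to its mean time divided by the fastest tool's mean time."""
    fastest = min(mean_times.values())
    return {name: t / fastest for name, t in mean_times.items()}


# Mean times (seconds) from the latest run.
latest_run = {
    "clang.cindex": 0.6574852466583252,
    "cppyy": 0.5956002950668335,
    "pygccxml": 0.976616621017456,
}

for name, ratio in sorted(relative_slowdown(latest_run).items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.2f}x")
# cppyy: 1.00x
# clang.cindex: 1.10x
# pygccxml: 1.64x
```

By this run, pygccxml comes out at roughly 1.6x cppyy and 1.5x clang.cindex, i.e. within the ~2x figure.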

EricCousineau-TRI commented 3 years ago

Reopened and rescoped to compare against clang.cindex on a more complex problem: analyzing CastXML itself.