Closed. RokasUrbonasQC closed this 2 years ago.
Is adding/vendoring the ~300 files from the C++ parser really the only option?
> Is adding/vendoring the ~300 files from the C++ parser really the only option?

Yes, I think so. The ANTLR Python target is slow, and our grammar definition is massive.
> Yes, I think so. The ANTLR Python target is slow, and our grammar definition is massive.

I'm not saying that using a C++-based parser is the bad part, just vendoring it into the repository.
Would it be possible to use git submodules here? This would require that the C++ parser source code is also a git repository somewhere.
> Would it be possible to use git submodules here? This would require that the C++ parser source code is also a git repository somewhere.

I could create a separate repo that would hold just the C++ runtime and parser files. Then we could use it as a submodule in `pytsql`. Does this approach sound good?

Alternatively, it might be possible to have all the C++ and `speedy-antlr-tool` code as a separate Python and Conda package that would be a dependency of `pytsql`. (I might even be able to make the C++ version an "extra" for the package, e.g. `pip install pytsql[cpp-runtime]`.)
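Roughly, the extras part could look like this in `setup.py`; the `pytsql-cpp-runtime` distribution name is just a placeholder for the hypothetical separate package, not anything that exists today:

```python
# Hypothetical setup.py sketch of the "extra" idea: the C++ runtime and
# generated parser live in a separate distribution (named pytsql-cpp-runtime
# here purely for illustration), which users opt into explicitly.
from setuptools import setup

setup(
    name="pytsql",
    extras_require={
        # `pip install pytsql[cpp-runtime]` would pull in the compiled parser;
        # a plain `pip install pytsql` would keep the pure-Python parser only.
        "cpp-runtime": ["pytsql-cpp-runtime"],
    },
)
```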
After discussion with @ivergara via Slack, I have decided to leave the C++ files as they are right now, as other approaches are too difficult to implement or maintain, while providing little benefit.
Thanks a bunch for your work - in particular for extending the documentation and writing tests for explicitly comparing our Python and C++ parsing!
Do you have thoughts on what ought to happen with the Python parser? Are you aware of cases in which it might still prove useful?
The Python parser code just has to stay. It's still used in converting the C++ parse tree to the Python equivalent. I don't think there will be any other use for it.
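Conceptually, that conversion works along these lines; none of the names below are our actual API, this is only a sketch of the C++-to-Python tree translation step:

```python
# Conceptual sketch (not pytsql's real code): rebuild a Python-side parse
# tree from nodes produced by the C++ extension, so existing Python visitors
# and listeners keep working. The .rule_name / .text / .children attributes
# on the C++-produced node are assumptions for illustration.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PyNode:
    rule: str
    text: str = ""
    children: List["PyNode"] = field(default_factory=list)


def translate(cpp_node) -> PyNode:
    # Recursively mirror the C++-produced tree as Python objects.
    return PyNode(
        rule=cpp_node.rule_name,
        text=cpp_node.text,
        children=[translate(child) for child in cpp_node.children],
    )
```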
I had an idea of using the Python parser code as a fallback in case no binary distribution exists for some specific OS + Python version combination. However, such cases are unlikely to occur with proper `cibuildwheel` and `conda` distribution workflows. I discussed this fallback idea briefly with @ivergara, and we decided that it would be too much effort to maintain.
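For reference, the fallback would have been little more than a guarded import, something like the following (the module names are made up, not real `pytsql` modules):

```python
# Minimal sketch of the abandoned fallback idea: prefer the compiled C++
# bindings and degrade to the pure-Python ANTLR parser when no binary wheel
# exists for the current OS/Python combination. Both module names below are
# hypothetical.
try:
    from pytsql import _cpp_parser as _backend  # hypothetical compiled extension
except ImportError:
    from pytsql import _py_parser as _backend   # hypothetical pure-Python parser


def parse(sql: str):
    """Parse T-SQL using whichever backend imported successfully."""
    return _backend.parse(sql)
```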
This PR adds C++ ANTLR code with bindings generated using `speedy-antlr-tool` to significantly improve parsing performance on large inputs.
Included changes:
- C++ parser bindings generated with `speedy-antlr-tool`
- … with simpler equivalent rules
- `build_and_publish` workflow

Note for reviewers: All `.cpp` and `.h` files are autogenerated or taken directly from the ANTLR repo.