This task contains several steps:

1. Search for ways to increase parsing speed. Parsing is currently done with the `pandas.read_xml` method. Possible alternatives:
   - `polars`, `duckdb`, or `pyspark` might have XML parsers and might be faster
   - plain XML parsing in Python (without pandas)
   - ...
2. Writing to the SQLite database is currently done with `pandas.to_sql`. There might be faster methods, depending on the outcome of step 1.
3. Construct a benchmark in its own repository. Use a benchmark XML file from the Marktstammdatenregister and test different implementations for parsing it.
4. Decide on the best method and implement it in `open-mastr`.
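
As a starting point for the benchmark step, here is a minimal sketch that uses only the Python standard library. It compares two plain-Python parsing candidates (`xml.etree.ElementTree.fromstring` versus streaming with `iterparse`) and writes the rows to SQLite with `executemany`. The record layout (`Units`, `Unit`, `UnitId`, `Name`, `Power`), the table name, and all function names are hypothetical placeholders; the real Marktstammdatenregister files use a different structure. Other candidates such as `pandas.read_xml` could be plugged into the same `benchmark` dictionary, which is not shown here.

```python
import io
import sqlite3
import time
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real Marktstammdatenregister files use other tags.
FIELDS = ("UnitId", "Name", "Power")


def make_sample_xml(n_rows: int) -> bytes:
    """Build a small synthetic XML document with n_rows <Unit> records."""
    rows = "".join(
        f"<Unit><UnitId>{i}</UnitId><Name>unit{i}</Name><Power>{i * 1.5}</Power></Unit>"
        for i in range(n_rows)
    )
    return f"<Units>{rows}</Units>".encode("utf-8")


def parse_full_tree(xml_bytes: bytes) -> list:
    """Candidate 1: load the whole tree into memory, then walk the records."""
    root = ET.fromstring(xml_bytes)
    return [tuple(unit.findtext(f) for f in FIELDS) for unit in root.iter("Unit")]


def parse_streaming(xml_bytes: bytes) -> list:
    """Candidate 2: stream with iterparse and clear each record after reading it."""
    rows = []
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "Unit":
            rows.append(tuple(elem.findtext(f) for f in FIELDS))
            elem.clear()  # drop the record's children to limit memory use
    return rows


def write_sqlite(rows: list, conn: sqlite3.Connection) -> None:
    """Bulk insert all rows with executemany inside a single transaction."""
    conn.execute("DROP TABLE IF EXISTS units")
    conn.execute("CREATE TABLE units (unit_id TEXT, name TEXT, power TEXT)")
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO units VALUES (?, ?, ?)", rows)


def benchmark(parsers: dict, xml_bytes: bytes) -> dict:
    """Time parse + write for each candidate; returns seconds per candidate."""
    timings = {}
    for name, parser in parsers.items():
        conn = sqlite3.connect(":memory:")
        start = time.perf_counter()
        write_sqlite(parser(xml_bytes), conn)
        timings[name] = time.perf_counter() - start
        conn.close()
    return timings


if __name__ == "__main__":
    sample = make_sample_xml(10_000)
    results = benchmark(
        {"full_tree": parse_full_tree, "streaming": parse_streaming}, sample
    )
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.3f} s")
```

Timings from a synthetic in-memory file like this are only indicative. A meaningful comparison would need a real Marktstammdatenregister XML file, an on-disk database, and several repeated runs per candidate.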