echemdb / svgdigitizer

(x,y) Data Points from SVG files
https://echemdb.github.io/svgdigitizer/
GNU General Public License v3.0

Improve performance of the svgdigitizer for files with embedded figures #19

Open · DunklesArchipel opened this issue 3 years ago

DunklesArchipel commented 3 years ago

Processing an SVG file with embedded figures takes an extremely long time. Maybe there is a workaround to reduce the processing time.

DunklesArchipel commented 3 years ago

The bottleneck is `minidom.parse(self.filename)`.
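To confirm the bottleneck and see how it scales with the size of the embedded figure, one could time `minidom.parse` on a synthetic SVG. This is a minimal sketch under stated assumptions: the test file (random bytes posing as a PNG inside a base64 data URI) and the helper names `write_svg_with_embedded_image` and `time_minidom_parse` are hypothetical and not part of svgdigitizer.

```python
import base64
import os
import tempfile
import time
from xml.dom import minidom

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def write_svg_with_embedded_image(path, payload_size):
    """Write a hypothetical test SVG containing one embedded "image".

    The image is payload_size random bytes, base64-encoded into a data URI,
    mimicking how raster figures are commonly embedded in SVG files.
    """
    payload = base64.b64encode(os.urandom(payload_size)).decode("ascii")
    with open(path, "w", encoding="utf-8") as f:
        f.write(
            f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}">'
            f'<image xlink:href="data:image/png;base64,{payload}"/>'
            "</svg>"
        )


def time_minidom_parse(path):
    """Return (document, seconds) for a single minidom.parse call."""
    start = time.perf_counter()
    document = minidom.parse(path)
    return document, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "embedded.svg")
        for size in (10_000, 100_000, 1_000_000):
            write_svg_with_embedded_image(path, size)
            document, seconds = time_minidom_parse(path)
            print(f"{size:>9,} payload bytes: {seconds:.3f} s")
            document.unlink()  # free the DOM before the next iteration
```

The absolute timings depend on the machine and Python/expat version; the useful signal is how parse time grows with the payload size compared to the cost of the rest of the pipeline.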

saraedum commented 3 years ago

I think minidom is just not a very fast parser. Handling the embedded image itself should be cheap: it is stored as one long base64 string, so the only extra work is a base64 conversion, I think. But understandably, performance was probably not the main concern in minidom's implementation.
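To make the point about the embedded image concrete: a raster figure embedded in an SVG is usually stored as a base64 data URI in the `href` (or older `xlink:href`) attribute of an `<image>` element, so the parser only carries one long string, and decoding it is a single `base64.b64decode` call. A minimal sketch, assuming that layout; `embedded_images` is a hypothetical helper, not part of svgdigitizer, and it uses the standard library's `xml.etree.ElementTree` rather than minidom.

```python
import base64
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
SVG_IMAGE = "{http://www.w3.org/2000/svg}image"


def embedded_images(svg_text):
    """Yield (mime_type, raw_bytes) for every base64 data URI in <image> elements."""
    root = ET.fromstring(svg_text)
    for image in root.iter(SVG_IMAGE):
        # SVG 2 allows a plain href; many existing files use xlink:href instead.
        href = image.get("href") or image.get(XLINK_HREF) or ""
        if not href.startswith("data:") or ";base64," not in href:
            continue  # external file reference or non-base64 data URI
        header, payload = href.split(",", 1)
        mime_type = header[len("data:"):].split(";", 1)[0]
        yield mime_type, base64.b64decode(payload)


# Illustration with fake PNG bytes (8-byte PNG signature + 100 zero bytes).
png = b"\x89PNG\r\n\x1a\n" + bytes(100)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<image xlink:href="data:image/png;base64,'
    + base64.b64encode(png).decode("ascii")
    + '"/></svg>'
)
print([(mime, len(data)) for mime, data in embedded_images(svg)])
# [('image/png', 108)]
```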

http://elektito.com/2017/08/25/benchmarking-python-xml-parsers/

This benchmark is a bit old, but lxml might be a good option to look into.
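A minimal sketch of what trying lxml could look like, with a fallback to the standard library's `xml.etree.ElementTree` when lxml is not installed. The helper `parse_svg` is hypothetical. One lxml-specific detail worth knowing for this issue: libxml2 refuses very large text nodes and attribute values by default (around 10 MB), which big embedded images can exceed, and `huge_tree=True` on `XMLParser` lifts that limit.

```python
try:
    from lxml import etree

    # huge_tree=True disables libxml2's default size limit on text nodes and
    # attribute values, which large base64-embedded images can otherwise hit.
    _PARSER = etree.XMLParser(huge_tree=True)
except ImportError:  # lxml is optional: fall back to the standard library
    import xml.etree.ElementTree as etree

    _PARSER = None  # ElementTree.parse uses its default XMLParser when None


def parse_svg(source):
    """Parse an SVG file name or file object and return its root element."""
    return etree.parse(source, _PARSER).getroot()
```

Both backends return elements with namespaced tags such as `{http://www.w3.org/2000/svg}svg` and expose the element-tree API (`iter`, `find`, `get`), not minidom's DOM API, so switching would mean porting any code that currently walks the minidom document.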