Low memory XML decoding (parsing scans iteratively)

ISA-tools / mzml2isa

Parser to extract meta information from mzML files and convert the relevant information to an ISA-Tab structure

GNU General Public License v3.0

Low memory XML decoding (parsing scans iteratively) #23

Open lomereiter opened 7 years ago

lomereiter commented 7 years ago

This PR provides an alternative solution to #13: each scan is parsed once, all necessary information is extracted from it, and then the node is freed. On a large imzML file, this brought peak memory consumption down from 2.5GB to 90MB, although the processing time increased from 6s to 13s.
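
A minimal sketch of the parse-extract-free pattern described above. It is not the code from this PR: it uses the standard library's `xml.etree.ElementTree.iterparse` so that it runs without extra dependencies, and the `iter_scans` helper, the un-namespaced `spectrum` and `cvParam` tags, and the `SAMPLE` document are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for an mzML/imzML file (hypothetical, no XML namespace).
SAMPLE = b"""<mzML><run><spectrumList>
<spectrum id="scan=1"><cvParam name="ms level" value="1"/></spectrum>
<spectrum id="scan=2"><cvParam name="ms level" value="2"/></spectrum>
</spectrumList></run></mzML>"""


def iter_scans(source, tag="spectrum"):
    """Yield (id, params) for each scan, then free the scan's subtree."""
    # An "end" event fires once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        # Extract everything needed from this scan in a single pass.
        params = {p.get("name"): p.get("value") for p in elem.iter("cvParam")}
        yield elem.get("id"), params
        # Drop the scan's children, text and attributes to release memory.
        elem.clear()


scans = list(iter_scans(io.BytesIO(SAMPLE)))
```

Note that `elem.clear()` leaves an empty element attached to its parent, so memory still grows slightly with the number of scans; real mzML files also use an XML namespace, which would have to be included in the tag name.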

althonos commented 7 years ago

This seems a lot cleaner than what we hacked through at first. I'll review it as soon as I can.

Tomnl commented 7 years ago

Thanks @lomereiter, this is a great contribution.

No unit tests yet... but it seems to be passing the Travis and AppVeyor tests with no problem.

Tomnl commented 7 years ago

Hey @althonos, do you think we should merge this now? Or perhaps we should wait until we have the unit test functionality?

althonos commented 7 years ago

Well, since this passes the integration tests against MetaboLights, I'm positive about merging (I'm not sure how long it will take to set up unit tests; the feat-tests branch may be far behind master).

althonos commented 7 years ago

Maybe, because of the increased processing time, we should still keep both methods and let the user choose (similar to how lxml.etree.iterparse accepts a huge_tree parameter).

Tomnl commented 7 years ago

Yeah, I think you are right @althonos. Keeping both methods seems like the best idea, as memory consumption might not be a problem for some users.
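
One way the "keep both methods" idea could look, as a rough sketch only: the `count_scans` function and its `low_memory` flag are hypothetical names, not part of the mzml2isa API, and the standard library's `xml.etree.ElementTree` stands in for whichever parser the project uses.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document (hypothetical, no XML namespace).
DOC = b"<mzML><run><spectrum id='a'/><spectrum id='b'/></run></mzML>"


def count_scans(source, low_memory=False, tag="spectrum"):
    """Count scans either by loading the whole tree or by streaming it."""
    if low_memory:
        # Streaming path: handle each scan as it completes, then free it.
        count = 0
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == tag:
                count += 1
                elem.clear()
        return count
    # Default path: build the full tree in memory, then walk it.
    root = ET.parse(source).getroot()
    return sum(1 for _ in root.iter(tag))


in_memory = count_scans(io.BytesIO(DOC))
streamed = count_scans(io.BytesIO(DOC), low_memory=True)
```

Both paths should return the same result; the flag only trades memory for speed, in line with the timings reported earlier in this thread.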