data61 / GS1Combinators

A library to parse GS1 events into Haskell data types
Apache License 2.0

Cursor API loads the entire XML document into RAM #38

Open sajidanower23 opened 5 years ago

sajidanower23 commented 5 years ago

For XML parsing, this library uses the Cursor API from the xml-conduit package.

Cursor parses the entire document in one go and loads it all into memory, which means that very large documents (we are talking gigabytes) could be a problem. If that ever causes out-of-memory failures in a production environment, we may have to move to a streaming API, which loads and parses the document incrementally.
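For reference, the Cursor workflow looks roughly like this (the `ObjectEvent` element name here is illustrative, not taken from the actual Parser module): the whole `Document` is materialised in memory before any querying can happen.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (readFile)
import Text.XML (def, readFile)
import Text.XML.Cursor (fromDocument, element, content, ($//), (&//))

main :: IO ()
main = do
  -- readFile parses the ENTIRE file into a Document value in memory;
  -- only then can we build a Cursor and run queries against it.
  doc <- readFile def "events.xml"
  let cursor = fromDocument doc
      -- "ObjectEvent" is a placeholder element name for this sketch
      texts  = cursor $// element "ObjectEvent" &// content
  print (length texts)
```

So memory usage scales with the size of the whole document, not with the size of the events we actually extract.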

It is worth refactoring most of the code in the Parser module so that making that shift later is easier.

Original Author: ano002

(Moved with github-migration-0.1.0.0 (package github-migration-0.1.0.0 revision df9f38b))

axman6 commented 5 years ago

I can't remember which libraries do this, but it might be possible to use a SAX-style parser which streams the data in. It's a little more difficult to work with, perhaps, but probably not too much.
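xml-conduit itself ships such an interface in `Text.XML.Stream.Parse`, which emits SAX-style events through a conduit and can run in constant memory. A rough sketch of what that could look like, assuming xml-conduit and conduit as dependencies (the `ObjectEvent`/`epc` names and the `Epc` type are placeholders, not our real schema handling):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (ConduitT, runConduitRes, (.|))
import Control.Monad.Catch (MonadThrow)
import Data.Text (Text)
import Text.XML.Stream.Parse

-- Placeholder result type; the real parser would build our GS1 event types.
newtype Epc = Epc Text deriving Show

-- Parse one <epc> element's text content.
parseEpc :: MonadThrow m => ConduitT Event o m (Maybe Epc)
parseEpc = tagNoAttr "epc" (Epc <$> content)

-- Consume all <epc> children of an <ObjectEvent>, one at a time.
parseObjectEvent :: MonadThrow m => ConduitT Event o m (Maybe [Epc])
parseObjectEvent = tagNoAttr "ObjectEvent" (many parseEpc)

main :: IO ()
main = do
  -- parseFile streams events from disk as they are consumed; the
  -- document is never held in memory all at once.
  epcs <- runConduitRes $
    parseFile def "events.xml" .| force "ObjectEvent expected" parseObjectEvent
  print epcs
```

The combinator style (`tagNoAttr`, `content`, `many`, `force`) is not far from what the Cursor code does now, which is why refactoring the Parser module first would make the switch less painful.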

Original Author: mas17k

sarafalamaki commented 5 years ago

I doubt we'd ever have to deal with data that big. Events are tiny, and our system is event-based. If we have to parse millions of them from one XML file for some reason, then it might be an issue. An easier fix would be to just split the file up in that case.

Original Author: fal05c