gnewton / chidley

Convert XML to Go structs / XML to JSON
Apache License 2.0

Make SourceReaders channel instead of array: large number of input files causes unnecessary memory usage #29

Closed: gnewton closed 6 years ago

gnewton commented 6 years ago

When there are a large number of input XML files (for example, 929 PubMed XML files), opening all the readers up front wastes a lot of memory, especially since the readers are only used one at a time. In the PubMed example, memory usage spikes to 10GB at startup, right after all the bufio readers are created. Instead, create only one reader at a time to reduce this memory load.

gnewton commented 6 years ago

Done. https://github.com/gnewton/chidley/commit/79bc5378a423ac65c2cb9639505c1ca713af74b1

Startup memory usage for the 929 PubMed files is now down from 10GB to ~750MB. Nice!