ePADD / epadd

ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
https://www.epaddproject.org

Batch size when importing Mbox files – avoiding out of memory exceptions #406

Closed jfarwer closed 3 years ago

jfarwer commented 3 years ago

ePADD runs out of memory when we import large Mbox files (a few GB each), even though we allocate 11 GB to the application with java -Xmx11g -jar epadd-standalone.jar. Mbox files are currently read in batches of 10,000 messages. Reducing the batch size to 100 resolves the problem for every Mbox file in our collection (up to 15 GB per file), with no noticeable performance impact. Would it be possible to reduce the batch size in a future ePADD release?
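To illustrate why a smaller batch helps, here is a minimal sketch of batched processing with a configurable batch size. It is not ePADD's actual import code: the class name BatchedImportSketch, the method processInBatches, and the default constant are all hypothetical. The idea is that only one batch of messages is held in memory at a time, so peak heap usage scales with the batch size rather than with the size of the Mbox file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not taken from the ePADD code base.
class BatchedImportSketch {

    // Assumed default, based on the value reported to work in this issue.
    static final int DEFAULT_BATCH_SIZE = 100;

    /**
     * Drains the iterator in batches of at most batchSize items and passes
     * each batch to the handler. Returns the number of batches handled.
     */
    static <T> int processInBatches(Iterator<T> messages,
                                    int batchSize,
                                    Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (messages.hasNext()) {
            batch.add(messages.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batches++;
                // Start a fresh list so the previous batch can be garbage
                // collected once the handler no longer references it.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Handle the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With this shape, the batch size could be passed in from a setting instead of being hard-coded, for example processInBatches(messageIterator, DEFAULT_BATCH_SIZE, batch -> index(batch)), where messageIterator and index stand in for whatever the importer uses to read and store messages.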