DataConservancy / dcs-packaging-tool

The Data Conservancy Packaging Tool
http://dataconservancy.github.io/dcs-packaging-tool

java memory error #25

Open · jschell42 opened 7 years ago

jschell42 commented 7 years ago

Hi, I tried bagging a 3 GB file and got a "GC overhead limit exceeded" error. I can look into increasing the Java heap on my machine (or moving to a bigger machine for larger files), but I want to make sure there isn't a size limitation in the app itself. Thanks!
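
For reference, a quick way to check what heap ceiling the JVM is actually running with (a standalone sketch, not part of the packaging tool):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() reports the ceiling the heap can grow to,
        // i.e. what the -Xmx flag controls.
        double maxGiB = Runtime.getRuntime().maxMemory() / (1024.0 * 1024 * 1024);
        System.out.printf("Max heap: %.2f GiB%n", maxGiB);
    }
}
```

If that number comes back well under the size of the content plus whatever the tool needs for bookkeeping, I assume raising -Xmx is the first thing to try.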

htpvu commented 7 years ago

Hi,

We don't have an established size limit for the tool, and we didn't discover one when testing against large files, although we didn't have a real use case to guide that testing.

Our first suggestion would be to increase your memory, as you mentioned. To help further with the issue, we'd need more information, such as your specific Java version, your JAVA_OPTS values, etc.
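
For example, a small throwaway class like this (purely illustrative, not shipped with the tool) would capture the details we're after:

```java
public class EnvReport {
    public static void main(String[] args) {
        // The Java version the tool would run under.
        System.out.println("java.version = " + System.getProperty("java.version"));
        // JAVA_OPTS as seen by the environment; prints "null" if it isn't set.
        System.out.println("JAVA_OPTS    = " + System.getenv("JAVA_OPTS"));
    }
}
```

(Running `java -version` from the same shell works just as well.)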

Are you, by any chance, trying to use the tool as part of a data rescue effort?

jschell42 commented 7 years ago

Yes, we are using it as part of a data rescue effort.

I'm trying to run these larger bags on a more powerful computer (a higher-end iMac vs. a MacBook Air) to see if I can reproduce the issue.

htpvu commented 7 years ago

Let us know if you were successful. We'll be happy to help with the data rescue effort, either on this issue or on other issues you may run into with the tool. We'll just need more information on the issue.

Please feel free to contact us at dataconservancy@gmail.com if you'd like to talk to us about using the tool generally.

emetsger commented 7 years ago

Hi @jschell42, I'm trying to replicate this bug with a package containing a single 35 GiB file. While I did come across some other bugs (#34, #35), I was able to successfully package the 35 GiB file in both 'exploded' and 'zip' forms.

Were you attempting to bag a single 3 GB file, or was the package composed of a number of files, one of which was a 3 GB file?

jschell42 commented 7 years ago

Sorry it's taken me a while to get back to this. I just experienced this error with this dataset.

It's 216 MB compressed, but expands to 1.78 GB and consists of about 25K files. When I get the "java heap error", it's primarily because of the large number of files rather than a single large file.
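
If it helps with reproduction, the dataset is roughly the shape that a throwaway generator like this would produce (all names and counts here are made up to approximate it, not the actual data):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeManyFiles {
    public static void main(String[] args) throws IOException {
        // Approximate the dataset's shape: ~25,000 small files
        // (~75 KiB each, ~1.8 GB total) spread over 250 directories.
        Path root = Paths.get("synthetic-dataset"); // hypothetical output dir
        byte[] blob = new byte[75 * 1024];
        for (int dir = 0; dir < 250; dir++) {
            Path d = Files.createDirectories(root.resolve("dir-" + dir));
            for (int f = 0; f < 100; f++) {
                Files.write(d.resolve("file-" + f + ".dat"), blob);
            }
        }
    }
}
```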

emetsger commented 7 years ago

Hi @jschell42,

So far I have yet to hit a heap error. (Note: I'm using version 1.0.5 of the tool, which includes some optimizations when creating the tree; see #57. My computer is also quite beefy.)

[screenshot: 2017-05-03, 1:16 PM]

I can create a package tree and start generating a package, though things are progressing quite slowly. While the tool was creating the package tree, it appeared to hang, then recovered and ultimately rendered the tree. I suspect the JVM was performing garbage collection while the tool appeared hung. I'm generating the package now, and it is still chugging along.
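
Enabling GC logging (e.g. launching with -verbose:gc) would confirm that suspicion; the same numbers are also available in-process via the GC MXBeans. A minimal standalone sketch (not code from the tool) that forces some allocation and then reports cumulative GC work:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {
    public static void main(String[] args) {
        // Allocate and discard a lot of small objects to trigger collections.
        List<byte[]> churn = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            churn.add(new byte[8_192]);
            if (churn.size() > 1_000) {
                churn.subList(0, 500).clear();
            }
        }

        // Cumulative collection counts and time for this JVM so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```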

I agree with your assessment that this issue probably has more to do with the structure of the content and the number of files than the size of the content being packaged.

Let me do some profiling of the tool with your example dataset and get back to you!

emetsger commented 7 years ago

@jschell42 I should mention that I'm going to be out of the office until mid-week next week, so it'll be about a week before I'm able to respond. :(