Open · jschell42 opened this issue 7 years ago
Hi,
I tried bagging a 3 GB file and got a "GC overhead limit exceeded" error. I can look into increasing my Java memory heap on my machine (or move to a bigger machine for larger files), but I want to make sure there isn't some size limitation in the app itself. Thanks!
Hi,
We don't have an established size limitation for the tool. We did not discover any limit when testing against large files, although we did not have a real use case to guide our testing.
Our first suggestion would be to increase your memory, as you mentioned. To help you further with the issue, we'd need more information, such as your specific Java version and your JAVA_OPTS values.
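For reference, here's a minimal sketch of raising the JVM heap ceiling before launching the tool. The `4g` value is just an example (size it to your machine), and the exact variable your launcher script reads may differ:

```sh
# Raise the maximum JVM heap to 4 GiB before starting the tool.
# -Xmx is the standard HotSpot flag for the maximum heap size;
# -XX:+HeapDumpOnOutOfMemoryError writes a heap dump we can inspect if it still fails.
export JAVA_OPTS="-Xmx4g -XX:+HeapDumpOnOutOfMemoryError"
```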
Are you, by any chance, trying to use the tool as part of a data rescue effort?
Yes, we are using it as part of a data rescue effort.
I'm trying to run these larger bags on a more powerful computer (a higher-end iMac vs. a MacBook Air) to see if I can reproduce the issue.
Let us know if you were successful. We'll be happy to help with the data rescue effort, either on this issue or on other issues you may run into with the tool; we'll just need more information on the issue.
Please feel free to contact us at dataconservancy@gmail.com if you'd like to talk to us about using the tool generally.
Hi @jschell42, I'm trying to replicate this bug with a package containing a single 35 GiB file. While I did come across some other bugs (#34, #35), I was able to successfully package the 35 GiB file in both 'exploded' and 'zip' forms.
Were you attempting to bag a single 3 GB file, or was the package composed of a number of files, one of which was a 3 GB file?
Sorry it's taken me a while to get back to this. I just experienced this error with this dataset.
It's 216 MB compressed, but expands to 1.78 GB and consists of 25K files. When I get the "Java heap error", it's primarily because of the large number of files rather than a single large file.
Hi @jschell42,
So I have yet to receive a heap error. (Note: I'm using version 1.0.5 of the tool, which includes some optimizations when creating the tree; see #57. My computer is also quite beefy.)
I can create a package tree and start to generate a package; however, things are progressing quite slowly. When the tool was creating the package tree, it appeared to hang, then recovered and ultimately rendered the tree. I suspect the JVM was performing garbage collection while the tool appeared to hang. I'm generating the package now, but it is still chugging along.
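If you want to confirm that suspicion on your end, here's a quick sketch of turning on GC logging (these are standard HotSpot flags on Java 8; on Java 9+ the equivalent is `-Xlog:gc`):

```sh
# Print a line for each GC event, with details and timestamps (Java 8 HotSpot flags).
export JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

Long pauses during tree creation should show up as back-to-back full GC entries in the output.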
I agree with your assessment that this issue probably has more to do with the structure of the content and the number of files than with the size of the content being packaged.
Let me do some profiling of the tool with your example dataset and get back to you!
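For anyone following along, one low-effort way to see what's filling the heap while the tool runs (`jps` and `jmap` ship with the JDK; `<pid>` below is a placeholder for the tool's process id):

```sh
# Find the tool's JVM process id.
jps -l

# Dump a histogram of live objects by class, largest consumers first.
jmap -histo:live <pid> | head -n 25
```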
@jschell42 I should mention that I'm going to be out of the office until mid-week next week, so it'll be about a week before I'm able to respond. :(