IQSS / dataverse

Open source research data repository software
http://dataverse.org

Service Stability: Long-lived objects eventually cause repeated stop-the-world GC #2013

Closed: kcondon closed this issue 9 years ago

kcondon commented 9 years ago

Today's service instability appears to be due mostly to continuous stop-the-world garbage collection (GC).

I confirmed this after each failure by checking jstat -gcutil and jmap -heap.
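The jstat check can be scripted so a GC storm is flagged without someone watching the console. The sketch below assumes the text was captured from something like jstat -gcutil <pid> 5000. It parses columns by header name, so it tolerates both the older P column and the newer M/CCS columns, and it reports a storm when the full-GC count (FGC) keeps rising while old-generation occupancy (O) stays high. The function names, the 90% threshold and the three-interval window are hypothetical choices, not values taken from this issue, and the sample numbers are made up.

```python
from typing import Dict, List, Optional


def parse_gcutil(text: str) -> List[Dict[str, Optional[float]]]:
    """Parse `jstat -gcutil` output into one dict per sample, keyed by column header."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    samples = []
    for row in rows:
        if row == header:  # jstat repeats the header when run with -h<n>
            continue
        # jstat can print "-" for a column that does not apply; store it as None
        samples.append({
            name: (None if value == "-" else float(value))
            for name, value in zip(header, row)
        })
    return samples


def full_gc_storm(samples, old_pct_threshold: float = 90.0,
                  min_consecutive: int = 3) -> bool:
    """True if FGC rose in `min_consecutive` consecutive intervals while the
    old generation (O) stayed at or above `old_pct_threshold` percent full.

    Threshold and window are hypothetical tuning values."""
    streak = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur["FGC"] > prev["FGC"] and cur["O"] >= old_pct_threshold:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0
    return False


if __name__ == "__main__":
    # Illustrative, made-up sample: old gen is ~100% full and FGC climbs every interval.
    sample = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00  99.91  59.80   1204   35.310   310  410.332  445.642
  0.00   0.00 100.00  99.95  59.80   1204   35.310   312  413.120  448.430
  0.00   0.00 100.00  99.97  59.80   1204   35.310   315  417.004  452.314
  0.00   0.00 100.00  99.98  59.80   1204   35.310   317  419.881  455.191
"""
    print(full_gc_storm(parse_gcutil(sample)))  # True
```

Requiring several consecutive intervals, rather than a single full GC, keeps an occasional healthy full collection from raising an alert.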

Increasing the heap size helped, but the problem is still occurring.

Suggested next steps: review the access log and run a memory profiler on the application, looking for large object hierarchies that are created but never freed on the most frequently requested pages, such as the homepage.
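A lightweight complement to a full profiler is to diff two class histograms, for example from jmap -histo:live <pid> taken before and after repeatedly loading the homepage; classes whose live byte counts keep growing are candidates for objects that are created but never freed. This minimal sketch assumes the usual "num #instances #bytes class name" row layout of jmap -histo output; the function names and the top-N cutoff are hypothetical. Note that the :live option forces a full GC before counting, so on a struggling production server it should be run sparingly.

```python
import re
from typing import Dict, List, Tuple

# Matches jmap -histo data rows such as "   1:       123456     98765432  [C"
_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_histo(text: str) -> Dict[str, int]:
    """Map class name -> total bytes; header, separator and Total lines are skipped."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            _instances, nbytes, cls = m.groups()
            result[cls] = result.get(cls, 0) + int(nbytes)
    return result


def top_growth(before: str, after: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return up to n (class, byte growth) pairs, largest growth first."""
    b, a = parse_histo(before), parse_histo(after)
    growth = [(cls, a[cls] - b.get(cls, 0)) for cls in a]
    growth = [g for g in growth if g[1] > 0]
    growth.sort(key=lambda g: g[1], reverse=True)
    return growth[:n]
```

Taking several histograms at intervals and looking for classes that grow monotonically is more reliable than a single before/after pair, since one pair can catch a normal request-scoped spike.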

mercecrosas commented 9 years ago

FYI: we are still occasionally getting "Service Temporarily Unavailable" errors:

PROBLEM: HTTPS-dvn is CRITICAL on host dataverse.harvard.edu

Service: HTTPS-dvn
Host: dataverse.harvard.edu
Alias: Production installation of Dataverse at Harvard
Address: dataverse.harvard.edu
Host Group Hierarchy: Opsview > DVN > DVN_Production
State: CRITICAL
Date & Time: Sun Apr 19 04:54:14 EDT 2015

Additional Information:

HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 582 bytes in 30.697 second response time

pdurbin commented 9 years ago

Adding the Content-Length header in 2c81bea (5 days ago) seems to have helped with large file downloads, but there is more complexity to work out around the Transfer-Encoding: chunked header for large files.
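The two ways of framing a download body can be sketched as follows, assuming a handler that either knows the file size before writing the body or does not. The function names and the application/octet-stream content type are hypothetical, and this is not Dataverse's actual download code. The key constraint is from HTTP/1.1 itself: a message must not carry both Content-Length and Transfer-Encoding.

```python
from typing import Dict


def download_headers(size_bytes: int, filename: str) -> Dict[str, str]:
    """Headers for a file whose size is known up front.

    An explicit Content-Length tells the client exactly how many bytes to expect,
    so chunked encoding is unnecessary. RFC 7230, section 3.3.2 forbids sending
    Content-Length in a message that also has Transfer-Encoding, so only one
    framing header is emitted.
    """
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return {
        "Content-Type": "application/octet-stream",  # hypothetical content type
        "Content-Disposition": f'attachment; filename="{filename}"',
        # Python ints are unbounded; with fixed-width ints this value needs a
        # 64-bit type once files exceed 2**31 - 1 bytes.
        "Content-Length": str(size_bytes),
    }


def streaming_headers(filename: str) -> Dict[str, str]:
    """Headers when the length is unknown: fall back to chunked encoding."""
    return {
        "Content-Type": "application/octet-stream",  # hypothetical content type
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Transfer-Encoding": "chunked",
    }
```

For example, download_headers(3 * 1024**3, "big.tab") yields a Content-Length of 3221225472, which is larger than 2**31 - 1; a length held in a signed 32-bit integer would overflow at the >2 GB sizes mentioned in the closing comment, although the thread does not say that was the cause here.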

scolapasta commented 9 years ago

Passing to QA.

kcondon commented 9 years ago

Retested the original test case: downloading a specific large data file (>2 GB) used to trigger the 503 error immediately. This no longer happens. This used a direct connection to GlassFish on port 8181, with Apache rewrite rules to bypass Apache for downloads. Closing.
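The routing described in the closing comment can be modelled as a small URL-rewriting function. This is a minimal sketch assuming the rewrite rules issue a redirect; the thread does not give the actual rules, and they may have proxied instead. Which paths were rewritten is also not stated: the /api/access/datafile/ prefix (the Dataverse Data Access API) is used here only as an illustrative stand-in, and route_download is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: the issue does not list the rewritten paths; this prefix is an
# illustrative stand-in for "download requests".
DOWNLOAD_PREFIXES = ("/api/access/datafile/",)
GLASSFISH_HTTPS_PORT = 8181  # port named in the issue


def route_download(url: str) -> str:
    """Return where a request should go.

    Download requests are sent straight to GlassFish on port 8181 so the file
    body does not pass through Apache; all other requests are left unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.path.startswith(DOWNLOAD_PREFIXES):
        host = parts.hostname or ""
        netloc = f"{host}:{GLASSFISH_HTTPS_PORT}"
        return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))
    return url
```

Only the large-body requests skip the front-end server in this model; ordinary page traffic still goes through Apache as before.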