cozy / cozy-photos-v2

Deprecated - New : https://github.com/cozy/cozy-drive/tree/master/src/photos - Personal Photo Gallery Manager
http://cozy.io
GNU Affero General Public License v3.0

Unable to access application after sharing a large album #181

Open clochix opened 8 years ago

clochix commented 8 years ago

This is an intermittent problem that some users have experienced for a long time: when they share a big album, people get 503 errors when trying to browse it. Sometimes restarting the application is enough to fix the problem; sometimes we need to restart the full stack.

This seems to be a network problem: when streaming the photos to the user, sockets sometimes stay in the CLOSE_WAIT state for a long time. By default, an application can only open 5 sockets to another application (same host and port). So when 5 sockets are in the CLOSE_WAIT state, the applications are unable to communicate and the end user gets 503 errors.
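To check whether a given pair of applications has hit this state, one option on Linux is to count CLOSE_WAIT sockets per remote endpoint. The following is only a sketch: it assumes IPv4, a little-endian host (as on x86), and the text format of `/proc/net/tcp`, where state code `08` means CLOSE_WAIT. The function names are hypothetical and are not part of any Cozy code.

```typescript
// Kernel state code for CLOSE_WAIT in /proc/net/tcp.
const CLOSE_WAIT = "08";

// Decode "AABBCCDD:PPPP" into "a.b.c.d:port".
// Assumption: the address bytes are stored little-endian (x86); the port is plain hex.
function decodeEndpoint(hex: string): string {
  const parts = hex.split(":");
  const addr = parts[0];
  const port = parseInt(parts[1], 16);
  const octets = [6, 4, 2, 0].map((i) => parseInt(addr.slice(i, i + 2), 16));
  return octets.join(".") + ":" + port;
}

// Count sockets in CLOSE_WAIT, grouped by remote endpoint.
// `procNetTcp` is the raw text content of /proc/net/tcp.
function countCloseWait(procNetTcp: string): Record<string, number> {
  const counts: Record<string, number> = {};
  const lines = procNetTcp.split("\n").slice(1); // skip the header line
  for (const line of lines) {
    // Fields: sl, local_address, rem_address, st, ...
    const fields = line.trim().split(/\s+/);
    if (fields.length < 4 || fields[3] !== CLOSE_WAIT) continue;
    const remote = decodeEndpoint(fields[2]);
    counts[remote] = (counts[remote] || 0) + 1;
  }
  return counts;
}
```

Feeding it the content of `/proc/net/tcp` on the server gives a count per remote endpoint; a count of 5 for one endpoint would match the situation described above.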

This error may happen between our proxy and the photos application, or between photos and the data-system (I'm not aware of similar problems between the reverse proxy and our proxy, or between the DS and CouchDB).

From a user's point of view, the only way to get rid of this is to restart the application that opened the connection: either Photos or the Proxy. (From an administrator's point of view, we can use a good old Perl script to force-close these sockets.)

So a workaround would be to increase the max socket limit, but finding the root cause of this error would be better.
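For reference, here is a minimal sketch of what raising the limit could look like, assuming the limit in question is the `maxSockets` option of Node's `http.Agent` and that requests are made through the core `http` module. The pool size, host, and port below are placeholders, not values taken from the Cozy stack.

```typescript
import * as http from "http";

// Hypothetical pool size; the right value depends on the expected concurrency.
const MAX_SOCKETS_PER_HOST = 50;

// A dedicated agent whose per-host socket pool is larger than the default one.
const largePoolAgent = new http.Agent({ maxSockets: MAX_SOCKETS_PER_HOST });

// Build request options that route a request through the larger pool.
function requestOptions(path: string): http.RequestOptions {
  return {
    host: "localhost", // placeholder host
    port: 8080,        // placeholder port
    path: path,
    agent: largePoolAgent,
  };
}
```

Passing `requestOptions("/some/path")` to `http.request` would then use the larger pool. If sockets keep leaking into CLOSE_WAIT, a larger pool only delays the point at which it is exhausted.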

What makes this hard to debug is that we are unable to reproduce the error on demand. It probably requires sharing a big album and accessing it from several browsers at the same time.

clochix commented 8 years ago

FTR, here's the Perl script to close all CLOSE_WAIT connections on the server: https://github.com/terwey/kill-close-wait-connections

poupotte commented 8 years ago

https://github.com/cozy/cozy-photos/pull/182 should fix this issue.

nono commented 8 years ago

It looks like the issue is still here. I've made another PR: #186.