Open samay-sharma opened 8 years ago
@samay-sharma / @anarazel -- I had two quick questions.
Do we know how long ports stay in the TIME_WAIT state (by default)? Also, how many ports do we have available to allocate from? From that, we can roughly calculate the number of new connections Citus can open per second in a sustained manner.
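For a rough sense of that calculation, here is a minimal sketch. It assumes a Linux node, where TIME_WAIT lasts 60 seconds (the kernel's fixed `TCP_TIMEWAIT_LEN`) and the ephemeral port range defaults to 32768-60999 on recent kernels. Both values should be verified on the actual machines. The function names are made up for illustration, and the result is only an upper bound per destination address and port, for the side that closes the connection first.

```python
from pathlib import Path

# Assumed Linux defaults; real values depend on the kernel and sysctl settings.
DEFAULT_PORT_RANGE = (32768, 60999)
TIME_WAIT_SECONDS = 60  # TCP_TIMEWAIT_LEN is fixed at 60 s in Linux kernels


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return (low, high) ephemeral ports, or the assumed default if unreadable."""
    try:
        low, high = Path(path).read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return DEFAULT_PORT_RANGE


def sustained_connections_per_second(port_range, time_wait_seconds=TIME_WAIT_SECONDS):
    """Upper bound on new connections per second to one destination.

    Each closed connection holds its local port for the TIME_WAIT period,
    so at most (number of ports / TIME_WAIT duration) can be opened per second.
    """
    low, high = port_range
    available_ports = high - low + 1
    return available_ports / time_wait_seconds


if __name__ == "__main__":
    ports = read_port_range()
    rate = sustained_connections_per_second(ports)
    print(f"port range {ports}: about {rate:.0f} new connections/second")
```

With the assumed defaults this comes to roughly 470 new connections per second, so bursts of short-lived connections could plausibly exhaust the range.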
Another user brought up that they also could easily reproduce running out of ports with COPY.
When we ran several short COPYs in parallel on hash distributed tables, we saw errors saying that ports were not available for establishing connections for COPY. Note that the number of connections was still lower than the max_connections parameter on the worker nodes.
This is likely because many sockets are in TIME_WAIT. We enabled tcp_tw_reuse and tcp_tw_recycle, and set tcp_fin_timeout to 30, but that still didn't resolve the issue.
We should investigate further to understand the cause of this.
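As a starting point for that investigation, the TIME_WAIT hypothesis could be checked by counting sockets per TCP state on the node that runs out of ports. The sketch below assumes Linux and parses the text of `/proc/net/tcp` (IPv4 only), where the fourth column is the state as a hex code and `06` means TIME_WAIT. The function name `count_tcp_states` is hypothetical.

```python
from collections import Counter

# State codes as they appear in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        state_code = fields[3].upper()
        counts[TCP_STATES.get(state_code, state_code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            states = count_tcp_states(f.read())
    except OSError:
        states = Counter()  # not on Linux, or /proc is unavailable
    for name, count in states.most_common():
        print(f"{name}: {count}")
```

If the TIME_WAIT count approaches the size of the ephemeral port range while COPY is failing, that would support the explanation above. If it stays low, the cause is probably elsewhere.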
@anarazel: Please feel free to add anything I may have missed.