buildstream-migration / bst-staging

GNU Lesser General Public License v2.1

Client should open only one HTTP2 connection to CAS cache server #810

Open Cynical-Optimist opened 4 years ago

Cynical-Optimist commented 4 years ago

See original issue on GitLab. In GitLab by [Gitlab user @valentindavid] on Dec 6, 2018, 22:28

Background

There seem to be issues connecting to the CAS cache server. Now and then, clients get a timeout in connect(). While investigating that, I realized that the client opens multiple connections, one for each job. This could lead to poor performance and also reduce the server's capacity to handle accept() for new connections.

Examples of failures:

Task description

All RPCs should go through one process and be multiplexed over a single connection.
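
To illustrate the idea, here is a minimal sketch using only the Python standard library. It is not BuildStream code: `FakeConnection`, `RpcMultiplexer` and `run_jobs` are hypothetical names, the connection is a stand-in that just echoes its input, and the method string is only a label. Requests are also handled one at a time here, whereas a real HTTP/2 connection can carry several concurrent streams.

```python
import queue
import threading
from concurrent.futures import Future


class FakeConnection:
    """Hypothetical stand-in for one HTTP/2 connection to the CAS server."""

    opened = 0  # how many connections have been created so far

    def __init__(self):
        FakeConnection.opened += 1

    def call(self, method, payload):
        # A real client would send an RPC here; this stub just echoes.
        return f"{method}:{payload}"


class RpcMultiplexer:
    """Owns the only connection; jobs hand their requests over via a queue."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        conn = FakeConnection()  # the single shared connection
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            method, payload, future = item
            try:
                future.set_result(conn.call(method, payload))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, method, payload):
        """Queue a request and return a Future for its result."""
        future = Future()
        self._requests.put((method, payload, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()


def run_jobs(n_jobs):
    """Simulate n_jobs jobs that each issue one request."""
    mux = RpcMultiplexer()
    futures = [mux.submit("FindMissingBlobs", i) for i in range(n_jobs)]
    results = [f.result() for f in futures]
    mux.close()
    return results
```

However many jobs call `submit()`, only one `FakeConnection` is ever created, which is the property the task description asks for.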

Acceptance Criteria


Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @valentindavid] on Dec 7, 2018, 09:59

changed the description

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @valentindavid] on Dec 7, 2018, 10:00

changed the description

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @juergbi] on Dec 10, 2018, 13:31

The main blocker for doing this in BuildStream is that we use a separate forked subprocess for each job, which makes it impossible to share a single TCP connection across jobs. Long-term we might move away from the fork multiprocessing model; however, that's not in the cards for now.

However, as part of the BuildBox effort we're planning to introduce a local buildbox-casd service that will act as a caching CAS proxy, which will allow proper connection multiplexing. While the initial focus will be on using this from the other BuildBox components, the slightly longer-term goal is to use buildbox-casd in BuildStream as well, instead of connecting directly to remote servers. This approach should solve this issue.
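
As a rough illustration of the caching proxy idea (not how buildbox-casd is actually implemented), the sketch below puts a local cache in front of a single upstream object. `UpstreamCas` and `CachingCasProxy` are hypothetical names, blobs are keyed by their SHA-256 hex digest as an assumption, and the upstream is an in-memory stub rather than a network service.

```python
import hashlib


class UpstreamCas:
    """Hypothetical remote CAS stub; counts how often it is asked for a blob."""

    def __init__(self, blobs):
        self._blobs = dict(blobs)
        self.fetches = 0

    def read_blob(self, digest):
        self.fetches += 1
        return self._blobs[digest]


class CachingCasProxy:
    """Local proxy: every job talks to this object, and only it talks upstream."""

    def __init__(self, upstream):
        self._upstream = upstream  # the single shared upstream link
        self._cache = {}

    def read_blob(self, digest):
        if digest not in self._cache:
            data = self._upstream.read_blob(digest)
            # Content-addressed storage: verify the data matches its digest.
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError("digest mismatch")
            self._cache[digest] = data
        return self._cache[digest]
```

Jobs that request the same digest repeatedly reach the upstream only once; later reads are served from the local cache, and all upstream traffic flows through the one object that holds the upstream link.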

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @jjardon] on Jan 7, 2019, 09:33

While this is something that needs to be fixed, it will not happen until BuildStream 1.4, as the changes suggested by [Gitlab user @juergbi] are too invasive.

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @juergbi] on Aug 30, 2019, 12:04

mentioned in merge request !1540

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @juergbi] on Aug 30, 2019, 12:21

With !1499, individual job subprocesses in master use buildbox-casd as a proxy to the CAS server, so connections should now be properly shared. However, buildbox-casd only handles the standard CAS protocol, i.e., it does not proxy the BuildStream artifact/source services.

This means that BuildStream may still initiate lots of connections to bst-artifact-server. The artifact/source service requests are typically handled quickly (only metadata transfer) and thus, these connections are short-lived and a bit less of an issue for the server. However, the high number of connections may still be a concern. It might thus make sense to implement a small proxy for the artifact/source service as well. See also https://gitlab.com/BuildStream/buildstream/merge_requests/1540#note_210356602

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @juergbi] on Sep 12, 2019, 11:20

mentioned in merge request !1601

Cynical-Optimist commented 4 years ago

In GitLab by [Gitlab user @tristanvb] on Apr 17, 2020, 11:25

mentioned in merge request !1867