Closed brianlball closed 1 year ago
tests work
[1] "Check Run Flag run command: ruby /opt/openstudio/R/lib/api_get_status.rb -h http://web:80 -a b2f6b810-15e0-4b65-96e2-069a8e555210"
[1] "Check Run Flag z: {:submit_simulation=>false, :sleep_time=>5, :host=>\"http://web:80\", :analysis_id=>\"b2f6b810-15e0-4b65-96e2-069a8e555210\"}"
[2] "Check Run Flag z: /opt/openstudio/R/lib/api_get_status.rb success! get_count: 1"
[3] "Check Run Flag z: {\"status\":true,\"result\":true}"
[1] "run_flag_json: TRUE" "run_flag_json: TRUE"
@nllong I think https://github.com/NREL/OpenStudio-server/issues/337 is the underlying reason why we are getting datapoints with NA status that don't get run.
Setting max_request_queue_size (MAX_REQUESTS) to 0 means that the queue is unbounded.
R cluster run calls the following files:
ruby /opt/openstudio/R/lib/api_create_datapoint.rb:
which calls
ruby /opt/openstudio/R/lib/api_get_status.rb
(to check if analysis still has run_flag == true)
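As a rough illustration of the run-flag check described above, here is a minimal Ruby sketch that interprets the JSON line printed by the status script. The helper name `run_flag_from_status` is hypothetical, and the meaning of the fields is an assumption inferred from the logs in this thread ("status" reports whether the API call itself succeeded, "result" carries the run flag), not something confirmed from the script source.

```ruby
require 'json'

# Hypothetical helper, not part of the OpenStudio-server scripts.
# Assumption: "status" => the API call succeeded, "result" => the run flag.
# Returns true/false when the flag is known, nil when it is unknown.
def run_flag_from_status(json_line)
  parsed = JSON.parse(json_line)
  return nil unless parsed['status'] == true # call failed, flag is unknown

  parsed['result'] == true
rescue JSON::ParserError
  nil # unparseable output is also treated as unknown
end

p run_flag_from_status('{"status":true,"result":true}')  # prints true
p run_flag_from_status('{"status":false,"result":true}') # prints nil
```

Returning nil for a failed call keeps "the flag is false" separate from "we could not find out", which is the case that matters when the web container answers with a 503.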
The logs above show that this can get a 503, so I think we should:
Interesting, yeah, I would guess that the 503 is due to an unresponsive web container. I like your approach of 1, but I think the real issue is 2. The problem with just implementing 1 is that the web container is likely to be slammed for tens of minutes, if not longer, so a retry would quickly time out too. If you can figure out how to enforce the size of the MAX REQUESTS, then that should easily fix the problem.
@nllong I think we can change the nginx.conf file with the appropriate values and then run some version of nginx -s reload
@nllong what do you think of these proposed changes to allow passenger_max_request_queue_size and passenger_max_pool_size to be set in an OSA, since not all problems are built the same, and to call nginx -s reload to reload without downtime?
[20:08:54.810129 INFO] whoami: nginx
[20:08:54.810146 INFO] test the nginx.conf file
[20:08:54.817204 INFO] test_config: nginx: the configuration file /opt/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /opt/nginx/conf/nginx.conf test is successful
[20:08:54.817237 INFO] test_config.include?('syntax is ok'): true
[20:08:54.817249 INFO] test_config.include?('test is successful'): true
[20:08:54.817256 INFO] get nginx processes
[20:08:54.818898 INFO] nginx_pids:
root 64 0.0 0.0 46612 9392 ? S 20:08 0:00 nginx: master process /opt/nginx/sbin/nginx
nginx 78 0.0 0.0 47048 4560 ? S 20:08 0:00 nginx: worker process
nginx 79 0.0 0.0 47048 4560 ? S 20:08 0:00 nginx: worker process
nginx 80 0.0 0.0 47048 3408 ? S 20:08 0:00 nginx: worker process
nginx 110 66.5 1.4 786180 234440 ? Sl 20:08 0:02 Passenger AppPreloader: /opt/openstudio/server
nginx 157 47.0 1.2 786080 206296 ? Sl 20:08 0:00 Passenger AppPreloader: /opt/openstudio/server (forking...)
nginx 177 0.0 0.0 4636 836 ? S 20:08 0:00 sh -c ps aux|grep nginx
nginx 178 0.0 0.0 34412 2924 ? R 20:08 0:00 ps aux
nginx 179 0.0 0.0 11468 1072 ? S 20:08 0:00 grep nginx
[20:08:54.818950 INFO] nginx_worker_pids: ["78", "79", "80"]
[20:08:54.818961 INFO] reload the nginx.conf
[20:08:59.823727 INFO] count: 1
[20:08:59.823815 INFO] get nginx processes
[20:08:59.826758 INFO] nginx_pids:
root 64 0.0 0.0 46868 9648 ? S 20:08 0:00 nginx: master process /opt/nginx/sbin/nginx
nginx 78 0.0 0.0 47048 4560 ? S 20:08 0:00 nginx: worker process is shutting down
nginx 110 29.5 1.4 786180 234468 ? Sl 20:08 0:02 Passenger AppPreloader: /opt/openstudio/server
nginx 157 13.4 1.2 786080 206272 ? Sl 20:08 0:00 Passenger AppPreloader: /opt/openstudio/server (forking...)
nginx 199 0.0 0.0 47112 3744 ? S 20:08 0:00 nginx: worker process
nginx 204 0.0 0.0 47112 3744 ? S 20:08 0:00 nginx: worker process
nginx 206 0.0 0.0 47112 3744 ? S 20:08 0:00 nginx: worker process
nginx 212 0.0 0.0 4636 880 ? S 20:08 0:00 sh -c ps aux|grep nginx
nginx 213 0.0 0.0 34412 2980 ? R 20:08 0:00 ps aux
nginx 214 0.0 0.0 11468 1040 ? S 20:08 0:00 grep nginx
[20:08:59.826823 INFO] nginx_worker_pids2: ["199", "204", "206"]
[20:08:59.826861 INFO] reload nginx.conf success
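The checks visible in the log above (validate the config, record the worker PIDs, reload, confirm that new workers replaced the old ones) could be sketched roughly as follows. This is not the actual script behind the log; the helper names are hypothetical, and only the text-processing side is shown so it can run without nginx. The strings would come from `nginx -t` and `ps aux|grep nginx`, with `nginx -s reload` issued between the two `ps` calls.

```ruby
# Hypothetical helpers mirroring the checks in the log; names are illustrative.

# True when the output of `nginx -t` reports a valid configuration.
def config_ok?(test_output)
  test_output.include?('syntax is ok') && test_output.include?('test is successful')
end

# PIDs (second column of `ps aux`) of active nginx workers.
# Workers marked "is shutting down" are skipped on purpose.
def nginx_worker_pids(ps_output)
  ps_output.each_line
           .select { |line| line.strip.end_with?('nginx: worker process') }
           .map { |line| line.split[1] }
end

# A reload is treated as successful when workers exist and none of the
# old worker PIDs are still among the active ones.
def reload_succeeded?(before_pids, after_pids)
  !after_pids.empty? && (before_pids & after_pids).empty?
end

# Sample lines copied from the log in this thread.
BEFORE_PS = <<~PS
  root 64 0.0 0.0 46612 9392 ? S 20:08 0:00 nginx: master process /opt/nginx/sbin/nginx
  nginx 78 0.0 0.0 47048 4560 ? S 20:08 0:00 nginx: worker process
  nginx 79 0.0 0.0 47048 4560 ? S 20:08 0:00 nginx: worker process
  nginx 80 0.0 0.0 47048 3408 ? S 20:08 0:00 nginx: worker process
  nginx 179 0.0 0.0 11468 1072 ? S 20:08 0:00 grep nginx
PS

AFTER_PS = <<~PS
  root 64 0.0 0.0 46868 9648 ? S 20:08 0:00 nginx: master process /opt/nginx/sbin/nginx
  nginx 78 0.0 0.0 47048 4560 ? S 20:08 0:00 nginx: worker process is shutting down
  nginx 199 0.0 0.0 47112 3744 ? S 20:08 0:00 nginx: worker process
  nginx 204 0.0 0.0 47112 3744 ? S 20:08 0:00 nginx: worker process
  nginx 206 0.0 0.0 47112 3744 ? S 20:08 0:00 nginx: worker process
PS

before = nginx_worker_pids(BEFORE_PS)
after = nginx_worker_pids(AFTER_PS)
p before                           # prints ["78", "79", "80"]
p after                            # prints ["199", "204", "206"]
p reload_succeeded?(before, after) # prints true
```

Excluding the "is shutting down" worker matches the log, where PID 78 is still listed after the reload but does not appear in nginx_worker_pids2.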
@brianlball I think the plan was to not dynamically change the nginx state and rather just configure nginx to handle the expected peak load that rserve needs? If so, can we close this and open a new issue for that?
@brianlball Should we close this as the plan is to use a fixed resource spec?
I think that is the plan. If we have enough resources, then this shouldn't be an issue.
API retries were implemented in https://github.com/NREL/OpenStudio-server/pull/682. Closing this, as we don't want to change NGINX.
there can be 503 service unavailable errors when the web container gets slammed.
[1] "run command: ruby /opt/openstudio/R/lib/api_create_datapoint.rb -h http://web:80 -a d7fa128a-2c5b-4ce5-a601-8942f82427f6 -v -26.6666666666667,0 --submit"
[1] "Check Run Flag run command: ruby /opt/openstudio/R/lib/api_get_status.rb -h http://web:80 -a d7fa128a-2c5b-4ce5-a601-8942f82427f6"
[1] "Check Run Flag z: {:submit_simulation=>false, :sleep_time=>5, :host=>\"http://web:80\", :analysis_id=>\"d7fa128a-2c5b-4ce5-a601-8942f82427f6\"}"
[2] "Check Run Flag z: /opt/openstudio/R/lib/api_get_status.rb Error: 503 Service Unavailable:/usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/abstract_response.rb:223:in `exception_with_response'"
[3] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/abstract_response.rb:103:in `return!'"
[4] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/request.rb:809:in `process_result'"
[5] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/request.rb:725:in `block in transmit'"
[6] "Check Run Flag z: /usr/local/lib/ruby/2.7.0/net/http.rb:933:in `start'"
[7] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'"
[8] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'"
[9] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'"
[10] "Check Run Flag z: /usr/local/lib/ruby/gems/2.7.0/gems/rest-client-2.0.2/lib/restclient.rb:67:in `get'"
[11] "Check Run Flag z: /opt/openstudio/R/lib/api_get_status.rb:62:in `<main>'"
[12] "Check Run Flag z: {\"status\":false,\"result\":true}"
this should get retried
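One way such a retry could look is sketched below: a generic Ruby wrapper with capped exponential backoff. This is not the code from the scripts or from any PR; `with_retries`, `TransientError`, and all parameter names and defaults are hypothetical. `TransientError` stands in for the HTTP failure so the sketch runs without a server; with rest-client the rescued class would presumably be its 503 exception instead (an assumption).

```ruby
# Stand-in for a transient HTTP failure such as 503 Service Unavailable.
class TransientError < StandardError; end

# Hypothetical helper: run the block, retrying on TransientError with
# exponential backoff capped at max_delay seconds. The sleeper is
# injectable so the behaviour can be exercised without real waiting.
def with_retries(max_attempts: 5, base_delay: 1.0, max_delay: 60.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts # give up and surface the last error

    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay)
    retry
  end
end

# Usage: the first two attempts fail with a simulated 503, the third succeeds.
delays = []
result = with_retries(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise TransientError, '503 Service Unavailable' if attempt < 3

  '{"status":true,"result":true}'
end

puts result # prints {"status":true,"result":true}
p delays    # prints [1.0, 2.0]
```

Since the web container can stay overloaded for a long time, max_attempts and max_delay would need to be chosen so that the total retry window covers that period; the defaults here are placeholders.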