Closed prismofeverything closed 4 years ago
(My previous msg got stuck in the outbox while at the gym.)
The http request for metadata is a feature of gce VMs. Anyway gce metadata is set when launching a gce vm so their isn't any when running off gce. I like your debug config for that. It can try to get metadata and fallback to the config file.
Jerry
On Thu, Nov 21, 2019, 12:36 PM Ryan Spangler notifications@github.com wrote:
@prismofeverything commented on this pull request.
In src/sisyphus/core.clj https://github.com/CovertLab/sisyphus/pull/21#discussion_r349305458:
@@ -134,10 +135,20 @@ (update state :kafka merge consumer)))
(defn start!
- "Start the system by making all the required connections and returning the state map."
- "Start the system by making all the required connections and returning the state map.
- The priority of values for rabbit config are
- command line options
- gce metadata fields
- values from file resources/config/sisyphus.clj
- defaults from ns rabbit/default-config"
and then sisyphus/rabbit code derives the 3 rabbit queuing parameters from that workflow name?
Ah I like this, this is the way to go. Yeah let's just supply one parameter and sisyphus can work out the details.
Is gce-metadata not getting results due to not running on a GCE VM? I tested it on a GCE worker node.
Ah yeah, maybe doesn't work from a remote machine? I am leaning to doing this with the compute service from the previous PR instead of an HTTP call where you have to worry about passing in credentials etc if you are not in the network. I'll look into it.
— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/CovertLab/sisyphus/pull/21?email_source=notifications&email_token=AAH6VMFXX635OD2WGQFEKVLQU3WN5A5CNFSM4JP3PYJ2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCMSK2OY#discussion_r349305458, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH6VMBYERWM4FYFR53QJCTQU3WN5ANCNFSM4JP3PYJQ .
Okay @1fish2, latest update has:
rabbit-metadata
a chance to accept a workflow name to generate its configuration (instead of just reading it from GCE metadata).sisyphus.base
which includes no other sisyphus namespaces and collects all the useful functions all the other namespaces are using. Could probably be more around that should be migrated, I figure now that we have somewhere to put them we may use it. gce-metadata
function (java.net.UnknownHostException
), but the others are wrapping functions inside handlers where any exception shuts down that whole element of server functionality, requiring restart of the server. I think that is the other reason to catch Exception
, but we can talk more about those cases if you want. Most excellent!
I agree about the remaining cases that catch any Exception
since it logs the exception and punts on that chunk of work.
One detail: In log/gce-metadata
, http/get
throws UnknownHostException
if it's not running on GCE and thus can't access "http://metadata.google.internal", but it throws a different exception if the requested metadata field like "attributes/workflow" wasn't set. Missing attributes will happen when we change the software on one server at a time.
I can't tell if "slingshot" rethrows the exception or if it's the fallback catch clause, but this will catch the exception:
(try (clj-http/get "http://metadata.google.internal/computeMetadata/v1/instance/xxx"
{:headers {:metadata-flavor "Google"}})
(catch clojure.lang.ExceptionInfo e ...))
Since clojure.lang.ExceptionInfo seems to be a general purpose exception like Exception, the code should distinguish the expected case from unexpected cases. (.getMessage e)
returns "clj-http: status 404" and (.data e)
returns
{:cached nil, :request-time 8, :repeatable? false,
:protocol-version {:name "HTTP", :major 1, :minor 1}, :streaming? true,
:http-client #object[org.apache.http.impl.client.InternalHttpClient 0x4cc49356 "org.apache.http.impl.client.InternalHttpClient@4cc49356"],
:chunked? false, :type :clj-http.client/unexceptional-status, :reason-phrase "Not Found",
:headers {"Metadata-Flavor" "Google", "Date" "Thu, 05 Dec 2019 00:48:08 GMT", "Content-Type" "text/html; charset=UTF-8", "Server" "Metadata Server for VM", "Connection" "Close", "Content-Length" "1592", "X-XSS-Protection" "0", "X-Frame-Options" "SAMEORIGIN"},
:orig-content-encoding nil, :status 404, :length 1592,
:body "<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/computeMetadata/v1/instance/xxx</code> was not found on this server. <ins>That’s all we know.</ins>\n",
:trace-redirects []}
That experimental code snippet followed (require '[clj-http.client :as clj-http])
since gaia.core requires aleph.http :as http
.
Is it OK in this case to just handle clojure.lang.ExceptionInfo
as an unset metadata field? It's not obvious to me what expanse of code raises ExceptionInfo.
Hmm, ex-data
?
Hey @1fish2, I'm not sure about all this. I thought ExceptionInfo
was a class that describes whatever exception got thrown, not an exception in its own right. It is a way to wrap the exception and associate a map of data along with it. The real exception should be in that data structure, though you are right this may be the only way to catch it. More investigation is required.
Okay @1fish2, looking into this further it turns out clj-http
is throwing exceptions on status failures by default, and has an option to turn this off (:throw-exceptions false
, see https://github.com/CovertLab/sisyphus/pull/21/commits/0d3ddbc42c85ca14130563f960b29f4f613e8250)
So we can handle that case directly and not have to do it through exception handling. I set up gce-metadata
to always return either the requested field or nil, so callers can find the value another way. Let me know what you think.
Great! I'd probably explain in the comment that the expected failure modes are UnknownHostException when not running on GCE and not-OK when the requested field wasn't set. If we ever need to treat those two differently, we won't have to reconstruct the backstory.
clj-http says "batteries included" but I don't see where it defines symbolic status codes. I'm fine with 200
here but I'm going one step further in a Gaia fix to the error handling. I'll just return status 400
if there was any handler exception rather than distinguishing client errors from server errors, but it's getting closer to wanting symbolic codes.
It turns out we get a 404 if the metadata field does not exist, which translates to "not found". I added a comment in there. I figure if people want to know if they are on GCE or not they can use a method for that specifically, this function doesn't need to act as dual GCE detector/metadata fetcher.
clj-http says "batteries included" but I don't see where it defines symbolic status codes.
What do you mean exactly by "symbolic status codes"? Currently it is using standard HTTP response codes, and I actually appreciate the integers for this purpose as they map exactly on to the HTTP standard.
Yesterday I was thinking that we might later want different behavior for debugging off of GCE vs. the ordinary case of reading an attribute that wasn't set. But we can use a different config file for debugging.
A few HTTP status codes like 404 are as well known as their names, but for most of them I'd recognize the names but not the numbers.
This PR enables Sisyphus to look to its own GCE instance metadata to configure which rabbit exchange, queue and routing key to use for consuming task messages.
Currently this works in theory, but I am not getting results back from
log/gce-metadata
even though those metadata fields are set. Further investigation is necessary, so this PR is not yet ready to merge.One thing to point out is that the values for rabbit configuration are able to come from four different places, so I had to make a decision about the priority order for setting these values. Currently:
resources/config/sisyphus.clj
sisyphus.rabbit/default-config
ns.