Closed lforst closed 9 months ago
I would guess that we also want a "forbidden" or "disabled" result variant for when scraping is disabled.
ETA: Actually I believe there are really two reasons for "not attempted": either we already have it or scraping is disabled. Consequently we should split up "not attempted" into two results and remove the reason.
Also: does your data intentionally only include the URL of the source file, but not that of the sourcemap?
I would guess that we also want a "forbidden" or "disabled" result variant for when scraping is disabled.
I think "forbidden" can fall under error and "disabled" under not-attempted. (With corresponding codes of course.)
Also: does your data intentionally only include the URL of the source file, but not that of the sourcemap?
Ah, thanks. Good catch! Adjusted.
I think "forbidden" can fall under error and "disabled" under not-attempted. (With corresponding codes of course.)
Sorry, I intended "forbidden" and "disabled" to be the same thing, I just hadn't decided what the better name was yet (it's probably "disabled").
I opened https://github.com/getsentry/symbolicator/pull/1311 as a strawman implementation so that we have something to discuss.
The CompletedJsSymbolicationResponse now contains a list of "scraping attempts". The definition of these scraping attempts is
pub struct JsScrapingAttempt {
    pub url: String,
    pub result: JsScrapingResult,
}

pub enum JsScrapingResult {
    NotAttempted,
    Success,
    Failure {
        reason: JsScrapingFailureReason,
        details: String,
    },
}

pub enum JsScrapingFailureReason {
    NotFound,
    Disabled,
    InvalidHost,
    PermissionDenied,
    Timeout,
    DownloadError,
    Other,
}
So each scraping attempt has a url and a result. The result can be NotAttempted, Success, or Failure. Success and NotAttempted contain no further information (NotAttempted just means that we got the file another way), but Failure has an additional reason and possibly details.
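To make the three result variants concrete, here is a minimal sketch of how a consumer might turn one attempt into a readable line. The types are copied from the definitions above; the `describe` helper is hypothetical and not part of Symbolicator, and treating an empty `details` string as "no details" is an assumption of this sketch.

```rust
// Types as quoted in this thread; derives added so values can be compared and printed.
#[derive(Debug, Clone, PartialEq)]
pub enum JsScrapingFailureReason {
    NotFound,
    Disabled,
    InvalidHost,
    PermissionDenied,
    Timeout,
    DownloadError,
    Other,
}

#[derive(Debug, Clone, PartialEq)]
pub enum JsScrapingResult {
    NotAttempted,
    Success,
    Failure {
        reason: JsScrapingFailureReason,
        details: String,
    },
}

#[derive(Debug, Clone, PartialEq)]
pub struct JsScrapingAttempt {
    pub url: String,
    pub result: JsScrapingResult,
}

/// Hypothetical helper: render one attempt as a human-readable line.
pub fn describe(attempt: &JsScrapingAttempt) -> String {
    match &attempt.result {
        // No scraping happened because the file was obtained another way.
        JsScrapingResult::NotAttempted => {
            format!("{}: not attempted (file obtained another way)", attempt.url)
        }
        JsScrapingResult::Success => format!("{}: scraped successfully", attempt.url),
        // Assumption: an empty `details` string means there is nothing more to say.
        JsScrapingResult::Failure { reason, details } if details.is_empty() => {
            format!("{}: failed ({:?})", attempt.url, reason)
        }
        JsScrapingResult::Failure { reason, details } => {
            format!("{}: failed ({:?}): {}", attempt.url, reason, details)
        }
    }
}
```

Only the `Failure` arm carries extra data, so a consumer has to match on the variant before it can read `reason` or `details`.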
Here is an example from our test fixtures (written in YAML, but you'd be getting it as JSON):
scraping_attempts:
  - url: "http://example.com/index.html"
    result:
      Failure:
        reason: Disabled
  - url: "http://example.com/test.min.js"
    result: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result: NotAttempted
This differs from your design above in a few respects.
Are these differences acceptable to you? Is there anything you would like to see changed?
@loewenheim Thank you! This is great <3
I have a few thoughts.
You can still associate them because every frame in the response contains the sourcemap URL.
Is that the case in the event payload we store in eventstore? Would we have to also adjust Sentry to put the sourcemap URL into the stack frame object?
We return a simple "reason" and possibly "details" instead of an error code and the http status.
HTTP status isn't too important I would say. I have reservations about the returned payload schema though. IMO it would be cleaner to always return an object in the result field. To illustrate, here is what I suggest we change the example you gave to:
scraping_attempts:
  - url: "http://example.com/index.html"
    result:
      status: Failure
      reason: Disabled
  - url: "http://example.com/test.min.js"
    result:
      status: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result:
      status: NotAttempted
That way we don't have to check for the type of result and can basically assume result.status is a string and result.reason is an optional string. This is obviously up for debate.
Additionally, I would probably return the strings in a bit more pythonian manner, e.g. not_attempted instead of NotAttempted and so on. This is a bit whatever.
In general I think we are on the right track! My main concern lies in whether we can actually properly associate source map scraping attempts to the correct stack frame.
@loewenheim Thank you! This is great <3
I have a few thoughts.
You can still associate them because every frame in the response contains the sourcemap URL.
Is that the case in the event payload we store in eventstore? Would we have to also adjust Sentry to put the sourcemap URL into the stack frame object?
The information already exists in the event JSON you can view on Sentry, if that's what you mean.
We return a simple "reason" and possibly "details" instead of an error code and the http status.
HTTP status isn't too important I would say. I have reservations about the returned payload schema though. IMO it would be cleaner to always return an object in the result field. To illustrate, here is what I suggest we change the example you gave to:
scraping_attempts:
  - url: "http://example.com/index.html"
    result:
      status: Failure
      reason: Disabled
  - url: "http://example.com/test.min.js"
    result:
      status: NotAttempted
  - url: "http://example.com/test.min.js.map"
    result:
      status: NotAttempted
That way we don't have to check for the type of result and can basically assume result.status is a string and result.reason is an optional string. This is obviously up for debate.
That sounds like a good idea and would be easy to change.
Additionally, I would probably return the strings in a bit more pythonian manner, e.g. not_attempted instead of NotAttempted and so on. This is a bit whatever.
You're right, I'll change it to snake_case.
In general I think we are on the right track! My main concern lies in whether we can actually properly associate source map scraping attempts to the correct stack frame.
I'll send you an example event that I believe demonstrates that we can.
What do you think of this?
scraping_attempts:
  - url: "http://example.com/index.html"
    status: failure
    reason: disabled
  - url: "http://example.com/embedded.js"
    status: not_attempted
  - url: "http://example.com/embedded.js.map"
    status: not_attempted
status is always present, reason and details may be present for failures. We can also nest it inside a result the way you suggested, if you'd prefer that. Fortunately this serialization logic is very flexible.
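As a rough illustration of the flat shape, the sketch below models one entry with a mandatory status and optional reason and details, and lists the key/value pairs that would appear in the output. This is a hand-rolled, std-only stand-in and not the actual serialization code; the names `Status`, `FlatAttempt`, and `fields` are hypothetical, and the string "success" is an assumption extrapolated from the snake_case examples above.

```rust
// Hypothetical flat model of one `scraping_attempts` entry.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    NotAttempted,
    Success,
    Failure,
}

impl Status {
    /// snake_case spelling used in the payload ("success" is assumed).
    pub fn as_str(self) -> &'static str {
        match self {
            Status::NotAttempted => "not_attempted",
            Status::Success => "success",
            Status::Failure => "failure",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct FlatAttempt {
    pub url: String,
    pub status: Status,
    pub reason: Option<String>,
    pub details: Option<String>,
}

impl FlatAttempt {
    /// Key/value pairs in output order; optional keys are skipped when absent.
    pub fn fields(&self) -> Vec<(&'static str, String)> {
        let mut out = vec![
            ("url", self.url.clone()),
            ("status", self.status.as_str().to_string()),
        ];
        if let Some(reason) = &self.reason {
            out.push(("reason", reason.clone()));
        }
        if let Some(details) = &self.details {
            out.push(("details", details.clone()));
        }
        out
    }
}
```

A consumer can therefore always read `status` as a string and treat `reason` and `details` as optional, without inspecting the type of a nested value.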
We could also do the scraping_attempts as a map from URLs to results instead of a list. What do you think?
I'll send you an example event that I believe demonstrates that we can.
Based on our discussion that data.sourcemap is available inside stackframe objects, having source files and sourcemaps separate sounds good to me!
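The association discussed here could look roughly like the following sketch: given a frame that carries its sourcemap URL, look up the matching entries in the scraping attempts. The `Frame` and `ScrapingAttempt` structs are simplified, hypothetical stand-ins; in particular, using an `abs_path` field as the source file URL is an assumption of this example.

```rust
// Hypothetical, simplified stand-ins for the types discussed in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct ScrapingAttempt {
    pub url: String,
    pub status: String, // e.g. "failure" or "not_attempted"
}

#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub abs_path: String,          // URL of the source file (assumed field name)
    pub sourcemap: Option<String>, // models `data.sourcemap`
}

/// Find the attempt with the given URL by scanning the list linearly.
pub fn attempt_for<'a>(attempts: &'a [ScrapingAttempt], url: &str) -> Option<&'a ScrapingAttempt> {
    attempts.iter().find(|attempt| attempt.url == url)
}

/// Return the (source file, sourcemap) attempts that belong to one frame.
pub fn attempts_for_frame<'a>(
    attempts: &'a [ScrapingAttempt],
    frame: &Frame,
) -> (Option<&'a ScrapingAttempt>, Option<&'a ScrapingAttempt>) {
    let source = attempt_for(attempts, &frame.abs_path);
    // A frame without a sourcemap URL has no sourcemap attempt to associate.
    let sourcemap = frame
        .sourcemap
        .as_deref()
        .and_then(|url| attempt_for(attempts, url));
    (source, sourcemap)
}
```

Because the frame itself names its sourcemap URL, source files and sourcemaps can stay as separate entries in the list.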
status is always present, reason and details may be present for failures. We can also nest it inside a result the way you suggested, if you'd prefer that. Fortunately this serialization logic is very flexible.
No this seems great! Let's do it exactly like that :)
We could also do the scraping_attempts as a map from URLs to results instead of a list. What do you think?
I think a list is fine. It keeps us more flexible down the line. Of course, there is now the cost of having to iterate but the size is bounded anyhow so 🤷 No strong opinions. If we don't have a clear favorite I would opt for a list to be consistent with other mechanisms we have (debug_meta).
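If repeated iteration ever did become a concern, a consumer could build its own URL index from the list, along the lines of this sketch. The `ScrapingAttempt` struct and the `index_by_url` helper are hypothetical, and letting the first entry win on duplicate URLs is an arbitrary choice made for the example.

```rust
use std::collections::HashMap;

// Hypothetical flat entry, mirroring the YAML example in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct ScrapingAttempt {
    pub url: String,
    pub status: String,
}

/// Build a URL -> attempt index from the list form.
/// If a URL occurs more than once, the first entry wins.
pub fn index_by_url(attempts: &[ScrapingAttempt]) -> HashMap<&str, &ScrapingAttempt> {
    let mut index = HashMap::with_capacity(attempts.len());
    for attempt in attempts {
        index.entry(attempt.url.as_str()).or_insert(attempt);
    }
    index
}
```

This keeps the wire format a list while still giving constant-time lookups on the consumer side.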
I think a list is fine. It keeps us more flexible down the line. Of course, there is now the cost of having to iterate but the size is bounded anyhow so 🤷 No strong opinions. If we don't have a clear favorite I would opt for a list to be consistent with other mechanisms we have (debug_meta).
Makes sense, I'll leave it as a list then.
For our Source Map Debugger (Blue Thunder Edition™) initiative we require information from Symbolicator about the JS source scraping attempts that have been conducted.
Solution brainstorm
Returned payload (could be nested somewhere):
(TypeScript notation because it's the only language I know 😶‍🌫️)