Closed duzinkie closed 4 months ago
Posting some of my analysis for the "http batching turned on scenario"
public abstract void onReadCompleted (UrlRequest request, UrlResponseInfo info, ByteBuffer byteBuffer) - this is expected client behavior.
Invoked whenever part of the response body has been read. Only part of the buffer may be populated, even if the entire response body has not yet been consumed. With the exception of onCanceled(), no other UrlRequest.Callback method will be invoked for the request, including onSucceeded() and onFailed(), until UrlRequest.read() is called to attempt to continue reading the response body.
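That contract can be illustrated with a toy model (the `ToyRequest` class below is a hypothetical stand-in, not Cronet's real API): each `read()` delivers at most one chunk, and `onSucceeded()` is only ever reached through another `read()` call, so a callback that never calls `read()` again stalls the request forever.

```kotlin
import java.nio.ByteBuffer

// Toy model of the Cronet read-loop contract: after onReadCompleted fires,
// the "engine" is paused until the callback calls request.read() again.
class ToyRequest(private val chunks: List<String>, private val callback: Callback) {
    interface Callback {
        fun onReadCompleted(request: ToyRequest, buffer: ByteBuffer)
        fun onSucceeded(request: ToyRequest)
    }

    private var index = 0

    fun read(buffer: ByteBuffer) {
        if (index < chunks.size) {
            buffer.clear()
            buffer.put(chunks[index++].toByteArray())
            callback.onReadCompleted(this, buffer) // pauses here...
        } else {
            callback.onSucceeded(this)             // ...only reached via read()
        }
    }
}

fun main() {
    val received = StringBuilder()
    var succeeded = false
    val request = ToyRequest(listOf("{\"data\":", "{}}"), object : ToyRequest.Callback {
        override fun onReadCompleted(request: ToyRequest, buffer: ByteBuffer) {
            buffer.flip()
            received.append(Charsets.UTF_8.decode(buffer))
            request.read(buffer) // omitting this call stalls the request forever
        }
        override fun onSucceeded(request: ToyRequest) { succeeded = true }
    })
    request.read(ByteBuffer.allocate(64))
    println(received)  // {"data":{}}
    println(succeeded) // true
}
```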
The JSON parsing logic of BatchingHttpInterceptor.kt#183 and its implementation in JsonReaders.kt#25 seems to stop early and never reads the JsonReader.Token.END_DOCUMENT. As a result, it never signals the underlying HTTP engine to continue reading the response, so the engine never learns that there's actually nothing more to read and never fires the lifecycle events associated with that. That also explains why the app behaves as if it received all the data.
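The failure mode can be sketched with a self-contained stand-in (the `Scanner` class below is hypothetical, not Apollo's `JsonReader`): parsing the top-level array "completes" at the closing `]`, so the stream is never read to exhaustion, and only an explicit end-of-document check consumes the rest.

```kotlin
// Hypothetical minimal scanner, standing in for a streaming JSON reader.
// It parses one top-level array of ints and tracks how far it has read.
class Scanner(private val input: String) {
    var pos = 0
        private set

    fun parseIntArray(): List<Int> {
        require(input[pos] == '[') { "expected '['" }
        pos++
        val out = mutableListOf<Int>()
        while (input[pos] != ']') {
            val start = pos
            while (input[pos].isDigit()) pos++
            out += input.substring(start, pos).toInt()
            if (input[pos] == ',') pos++
        }
        pos++ // consume ']' -- the parse is "complete" here
        return out
    }

    // The extra step the interceptor is missing: consume trailing whitespace
    // and confirm we actually reached the end of the document.
    fun atEndOfDocument(): Boolean {
        while (pos < input.length && input[pos].isWhitespace()) pos++
        return pos == input.length
    }
}

fun main() {
    val body = "[1,2,3]\n" // batched response body with a trailing newline
    val scanner = Scanner(body)
    println(scanner.parseIntArray())   // [1, 2, 3]
    println(scanner.pos < body.length) // true: the stream was not exhausted
    println(scanner.atEndOfDocument()) // true: only the explicit check reads to the end
}
```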
a simple hackfix I've attempted and verified to work, changing the batching interceptor code around parsing JSON from:

```kotlin
val list = AnyAdapter.fromJson(BufferedSourceJsonReader(responseBody), CustomScalarAdapters.Empty)
```

to

```kotlin
val responseReader = BufferedSourceJsonReader(responseBody)
val list = AnyAdapter.fromJson(responseReader, CustomScalarAdapters.Empty)
responseReader.hasNext()
```

appears to make the app behave correctly.
ofc I realize this is not a proper fix - but I wanted to post my findings at the end of the day. Perhaps a more robust one should verify that, after reading the batched operation responses array, the end-of-document token appears - happy to hear your thoughts/suggestions on this. Also, I have not looked into the scenario where HttpBatching is not enabled and what causes all requests to be interpreted as cancelled by Cronet (I suspect JSON parsing is also at play here, but I'm simply out of time for today).
Thanks for the detailed issue and reproducer ❤️.
- The JSON parsing logic of BatchingHttpInterceptor.kt#183 and its implementation in JsonReaders.kt#25 seems to stop early and never reads the JsonReader.Token.END_DOCUMENT. As a result, it never signals the underlying HTTP engine to continue reading the response, so the engine never learns that there's actually nothing more to read and never fires the lifecycle events associated with that. That also explains why the app behaves as if it received all the data.
Indeed, with JSON we can know that an object/list is terminated without reading the end of stream. This would allow reading several objects concatenated in the same stream. Not that it's the case here, but the API allows for that.
What should happen is that the BatchingHttpInterceptor reads the JSON response and emits the GraphQL responses upstream. When done, the consumer cancels the Flow, which in turn cancels the request. All in all, I would say that getting a cancellation before reading the end of stream is probably expected at this point.
This also means that if a server sends garbage after a JSON response, ApolloClient is lenient to that and will just ignore the garbage. We could probably change that to be more strict but I'm not 100% sure about the implications.
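This cancellation model can be mimicked with a plain Kotlin `Sequence` (a simplified stand-in for the real coroutine Flow): the consumer takes exactly the responses it expects, and the producer is abandoned before it ever reaches "end of stream".

```kotlin
fun main() {
    var reachedEndOfStream = false

    // Cold producer, standing in for the flow of batched GraphQL responses.
    val responses = sequence {
        yield("response-1")
        yield("response-2")
        reachedEndOfStream = true // only runs if the consumer keeps pulling
    }

    // The consumer takes exactly the two batched responses it expects and
    // stops, which abandons the producer before end-of-stream.
    val got = responses.take(2).toList()
    println(got)               // [response-1, response-2]
    println(reachedEndOfStream) // false: cancelled before the terminal read
}
```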
Do you have issues besides lifecycle events (request finished) not being fired? How bad would it be to just ignore those missing events (and the cancelled ones as well)?
Thanks for the detailed issue and reproducer ❤️.
np. fun fact - my team told me we're the og authors of batching implemented in apollo-kotlin :)
Indeed, with JSON we can know that an object/list is terminated without reading the end of stream. This would allow to read several objects concatenated in the same stream. Not that it's the case here but the API allows for that.
I didn't know that - my knowledge of gql is rather limited. What else could be there in the response? Wouldn't that indicate that it's no longer a JSON document (but something like JSONL instead)? And in turn, wouldn't that be discernible from the content-type and/or some other part of the response?
also the way I understand the current batching interceptor impl, would anyone really have a chance to access the rest of the response?
What should happen is that the BatchingHttpInterceptor reads the JSON response and emits the GraphQL responses upstream. When done, the consumer cancels the Flow, which in turns cancels the request. This also means that if a server sends garbage after a JSON response, ApolloClient is lenient to that and will just ignore the garbage. We could probably change that to be more strict but I'm not 100% sure about the implications.
Is that consistent with the apollo client behavior when there's no batching turned on? (e.g: would extra content beyond the first JSON "entity" also be ignored there?)
All in all, I would say that getting cancellation before reading the end of stream is probably expected at this point.
Just to be clear - that's not the current behavior. I've described the cancellations being fired for all the requests, but only when batching is turned off. I haven't looked at the root cause of that yet.
Do you have issues besides lifecycle events (request finished) not being fired? How bad would it be to just ignore those missing events (and the cancelled ones as well?)
In the case of the example app, it just stalls after a few requests, so that can't be ignored. I haven't observed that in our app, but I'm guessing that regardless, the cronet http engine itself must have some limit on the number of requests it considers in-flight.
Ignoring cancellations would be more digestible, but it still presents the problem of not being able to distinguish actual cancellations (triggered by, let's say, a user navigating away from the UI a request originated from) from those "normal batching cancellations" - we look at data like this as one of the inputs in assessing our app's health across releases. My preference would be to not signal an abnormal exit when nothing is actually wrong (or at least to let clients opt into how they'd like to handle that).
We still have the other issue reported in https://github.com/apollographql/apollo-kotlin/issues/5847
fun fact - my team told me we're the og authors of batching implemented in apollo-kotlin :)
It's a small world :)
what else could be there in the response? Wouldn't that indicate that it's no longer a JSON document (but something like JSONL instead?) - and so in turn, wouldn't that be discernible by content-type and/or some other parts of the response?
My point was entirely theoretical, based on the low-level JsonReader API.
At a high level, no GraphQL server that I know of, batching or not batching, sends data after the payload. So it should be safe to consume that.
May I suggest a fix like this?
```kotlin
val list = responseBody.jsonReader().use { jsonReader ->
  // TODO: this is most likely going to transform BigNumbers into strings, not sure how much of an issue that is
  AnyAdapter.fromJson(jsonReader, CustomScalarAdapters.Empty).also {
    if (jsonReader.peek() != JsonReader.Token.END_DOCUMENT) {
      println("Apollo: extra tokens after payload")
    }
  }
}
```
If it works for you, feel free to open a PR and I'll review it but I probably won't have time to dive into this myself as it's KotlinConf busy busy times right now.
May I suggest a fix like this? If it works for you, feel free to open a PR
Ty, yeah, I think something in that vein is ok. I'll take a closer look at the cause of the cancellations when batching is off, and also revisit the old issue I submitted, but I should be able to squeeze in a PR, even if it's just for the batching interceptor.
busy busy times right now.
np, your responses have been very helpful. I'm traveling next week as well, though not for KotlinConf :). This is not affecting us in any severe way, and a path forward looks relatively clear now.
short update on the "requests cancelled when http batching is not enabled"
it's sort of a similar case; excerpts from the stacktrace suggest as much.
I haven't experimented with a "hack fix" as replacing the entire transport/parser seems more involved than just an interceptor, but my bet is this is the cause.
Anyways, in terms of PRs, imo that should be a separate one. And perhaps it could be similar - it doesn't sound unreasonable for an object called ResponseParser to assert that it has read the entire response before closing the buffer (though ofc, how to handle cases where it hadn't is something to be discussed, especially from a compatibility perspective). idk, perhaps control of the reader should be pulled higher up the call stack.
In any case, this is a secondary issue.
it doesn't sound unreasonable for an object called ResponseParser to assert that it has read the entire response before closing the buffer (though ofc, how to handle cases where it had not is something to be discussed, especially from compatibility perspective)
My initial thoughts on this are that both should check for the EOF token after reading the batched list/response, and if it's not EOF, just ignore the trailing token. We might need a function that consumes a jsonReader till the end. The way it is done right now requires a bit of parsing.
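A sketch of such a helper, using a plain `java.io.Reader` as a stand-in for Apollo's `JsonReader` (the name `drainToEnd` is hypothetical): it consumes everything up to EOF and reports whether the trailing bytes were only whitespace.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical helper: drain a reader to EOF, reporting whether anything
// other than whitespace appeared after the payload that was already parsed.
fun drainToEnd(reader: Reader): Boolean {
    var onlyWhitespace = true
    while (true) {
        val c = reader.read()
        if (c == -1) return onlyWhitespace // reached end of stream
        if (!Character.isWhitespace(c)) onlyWhitespace = false
    }
}

fun main() {
    println(drainToEnd(StringReader("  \n")))     // true: clean end of document
    println(drainToEnd(StringReader(" garbage"))) // false: trailing tokens present
}
```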
Hello, I've published 4 PRs for the main/3.x branches (the fixes are the same though). Idk why, but GitHub misses one in the list of references, so here are all the links:
https://github.com/apollographql/apollo-kotlin/pull/5892 https://github.com/apollographql/apollo-kotlin/pull/5893 https://github.com/apollographql/apollo-kotlin/pull/5894 https://github.com/apollographql/apollo-kotlin/pull/5895
I'm in the process of getting approval for accepting the CLA. In the meantime, happy to hear your thoughts.
Now that I've created the PRs (and after some internal discussions with my team), I'm reconsidering whether it's the right choice to just fail silently on extra payload in the content - I would like to discuss that part once more.
From the compatibility standpoint, ofc that would introduce exceptions on paths where they weren't thrown before. On the other hand, any other "json not valid" case would throw an exception, so this could be seen as just fixing an inconsistency in the already strict approach to such errors. Additionally, like you said, in practical applications it will always be end-of-document, so to an extent the debate is about which form of no-op is best :)
I'm considering if it's the right choice to just fail silently on extra payload in content, would like to discuss that part once more.
Indeed, that's a fair point. I was going to quote the robustness principle, but it is more and more questioned - see https://www.ietf.org/archive/id/draft-iab-protocol-maintenance-05.html for an example. Being strict does have its advantages.
The thing is, I don't think this behaviour is specified anywhere (and even less for batching), so if we want to be strict there, I'd rather make sure we're not breaking someone else's workflow. Might be worth raising the concern in the GraphQL over HTTP spec?
I'll try to clarify that in the spec repo - https://github.com/graphql/graphql-over-http/issues/292
as far as I can read/understand the spec:
When a server receives a well-formed GraphQL-over-HTTP request, it must return a well‐formed GraphQL response
https://graphql.github.io/graphql-over-http/draft/#sec-Response
A response to a GraphQL request must be a map.
https://spec.graphql.org/draft/#sec-Response
it boils down to whether including anything that's semantically different from "end of document" breaks the "well-formedness" of the response.
When a server receives a well-formed GraphQL-over-HTTP request, it must return a well‐formed GraphQL response
Looks like a pretty good case that trailing data is an error and should be rejected by the client. Let's give it a couple of days to see what everyone thinks about this, but I'm more and more team "let's make this an error".
@duzinkie Let's make it an error? I just added a comment in your main PR here.
Benjie has a good point that if there is a lot of whitespace (and no trailing token), the response will now be delayed until all the whitespace is consumed. But this is the case everywhere in the JSON response, not only at the end, so backends should be aware of whitespace in all cases. What's more, it should reasonably never happen in practice.
@duzinkie this is all merged in and will be available in the next release.
Fix is in v4.0.0-rc1 and v3.8.5.
Version
3.8.4
Summary
I've adapted https://github.com/apollographql/apollo-kotlin-tutorial to use https://github.com/google/cronet-transport-for-okhttp (https://github.com/duzinkie/apollo-kotlin-tutorial/tree/duzinkie/cronet-lifecycle-repro) and have observed the app is not functioning correctly when Cronet is used as an http engine.
Let me elaborate: the app built from https://github.com/duzinkie/apollo-kotlin-tutorial/tree/duzinkie/cronet-lifecycle-repro works fine until one tries to adjust the logging level of HttpLoggingInterceptor (https://github.com/duzinkie/apollo-kotlin-tutorial/commit/9b3e58ab1dbdb9e2a3151a43e3cdb44531e5b079#diff-518bd35be782210b23f518eaf618c538f99e9f33a81f6b518ccb053b4315eff1R94) - when that happens, 2 distinct failure modes occur, depending on whether http batching is enabled.
side notes:
Steps to reproduce the behavior
1. Run the app as-is: observe `cronet request finished: 0` lines from the cronet http engine (indicating requests are finishing just fine).
2. Change the `HttpLoggingInterceptor` level to `BASIC`:
   a. with `httpBatching` enabled, the `cronet request finished` messages are absent and the app hangs after a few interactions
   b. disable `httpBatching`: observe `cronet request finished: 2` log messages

Logs
I think the reproduction steps are enough for this issue; lmk if you'd like to see my logs anyway.