Closed ninovanhooff closed 6 years ago
@Paulchen-Panther thanks for mentioning that.
Note that I was also able to reproduce this on master (618ac31681433f6e74c86814526b91ab87f9935e) after toggling about 20 times.
@Paulchen-Panther do you want to investigate it further?
Try ignoring the server response to see if the error still occurs
I have run into this too, usually after toggling on/off repeatedly like nino mentioned.
It seems to happen when reading the response size from the header, which apparently sometimes provides a very large number; trying to allocate a buffer of that size then appears to cause the OOM. I can't imagine the response is actually that large.
I'll try to run the hyperion server in debug mode to see if I can catch a glimpse of what it is actually responding with.
I'm fairly confident ignoring the response would fix the issue, but it would make me feel better if we could track down what causes it. I like the idea of handling that response in some way that would be useful in the future (obviously it is not right now), so I would consider ignoring it a last resort.
Well, I tried both debug and verbose logging on my hyperion server (it is the .ng build), and unfortunately it does not appear to give me a preview of the response data being sent. Might need to peel back some layers on this one. I'll try the old hyperion build too and see if I can pull anything out of there.
May look towards network logging if I can't catch what is going on that way!
I tried logging out the size of the response. The typical expected size is 4 bytes (which is the vast majority), but every now and then I see one that is 134287361 bytes!
I'm thinking maybe we could just check the response size before initializing the byte buffer, and if it is larger than, say, 1 KB, then we just disregard the response.
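A rough sketch of that size check, not the project's actual code: it assumes the reply is framed as a 4-byte big-endian length followed by the body, and the class name, method names, and the 1 KB cap are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

final class BoundedReply {
    // Hypothetical cap taken from the "say, 1 KB" suggestion above.
    static final int MAX_REPLY_BYTES = 1024;

    /** Interprets a 4-byte header as a big-endian length (ByteBuffer's default order). */
    static int declaredSize(byte[] header) {
        return ByteBuffer.wrap(header).getInt();
    }

    /** True if the declared size is small enough to be worth allocating. */
    static boolean isPlausible(int size) {
        return size >= 0 && size <= MAX_REPLY_BYTES;
    }

    /** Reads one reply, or returns null when the header declares an implausible size. */
    static byte[] readReply(DataInputStream in) throws IOException {
        int size = in.readInt();          // readInt() is big-endian as well
        if (!isPlausible(size)) {
            return null;                  // skip the allocation entirely
        }
        byte[] body = new byte[size];
        in.readFully(body);               // blocks until the whole body has arrived
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = {0, 0, 0, 4, 8, 1, 16, 1};
        byte[] body = readReply(new DataInputStream(new ByteArrayInputStream(frame)));
        System.out.println(body.length);                                        // 4
        System.out.println(isPlausible(declaredSize(new byte[] {8, 1, 16, 1}))); // false
    }
}
```

Note that a guard like this only avoids the huge allocation. If the stream itself is misaligned, the reader would presumably stay misaligned after discarding the reply.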
Okay, I think I see what is actually happening here... It's a fun one.
It seems that there is a timing or synchronicity problem with the socket input stream. The typical response header I see is [0,0,0,4], with the message values being [8,1,16,1], which is an acceptable response for the protobuf.
I was seeing an InvalidProtocolBufferException: Protocol message contained an invalid tag (zero), which seemed odd, so I logged that response and found it to be [0,0,0,4]. Interesting. Protobuf does not accept a zero tag in the response, so that is why I see the error message... I can't help but notice this is the same sequence of values provided when reading the header to get the response length.
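For anyone following along, here is a small hand decoder showing why [8,1,16,1] parses and [0,0,0,4] does not. In the protobuf wire format each field starts with a key equal to (field_number << 3) | wire_type, and field number 0 is not valid. This is only an illustration: the class and method names are invented, it handles single-byte keys only, and it does not use the protobuf library.

```java
final class TagPeek {
    /** Describes one single-byte protobuf key: the low 3 bits are the wire type, the rest the field number. */
    static String describeKey(int keyByte) {
        int key = keyByte & 0xFF;
        int fieldNumber = key >>> 3;
        int wireType = key & 0x7;
        if (fieldNumber == 0) {
            return "invalid tag (zero)";
        }
        return "field " + fieldNumber + ", wire type " + wireType;
    }

    public static void main(String[] args) {
        // [8,1,16,1] reads as key 8, value 1, key 16, value 1 (wire type 0 is a varint).
        System.out.println(describeKey(8));   // field 1, wire type 0
        System.out.println(describeKey(16));  // field 2, wire type 0
        // [0,0,0,4] starts with key 0, which no valid message can contain.
        System.out.println(describeKey(0));   // invalid tag (zero)
    }
}
```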
So, using this line of reasoning, I checked the header value on one of our large responses: [8,1,16,1], and 0x08011001 == 134287361! Our response messages are out of sync, and it is treating the protobuf message value as the header, from which it calculates the size. Hence we get an OOM when trying to allocate a byte array that large!
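The arithmetic can be checked with a few lines of Java. This sketch assumes two identical frames arrive back to back and that the reader is exactly four bytes out of step; the names are made up, and it does not say anything about how the stream got misaligned in the first place.

```java
import java.nio.ByteBuffer;

final class DesyncDemo {
    /** Interprets the four bytes at the given offset as a big-endian length header. */
    static int headerAt(byte[] stream, int offset) {
        return ByteBuffer.wrap(stream, offset, 4).getInt();
    }

    public static void main(String[] args) {
        // Two frames: header [0,0,0,4] followed by body [8,1,16,1], twice.
        byte[] stream = {0, 0, 0, 4, 8, 1, 16, 1, 0, 0, 0, 4, 8, 1, 16, 1};

        System.out.println(headerAt(stream, 0)); // aligned read: 4
        System.out.println(headerAt(stream, 4)); // four bytes off: 134287361

        // The same value by hand: 8*2^24 + 1*2^16 + 16*2^8 + 1
        System.out.println((8 << 24) | (1 << 16) | (16 << 8) | 1); // 134287361
    }
}
```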
It will take a little time for me to investigate a proper fix for this, but no more guesswork 😁
Added a fix in 8e5b0b86e5f6b33ac2bcceb1d5e297773bf7774d
git hash 4f09e7ce072db265c7524616d4177bc25f83f001 (image quality branch)