Open joshnieman-nebraska opened 4 months ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagraThapar @pjohari-ms @TheovanKraay.
@joshnieman-nebraska - thanks for the detailed instructions and context on this issue, we will look into this soon.
Any update on this? I'm wondering whether a fix has been released in a newer version of the Spring Cosmos Java library.
@joshnieman-nebraska apologies for the delayed response. Unfortunately there are no updates on this yet, but I will prioritize it and provide an update by next week.
@trande4884 can you please look into this, thanks!
@joshnieman-nebraska I have so far been unable to reproduce this issue. I am retrieving a 54KB item on the latest release, and even with 200 retrievals in a 2-minute window the error does not reproduce. You said it was happening on findAll(), but the stack trace you shared doesn't have findAll() in it anywhere. It does, however, have these lines from your code:
at genericappservice.adapter.service.BenefitDiscoveryIncomeChartService.findFederalPovertyLevelByProgramAndHouseHoldSize(BenefitDiscoveryIncomeChartService.java:25)
Hi @joshnieman-nebraska. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
Hi @joshnieman-nebraska, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
I will try again today using the latest package to see if the issue is still present since there have been a few patch versions since I last reported the issue.
Tried the latest (5.17.1) and the issue still persists. Another thing I'm noticing is that the Cosmos RUs are spiking because of all the reads hitting the DB. Previously we only saw 429s in the logs, nothing like this specific error that is occurring. Is it possible that this error gets thrown when extreme throttling is occurring?
@joshnieman-nebraska are you able to share the actual commands you are running or some sort of sample so I can try to reproduce again?
I'm working with my automation team to get the details. They have told me that their load test hits our API 1200 times within a span of a few minutes. I can see in App Insights that each API call hits the DB 10-13 times, except when a query fails: then the API call fails and the Cosmos retry does not happen. The query is just doing a lookup by partition key. Could this be something that happens when a particular Cosmos container is overloaded? I can see the total requests in the Cosmos metrics spike to ~13k during the load test.
Describe the bug
After upgrading spring-cloud-azure-starter-data-cosmos from 5.5.0 to 5.13.0, startOperation() is getting called multiple times in CosmosDiagnosticsContext for the findAll(PartitionKey) iterable response on a Spring Data Cosmos Repository when the iterable is accessed subsequent times as part of a load test. This is library code that is causing a fatal error in the process. This was not happening in 5.5.0 and started happening after the upgrade. We did not change any Cosmos client config or monitoring config. I also tried downgrading to different versions, and the problem appears to be in every version after 5.5.0 up to 5.13.0.
Exception or Stack Trace
To Reproduce
Not easily reproduced, but it occurs when executing a load test against a process that gets data from a container using findAll(PartitionKey) and iterates over the response. The load test should consist of at least 50 requests in a 2-minute span, and I assume calling next() on the iterable is required. The container should hold one single 54KB item, retrieved by the findAll(PartitionKey) method on a Spring Data Cosmos Repository using the partition key value of that item.
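For anyone trying to reproduce the load pattern described above, here is a minimal sketch of a concurrent probe in plain Java. It is an illustration only, not code from this issue: `FindAllLoadProbe`, `Result`, and `run` are hypothetical names, and the `Supplier` parameter is a stand-in for whatever call returns the iterable (for example a repository's findAll(PartitionKey)). The probe fires a fixed number of requests across a thread pool, fully iterates each response, and collects any exception raised while iterating.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical helper: not part of Spring Data Cosmos or the Cosmos SDK.
class FindAllLoadProbe {

    /** Outcome of one probe run: successful iterations plus collected failures. */
    static final class Result {
        final int succeeded;
        final List<Throwable> errors;

        Result(int succeeded, List<Throwable> errors) {
            this.succeeded = succeeded;
            this.errors = errors;
        }
    }

    /**
     * Calls the query `requests` times on `threads` worker threads and
     * iterates every response to the end, so next() is exercised each time.
     */
    static Result run(Supplier<? extends Iterable<?>> query, int requests, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    int items = 0;
                    for (Object ignored : query.get()) {
                        items++;
                    }
                    return items;
                }));
            }
            int succeeded = 0;
            List<Throwable> errors = new ArrayList<>();
            for (Future<Integer> future : futures) {
                try {
                    future.get();
                    succeeded++;
                } catch (ExecutionException e) {
                    // Keep the original exception thrown while querying or iterating.
                    errors.add(e.getCause());
                }
            }
            return new Result(succeeded, errors);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in query; replace the lambda with the real repository call.
        Result result = run(() -> List.of("stand-in item"), 50, 8);
        System.out.println("succeeded=" + result.succeeded + " failed=" + result.errors.size());
    }
}
```

Pointing the supplier at the real repository call and running it from the affected environment would show whether any collected error is the IllegalStateException reported here.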
Expected behavior
The response should be iterated over as normal and not throw an error when .next() is called on the iterator. I have no idea why the startOperation() method in CosmosDiagnosticsContext is being called multiple times before the initial call completes. The error is "IllegalStateException cause: Method 'startOperation' must not be called multiple times."
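To make the error message concrete, the sketch below models the kind of single-use guard that could produce it. This is an assumption for illustration, not the SDK's actual implementation: `SingleUseContext` is a hypothetical class, and only the message text is taken from the report. It shows that a second start on the same context object is rejected, while a fresh context per iteration is not.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a start-once guard; NOT the Cosmos SDK's code.
class SingleUseContext {
    static final String MESSAGE =
            "Method 'startOperation' must not be called multiple times.";

    private final AtomicBoolean started = new AtomicBoolean(false);

    /** Succeeds once; every later call on the same instance throws. */
    void startOperation() {
        if (!started.compareAndSet(false, true)) {
            throw new IllegalStateException(MESSAGE);
        }
    }

    public static void main(String[] args) {
        // Reusing one context for two iterations trips the guard.
        SingleUseContext shared = new SingleUseContext();
        shared.startOperation();
        try {
            shared.startOperation();
        } catch (IllegalStateException e) {
            System.out.println("second start rejected: " + e.getMessage());
        }

        // A fresh context per iteration starts cleanly.
        new SingleUseContext().startOperation();
    }
}
```

If the real guard behaves like this model, the symptom would be consistent with one diagnostics context being reused across more than one iteration of the same iterable, though the thread does not confirm that cause.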
Additional context
I could not reproduce this locally connecting to the Cosmos emulator and running load tests, nor could I reproduce it connecting to an actual Cosmos DB locally and running load tests. I could not connect to the actual Cosmos DB due to networking and security restrictions. I was also using a Windows laptop locally, but this is happening in a Linux environment in AKS. It also only seems to happen under a load test.