steveryan opened 1 month ago
@steveryan Can you please share the Docker logs from your CI pipeline? Also share the command used to start the Docker container.
@niteshvijay1995 Let me get you the logs, but while I dig into that, we are starting the container with the following:
docker-compose up -d --remove-orphans cosmosdb
and the docker-compose file is as follows:
version: "3.8"
services:
  cosmosdb:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest
    container_name: cosmos
    tty: true
    cpu_count: 2
    environment:
      - AZURE_COSMOS_EMULATOR_PARTITION_COUNT=3
      - AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=true
    ports:
      - "8081:8081"
      - "8900:8900"
      - "8901:8901"
      - "8979:8979"
      - "10251:10251"
      - "10252:10252"
      - "10253:10253"
      - "10254:10254"
    networks:
      - default
networks:
  default:
    ipam:
      driver: default
      config:
        - subnet: "10.1.11.0/24"
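As an aside for CI setups: a compose healthcheck against the explorer endpoint can bound the startup wait, so a hung emulator fails the job quickly instead of blocking it. A minimal sketch, assuming curl is available inside the emulator image (the interval and retry values are illustrative):

    healthcheck:
      # -f fails on HTTP error responses; -k skips verification of the emulator's self-signed certificate
      test: ["CMD", "curl", "-fk", "https://localhost:8081/_explorer/index.html"]
      interval: 10s
      timeout: 5s
      retries: 30
      start_period: 60s

With this under the cosmosdb service, docker compose up --wait (Compose v2) should exit non-zero once the container is marked unhealthy, which is easier to act on in a pipeline than tailing the logs.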
Hello, we have the same issue with our CI pipelines as well (starting about 1 week ago). The behaviour is that we are now getting constant 408 errors in our API that connects to the Cosmos DB emulator. These errors used to happen sporadically, but now they happen on every run. We are running on GitLab CI with a Kubernetes runner.
We have isolated these issues to the new version of the Docker image.
This might also be related to CPU usage: comparing job runs, I can see spikes of 130% CPU usage on the container running the new Docker image, whereas peaks for the old image were around 105%.
We can use the cached image for a while longer, but it will expire in 6 days and we will then be completely blocked.
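One thing that may be worth trying against the CPU spikes, purely as a mitigation and assuming your tests can tolerate fewer partitions: the emulator's startup time and resource footprint grow with the partition count, so lowering AZURE_COSMOS_EMULATOR_PARTITION_COUNT in the compose environment reduces the load. A sketch:

    environment:
      # fewer partitions -> shorter startup and a lower CPU/memory footprint
      - AZURE_COSMOS_EMULATOR_PARTITION_COUNT=2
      - AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=true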
@adrian-gheorghe Can you please confirm whether the stable-tagged version works for you?
Hi @niteshvijay1995, thank you! I can confirm that switching to the stable tag fixed the current issue. We still had some requests failing (we always get 503 and 408 errors randomly), but they are no longer failing consistently, so the behaviour is at least back to what it was with the previous version.
I honestly did not know the stable tag existed and was using latest. There is no reference to any tag other than latest that I could find in the documentation, and the https://mcr.microsoft.com/ catalog does not include any reference to the emulator container image.
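For anyone else who, like me, assumed latest was the only option: pinning is a one-line change in the compose file, either to the stable tag or to the last known-good digest mentioned in this issue (using Docker's image@sha256 syntax). A sketch:

    # option 1: the stable tag
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:stable
    # option 2: pin the exact last-known-good digest
    # image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator@sha256:bf9ddf53430701e6d954bdc8cd07ef672f0642a55244e2d4b0b478a633e89d27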
Thanks for confirming, @adrian-gheorghe. Can you please elaborate on the issue you faced with the latest version? Is the 408 returned from the emulator, or from your application (due to high latency from the emulator)? If it is from the emulator, could you please share the exact error message that is returned with the 408?
Hi @niteshvijay1995, the 408 errors I mentioned are what our Cosmos DB client gets while making requests to the emulator container. They happen for various request/query types (we have fixtures being removed and recreated between test runs in our CI).
Here is an example error message:
Exception: Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: RequestTimeout (408); Substatus: 0; ActivityId: d5e72041-af43-4b0a-aac0-c2e5d21a8133; Reason: (GatewayStoreClient Request Timeout. Start Time UTC:10/04/2024 12:59:56; Total Duration:65004.2798 Ms; Request Timeout 65000 Ms; Http Client Timeout:65000 Ms; Activity id: d5e72041-af43-4b0a-aac0-c2e5d21a8133;);
Also, sorry, I did not mean to hijack this thread; if this is unrelated to the original issue I can create a new one.
@adrian-gheorghe Yes, please create a separate issue and share some repro steps. We will take a look into it.
Describe the bug
Our CI has pulled the :latest image for the last ~18 months, and things have worked fine up to and including sha256:bf9ddf53430701e6d954bdc8cd07ef672f0642a55244e2d4b0b478a633e89d27. Images after that have left our CI waiting at

Waiting on explorer at https://localhost:8081/_explorer/index.html

indefinitely (45+ minutes).

Expected behavior
The :latest image should function the same as previous versions.

Desktop (please complete the following information):
Docker Images Used: