Azure / azure-cosmos-db-emulator-docker

This repo serves as a hub for managing issues, gathering feedback, and having discussions about the Cosmos DB Emulator Docker image.
https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-develop-emulator?tabs=docker-linux%2Ccsharp&pivots=api-nosql
MIT License

Latest version not working in our CI environment #111

Open steveryan opened 1 month ago

steveryan commented 1 month ago

Describe the bug

Our CI has pulled the :latest image for the last ~18 months, and it has worked fine up to and including sha256:bf9ddf53430701e6d954bdc8cd07ef672f0642a55244e2d4b0b478a633e89d27.

Images published after that one leave our CI stuck at "Waiting on explorer at https://localhost:8081/_explorer/index.html" indefinitely (45+ minutes).

Expected behavior

The :latest image should behave the same as previous versions.
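One way to keep CI on the last image reported to work is to reference it by digest instead of by the :latest tag. The fragment below is a minimal sketch, not an official recommendation. It assumes that the sha256 value quoted above is the registry (repository) digest, which is the value `docker images --digests` shows in its DIGEST column, and not a local image ID, because a local image ID cannot be pulled. Only the image line of the cosmosdb service changes. Everything else in the service definition stays as it is.

```yaml
services:
  cosmosdb:
    # Pin the image reported to work in this thread instead of tracking :latest.
    # Assumption: this hash is the registry digest, not a local image ID.
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator@sha256:bf9ddf53430701e6d954bdc8cd07ef672f0642a55244e2d4b0b478a633e89d27
```

Pinning by digest means CI no longer picks up new emulator releases automatically. Upgrades then become an explicit change to this line.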


Docker Images Used:

niteshvijay1995 commented 1 month ago

@steveryan Can you please share the Docker logs from your CI pipeline? Please also share the command used to start the Docker container.

steveryan commented 1 month ago

@niteshvijay1995 Let me get you the logs. While I dig into that, here is how we start the container:

docker-compose up -d --remove-orphans cosmosdb

The relevant Docker Compose configuration is as follows:

version: "3.8"

services:
  cosmosdb:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest
    container_name: cosmos
    tty: true
    cpu_count: 2
    environment:
      - AZURE_COSMOS_EMULATOR_PARTITION_COUNT=3
      - AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=true
    ports:
      - "8081:8081"
      - "8900:8900"
      - "8901:8901"
      - "8979:8979"
      - "10251:10251"
      - "10252:10252"
      - "10253:10253"
      - "10254:10254"
    networks:
      - default
networks:
  default:
    ipam:
      driver: default
      config:
        - subnet: "10.1.11.0/24"

adrian-gheorghe commented 1 month ago

Hello, we have the same issue in our CI pipelines, starting about a week ago. Our API, which connects to the Cosmos DB emulator, now gets 408 errors constantly. These errors used to happen sporadically, but now they happen on every run. We are running GitLab CI with a Kubernetes runner.

We have isolated these issues to the new version of the Docker image:

This might also be related to CPU usage. Comparing job runs, I see CPU spikes of 130% on the container running the new Docker image, whereas the old image peaked at around 105%.

We can keep using the cached image for a while longer, but the cache expires in 6 days, and after that we will be completely blocked.

niteshvijay1995 commented 1 month ago

@adrian-gheorghe Can you please confirm whether the version with the stable tag works for you?

adrian-gheorghe commented 1 month ago

Hi @niteshvijay1995, thank you! I can confirm that switching to the stable tag fixed the current issue. Some requests still fail (we have always gotten random 503 and 408 errors), but they no longer fail consistently, so at least the behaviour matches the previous version.

I honestly did not know the stable tag existed and was using latest. I could not find any reference to tags other than latest in the documentation, and the https://mcr.microsoft.com/ catalog does not list the emulator container image at all.
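For others who hit this, switching tags is a one-line change to the cosmosdb service definition shared earlier in the thread. The fragment below is a minimal sketch that assumes the rest of that service definition (ports, environment, networks) stays exactly as posted. The stable tag is the one suggested by the maintainer in this thread.

```yaml
services:
  cosmosdb:
    # Use the stable tag suggested in this thread instead of latest.
    # The rest of the service definition is unchanged.
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:stable
```

After editing the tag, running `docker-compose pull cosmosdb` before `docker-compose up -d` makes sure the stable image is actually fetched, rather than relying on whatever is cached on the runner.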

niteshvijay1995 commented 1 month ago

Thanks for confirming, @adrian-gheorghe. Can you please elaborate on the issue you faced with the latest version? Is the 408 returned by the emulator, or by your application (because of high latency from the emulator)? If it comes from the emulator, could you please share the exact error message returned with the 408?

adrian-gheorghe commented 1 month ago

Hi @niteshvijay1995, the 408 errors I mentioned are what our Cosmos DB client receives when it makes requests to the emulator container. They occur for various request and query types (our CI removes and recreates fixtures between test runs).

Here is an example error message:

Exception: Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: RequestTimeout (408); Substatus: 0; ActivityId: d5e72041-af43-4b0a-aac0-c2e5d21a8133; Reason: (GatewayStoreClient Request Timeout. Start Time UTC:10/04/2024 12:59:56; Total Duration:65004.2798 Ms; Request Timeout 65000 Ms; Http Client Timeout:65000 Ms; Activity id: d5e72041-af43-4b0a-aac0-c2e5d21a8133;);

Also, sorry, I did not mean to hijack this thread. If this is unrelated to the original issue, I can create a new one.

niteshvijay1995 commented 1 month ago

@adrian-gheorghe Yes, please create a separate issue and share some repro steps. We will take a look.