Open HeneryHawk opened 9 months ago
Hi, currently the emulator is marked UP as soon as all the partitions are created. The Service (Gateway), which serves the REST APIs and the explorer, sometimes takes longer to boot, and this could cause the problem you are facing.
We will try to improve this behavior in upcoming releases. For now, please use the following script to check Emulator startup - check_emulator_startup.sh
Hi @v1k1, thanks for your reply. We didn't notice this behavior before; we have only seen it since the end of January. Furthermore, the container starts very slowly, which makes running tests very time-consuming, as each test execution takes several minutes. Most of that time is spent waiting for the container to start successfully. This is very annoying. Please address this issue so that the container is available more quickly again.
We're seeing the same as @HeneryHawk. We are using the CosmosDB Emulator in our integration tests, and sometime early this year we noticed that their runtime had increased significantly. Further investigation shows that the added runtime comes from the CosmosDB Emulator now taking a lot longer to get ready (i.e. until the Gateway is ready to accept requests).
I've always thought that the CosmosDB Emulator was a bit slow to start (up to 30 seconds), but now I'm seeing a startup time of about 1m45s on my dev machine, so something has changed somewhere to cause this. We are running the CosmosDB Emulator in Docker using mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator with 4 partitions.
I am seeing the same thing. Start up has increased significantly recently.
The issue that I am facing is that it randomly slows down. This is really time-consuming when testing. Please fix this behavior!
We also face rather long startup times. I wonder if there is a way to increase the log level so that we can at least see that it is still doing something.
Troubleshooting is hardly possible. I'm trying to startup the emulator for some end-to-end testing in a K8S cluster and cosmosdb is the only big pain point (TLS/SSL only needing special ingress treatment, not even started even after 15mins, no logs about the real status, constant CPU load...).
@v1k1 @niteshvijay1995 Can you look at this ?
Not sure if my problem aligns with the problems of others, but to share some insights:
I am deploying a pod via this file (I also tried keeping port 8081; the custom port is just an attempt to see if it changes something).
@v1k1 @niteshvijay1995 Can you look at this ?
Is there an ETA?
It's really slow to be ready ... after the container shows started it still takes a while.
I wrote this little bash oneliner to ensure Cosmos DB has started completely.
echo -n "Waiting for Cosmos DB emulator ..."; while true; do result=$(curl -s -k https://localhost:8081 | jq -r '.code'); if [ "$result" = "Unauthorized" ]; then break; fi; echo -n "."; sleep 2; done; echo " ready."
And it produces this output when run with the time command:
Waiting for Cosmos DB emulator ........................................................ ready.
real 1m47.743s
user 0m0.941s
sys 0m0.098s
And that's on an i7 12700k, and the database is basically empty (2 datasets).
@aratz-lasa ETA would be 2 minutes then
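For anyone who wants the same check with an upper bound on the wait (so a CI job fails instead of hanging), here is a minimal sketch. The function names `wait_until` and `gateway_ready` are made up for this example; the probe itself is the same curl/jq check as in the one-liner above, and the 300 second limit in the usage comment is an arbitrary assumption.

```shell
#!/usr/bin/env bash

# wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Retries the given command until it succeeds or the timeout has elapsed.
# Returns 0 on success, 1 if the deadline passed without a successful probe.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# Same probe as the one-liner: once the gateway is up, an unauthenticated
# request is answered with a JSON body whose "code" is "Unauthorized".
# The default URL is the one used in this thread; pass another as $1.
gateway_ready() {
  local url=${1:-https://localhost:8081}
  [ "$(curl -s -k "$url" | jq -r '.code' 2>/dev/null)" = "Unauthorized" ]
}

# Example (not executed here):
#   wait_until 300 2 gateway_ready || { echo "emulator not ready" >&2; exit 1; }
```

Keeping the probe separate from the retry loop means the same loop can be reused with a different readiness check.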
I experience the same really slow startup. It is a real pain as sometimes I have to wait 5 to 10 minutes until I can debug one line of code. Reducing the partition count to 1 did improve the time to start a bit but not really as much as I hoped.
Current setup:
I'm using Aspire 8.0.2 and I have a small application that uses CosmosDB for persistence. I want to run IntegrationTests using the DistributedApplicationTestingBuilder. I currently have to poll the "/alive" endpoint and added AzureCosmosDB-HealthCheck that has the "live"-Tag, so the application is only considered alive / running when the cosmos emulator instance is accessible.
On the other hand the Azurite-Instance that I have to run as well is up almost immediately.
using Projects;

var builder = DistributedApplication.CreateBuilder(args);

var azureStorage = builder.AddAzureStorage("storage")
    .RunAsEmulator(r =>
        r.WithArgs("azurite", "-l", "/data", "--blobHost", "0.0.0.0", "--queueHost", "0.0.0.0", "--tableHost", "0.0.0.0", "--skipApiVersionCheck")
    );

// ## Synchronization
// Aspire currently only supports Azure Storage for Orleans.
var synchronizationClusterStore = azureStorage.AddTables("synchronization-cluster-table");
var synchronizationStateStore = azureStorage.AddBlobs("synchronization-grains-state");

var synchronizationOrleans = builder.AddOrleans("synchronization-cluster")
    .WithClustering(synchronizationClusterStore)
    .WithGrainStorage(synchronizationStateStore)
    ;

var synchronizationInboxDatabase = builder.AddAzureCosmosDB("SynchronizationInboxDatabase")
    .RunAsEmulator(r => r
        .WithEnvironment("AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE", "127.0.0.1")
        .WithEnvironment("AZURE_COSMOS_EMULATOR_PARTITION_COUNT", "1")
    );

builder.AddProject<Synchronization_Silo>("synchronization")
    .WithReference(synchronizationOrleans)
    .WithReference(synchronizationInboxDatabase)
    .WithExternalHttpEndpoints()
    .WithReplicas(1)
    .WithEnvironment("ASPNETCORE_ENVIRONMENT", builder.Environment.EnvironmentName)
    .WithEnvironment("DOTNET_ENVIRONMENT", builder.Environment.EnvironmentName)
    ;

using var app = builder.Build();
await app.RunAsync();
Regarding the very slow startup, we have implemented a workaround that pings /_explorer/emulator.pem until this URL is accessible.
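A rough sketch of what such a check can look like in a shell step. This is not the script from our pipeline. The function name `fetch_emulator_cert`, the 5 second per-request timeout and the loop limits in the usage comment are assumptions for illustration. It also assumes the endpoint returns a PEM file containing the usual `-----BEGIN CERTIFICATE-----` header, so that an error page is not mistaken for a ready emulator.

```shell
#!/usr/bin/env bash

# fetch_emulator_cert <base_url> <output_file>
# Makes a single attempt to download the emulator certificate.
# Returns 0 and writes <output_file> only if the response looks like a PEM
# certificate; otherwise returns 1 and leaves no partial file behind.
fetch_emulator_cert() {
  local base_url=$1 out=$2 tmp
  tmp=$(mktemp) || return 1
  # --insecure is needed because the emulator uses a self-signed certificate.
  if curl --silent --fail --insecure --max-time 5 \
       --output "$tmp" "${base_url}/_explorer/emulator.pem" \
     && grep -q -- '-----BEGIN CERTIFICATE-----' "$tmp"; then
    mv "$tmp" "$out"
    return 0
  fi
  rm -f "$tmp"
  return 1
}

# Example (not executed here): try every 2 seconds, at most 150 times.
#   ready=0
#   for _ in $(seq 1 150); do
#     if fetch_emulator_cert https://localhost:8081 emulator.pem; then ready=1; break; fi
#     sleep 2
#   done
#   [ "$ready" -eq 1 ] || { echo "emulator certificate not available" >&2; exit 1; }
```

Downloading to a temporary file first means a later step that imports the certificate never sees a half-written or HTML error file.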
This has now worked well for a while, but further errors have been occurring for a few weeks now. For inexplicable reasons, our Gradle tests lose the connection to the emulator during execution in the pipeline. So the tests fail and so does the pipeline.
ERROR c.a.c.i.GlobalEndpointManager - Fail to reach global gateway [https://172.17.0.2:8081/]
ERROR c.a.c.i.GlobalEndpointManager - startRefreshLocationTimerAsync() - Unable to refresh database account from any location.
We have not yet found a workaround for this and the Gradle tests regularly break in the pipeline. It sometimes takes 5 - 10 executions until the test job is successful.
Is there an update for this issue? Or is there an update on the way? Cosmos DB is the flagship database on Azure and is advertised by Microsoft. It is a great pity that the associated emulator really does not work well. I would expect an emulator that is regularly maintained, starts quickly, works without major problems and fulfills expectations. Unfortunately, this is not currently the case. The instability of the Linux emulator is causing us more and more problems, our development is slowing down and we are losing more and more money as a result. Please Microsoft, do something about it and satisfy the expectations of the developers. That should also be your own standard.
@sajeetharan @v1k1
We are also experiencing the same problem now: on an Ubuntu 20 host the CosmosDB emulator prints Started in the console, but it is never able to serve the certificate that Java needs to connect to it.
However, in my case (I am executing tests against the emulator on pretty beefy VMs using Ubuntu 20 as the OS) the problem was that IPv6 was disabled. After enabling it, the emulator started pretty fast.
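If you want to rule this out on your own host, a quick check could look like the sketch below. It reads the standard Linux sysctl file for the "all interfaces" IPv6 switch. The function name `ipv6_disabled` is made up for this example, and treating a missing file as "disabled" is an assumption (the file is typically absent when IPv6 is turned off at the kernel level).

```shell
#!/usr/bin/env bash

# ipv6_disabled [flag_file]
# Returns 0 if IPv6 appears to be disabled on the host, 1 otherwise.
# The optional argument exists only so the check can be pointed at another file.
ipv6_disabled() {
  local flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
  # Assumption: no readable flag file means the kernel has no IPv6 support loaded.
  [ -r "$flag_file" ] || return 0
  # The file contains 1 when IPv6 is disabled and 0 when it is enabled.
  [ "$(cat "$flag_file")" = "1" ]
}

# Example (not executed here):
#   if ipv6_disabled; then
#     echo "IPv6 is disabled on this host; the emulator may never become ready" >&2
#   fi
```

On many distributions `sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0` turns it back on until the next reboot, but how IPv6 was disabled (sysctl, kernel parameter, image configuration) varies per host, so check that first.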
I'm experiencing the same issue: the CosmosDB Emulator takes about 2m45s to start on Azure DevOps Windows images, even though it is pre-installed. That is a lot of time, and it slows down testing.
I am running CosmosDb emulator as docker container for executing integration tests... but the emulator takes so long to start (even after container is up)... all tests are failing..... is there any workaround?
@isaranghi The problem is that cosmosdb is not ready even after the container has started. To get around this I poll a known address on it until I get an HTTP 200 reply. I'm using testcontainers to start the cosmosdb docker image. It's kinda slow, but it gets the job done.
Here is the code I use to start the container:
public class DatabaseHandler : IAsyncLifetime
{
    private readonly CosmosDbContainer _dbContainer = new CosmosDbBuilder()
        .WithImage("mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest")
        .Build();

    public CosmosClient? CosmosClient;

    public async Task InitializeAsync()
    {
        await _dbContainer.StartAsync();
        var connStr = _dbContainer.GetConnectionString();
        await CosmosHelper.PollForDockerImageReadyness(connStr);
        CosmosClient = new CosmosClient($"{connStr};DisableServerCertificateValidation=True;", new CosmosClientOptions
        {
            ConnectionMode = ConnectionMode.Gateway,
            HttpClientFactory = () => _dbContainer.HttpClient
        });
    }

    public async Task DisposeAsync()
    {
        await _dbContainer.StopAsync();
    }
}
Here is the code used to poll for cosmos readiness:
public class CosmosHelper
{
    ......
    /// <summary>
    /// Fix for error in docker image for cosmosdb(Hopefully fixed soon). It says it's ready before the container is ready.
    /// Will poll until we get a http 200 response for a known correct url
    /// </summary>
    /// <param name="connectionstring"></param>
    /// <returns></returns>
    public static async Task PollForDockerImageReadyness(string connectionstring)
    {
        const int delayInMs = 2000;
        const int maxAttempts = 30;
        var currAttempt = 0;
        var splitArr = connectionstring.Split(';', '=');
        var parsedHostAndPort = splitArr[1];
        var pathToTest = $"{parsedHostAndPort}/_explorer/emulator.pem";
        using var client = new HttpClient(new HttpClientHandler { ServerCertificateCustomValidationCallback = (_, _, _, _) => true });
        do
        {
            currAttempt++;
            if (currAttempt > maxAttempts)
                throw new InvalidOperationException("Cannot connect to cosmosdb");
            await Task.Delay(delayInMs);
            try
            {
                await client.GetByteArrayAsync(pathToTest);
                return;
            }
            catch (Exception)
            {
                // Ignored
            }
        } while (true);
    }
}
@chrisflem You don't need to do that manual check with TestContainers, they've already been doing that check for you since version 3.8.
(You also don't need to specify the Docker image manually at the top, since that's what they use anyway.)
Hi, I have the problem that the Linux emulator container has often not finished booting, although the console says "Started" and all partitions have been created according to the console. It doesn't happen every time the container is started, but at least 50% of the time.

I could observe the following: when the container is started and all partitions are created immediately, the container is immediately ready and, for example, the Explorer can be accessed. However, if the creation of the last partition is not completed immediately, the container is not immediately ready, even though "Started" is printed to the console. It then takes at least 30 seconds for the Explorer to become accessible, for example. The behavior could be observed on several dev notebooks, as well as in our GitLab CI.

We noticed the behavior in connection with Java Testcontainers and the official Azure module contained therein. Our integration tests start a Cosmos emulator container via it. In the official Azure module, the Cosmos container is implemented in such a way that it waits for the "Started" log to appear in the console before the tests are executed or the emulator certificate is downloaded and added to the Java Keystore. Unfortunately, this leads to errors if "Started" is printed on the console but the container cannot yet be reached, so the certificate cannot be downloaded and no connection from the Cosmos Java SDK to the emulator can be established.

Since the behavior has been noticed both when using the Java Testcontainers framework and when the container is started via the Docker CLI, I assume that this is not related to Testcontainers. And since our integration tests are executed in the GitLab CI, this leads to countless failed pipelines.

Is this a known error? Is there a way to get more logs? What could be the reason for this?
Screenshots
[Screenshot: fast startup and container is ready immediately]
[Screenshot: slow startup and container is not immediately ready]
Arguments && Environment variables to start Docker:
docker run -p 8081:8081 -p 10250-10255:10250-10255 -it --rm -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=3 mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator:latest