Azure / azure-cosmos-dotnet-v2

Contains samples and utilities relating to the Azure Cosmos DB .NET SDK
MIT License

Cosmos DB issue with Service Fabric in multi-node mode #616

Open washraf opened 6 years ago

washraf commented 6 years ago

Describe the bug

I am building a multi-tenant application using Cosmos DB, partitioning the collections by tenant ID. I use Service Fabric stateless services to access the data. Before every request I create a new user (if it does not already exist) and get the token. (I also tried creating a user, getting the token, and deleting the data after the operation.) I use the following code to get the token:

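// One Cosmos DB user per tenant; the user Id is the tenant's partition key.
user = new User { Id = config.PartitionKey };
user = await client.CreateUserAsync(UriFactory.CreateDatabaseUri(config.Database), user);
// GetorCreatePermission is the poster's own helper (not shown in the issue).
var permission = await GetorCreatePermission(user, config.Collection, config.PartitionKey);
return permission.Token;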

The scenario I am using is that I grab the data via two requests. The first request grabs the data in the collection, the second one does some computation and gets the other data (both from the same collection).

When I run Service Fabric in the 1-node configuration, it works perfectly. If I switch to the 5-node configuration, only the second request gives an error. The accessed service is configured as a singleton in all cases, I am using the same code, and the error happens in both local and Azure deployments.

The error is:

Insufficient permissions provided in the authorization header for the corresponding request. Please retry with another authorization header.

To Reproduce

I am using the SQL API. I have a collection named project, and each project has modules inside it. I execute two queries: the first selects all of the tenant's projects, and the second selects the count of modules per project.

Expected behavior

The query results are returned.

Actual behavior

The second request gives an error.

Environment summary

SDK Version: 2.0.0
OS Version: Service Fabric SDK v3.2.176


christopheranderson commented 6 years ago

Unrelated to your issue but just to double check, before every request, you're grabbing a new resource token? Just a heads up, that won't scale very well. Getting a new resource token uses a limited budget, so it won't scale horizontally. You need to cache the resource tokens to avoid getting throttled.
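To make the caching suggestion concrete, here is a minimal sketch of a per-tenant token cache. `ResourceTokenCache`, `GetTokenAsync`, and the `fetchToken` delegate are hypothetical names, not part of the Cosmos DB SDK; the delegate stands in for whatever code creates the user and permission and returns the token. The cache lifetime is an assumption and should be set shorter than the validity of the tokens you actually issue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK type): caches one resource token per tenant
// so that the token is not re-created on every request.
public sealed class ResourceTokenCache
{
    private sealed class Entry
    {
        public Entry(string token, DateTimeOffset expiresAt)
        {
            Token = token;
            ExpiresAt = expiresAt;
        }

        public string Token { get; }
        public DateTimeOffset ExpiresAt { get; }
    }

    private readonly ConcurrentDictionary<string, Entry> _entries =
        new ConcurrentDictionary<string, Entry>();
    private readonly TimeSpan _lifetime;
    private readonly Func<DateTimeOffset> _clock;

    // lifetime: how long a cached token is reused (assumed to be shorter than
    // the token's real validity). clock: injectable for testing.
    public ResourceTokenCache(TimeSpan lifetime, Func<DateTimeOffset> clock = null)
    {
        _lifetime = lifetime;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached token for the tenant, or calls fetchToken when there
    // is no entry or the entry has expired. Concurrent callers that miss at
    // the same time may each call fetchToken once; the last result is kept.
    public async Task<string> GetTokenAsync(
        string tenantId, Func<string, Task<string>> fetchToken)
    {
        var now = _clock();
        if (_entries.TryGetValue(tenantId, out var entry) && entry.ExpiresAt > now)
        {
            return entry.Token;
        }

        var token = await fetchToken(tenantId);
        _entries[tenantId] = new Entry(token, now + _lifetime);
        return token;
    }
}
```

In a service like the one described in this issue, a single cache instance would be shared across requests and called as `await cache.GetTokenAsync(config.PartitionKey, id => GetTokenFromCosmosAsync(id))`, where `GetTokenFromCosmosAsync` is a placeholder for the existing user and permission code. The cache lives in one process, so each Service Fabric node would hold its own copy.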

I often find people don't use resource tokens correctly. They're useful for untrustworthy subsystems, where you want to hand a scoped permission token to another client. If you're creating a new resource token + new client for each request, and it's all in the same process (so the master key is in memory as well), then you're not any safer and you've just added TONS of overhead. If you're handing the token off to another system and it's caching it, then your design could make sense if you need to enforce least-privilege access to that system (but you can't let that system choose which token it gets when it asks for it, etc.).

Re your issue:

Can you please provide your two queries and their request options? Based on my interpretation of your Repro steps, it sounds like the second query should ALWAYS fail unless you only have a single partition key in your DB (a partition key scoped Resource token cannot do an aggregate query across all partitions). But I could just be misunderstanding it. Sample schema + queries can make that more clear.