Azure / azure-cosmos-dotnet-v2

Contains samples and utilities relating to the Azure Cosmos DB .NET SDK
MIT License
577 stars 838 forks source link

ResponseContinuation throws expcetion when select query with distinct #720

Open dasari8192 opened 5 years ago

dasari8192 commented 5 years ago

Describe the bug Accessing ResponseContinuation throws following exception when there is a 'distinct' in select query

" Distict query requires a matching order by in order to return a continuation token. If you would like to serve this query through continuation tokens, then please rewrite the query in the form 'SELECT DISTINCT VALUE c.blah FROM c ORDER BY c.blah' and please make sure that there is a range index on 'c.blah'. "

To Reproduce I have a mulit-partition collection 'products' and would like to run query "SELECT distinct value c from c where c.type=\'electronic\' order by c.orderDate desc" I have 10 products in my collection and I am giving my pagesize as 5

DocumentDB client version: "Microsoft.Azure.DocumentDB" Version="2.4.1" "Microsoft.Azure.DocumentDB.Core" Version="2.4.1"

Following is console application:

using Microsoft.Azure.Documents.Client; using System; using System.Linq; using BettingEdge.BettingEngine.Services.Fixture.Common.Models.DBO; using Microsoft.Azure.Documents; using Microsoft.Azure.Documents.Linq; using Newtonsoft.Json;

namespace DocumentDBClientDistinctIssue { public class Product { [JsonProperty(NullValueHandling = NullValueHandling.Ignore, PropertyName = "name")] public string Name { get; set; }

    [JsonProperty(NullValueHandling = NullValueHandling.Ignore, PropertyName = "orderDate")]
    public DateTime OrderDate { get; set; }

    [JsonProperty(NullValueHandling = NullValueHandling.Ignore, PropertyName = "type")]
    public string Type { get; set; }
}

class Program
{

    private static DocumentClient DocumentClient = null;
    private static Uri DocumentCollectionUri = null;
    private const string EndpointUrl = "https://localhost:8081";
    private const string AuthKey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
    private const string DatabaseId = "distinct_test";
    private const string CollectionId = "products";
    private const string PartitionKey = "type";
    private const int PageSize = 5;

    static void Main(string[] args)
    {

        try
        {
            InitDocumentDb(CollectionId);

            var query = "SELECT distinct value c from c where c.type=\'electronic\' order by c.orderDate desc";

             GetProducts(query);

        }
        catch (Exception e)
        {
            Console.WriteLine(e);
            throw;
        }

    }   

    private static void GetProducts(string sqlQuery)
    {
        var feedOptions = new FeedOptions { MaxItemCount = PageSize, EnableCrossPartitionQuery = true};
        var documentQuery = DocumentClient.CreateDocumentQuery<Product>(DocumentCollectionUri, sqlQuery, feedOptions).AsDocumentQuery();
        var feedresponse    = documentQuery.ExecuteNextAsync<Product>().GetAwaiter().GetResult();
        var products = feedresponse.ToList();
        var continuationToken = feedresponse.ResponseContinuation; // throws expception here
    }

    private static void InitDocumentDb(string collectionId = null)
    {
        var collId = collectionId ?? CollectionId;
        DocumentClient = new DocumentClient(new Uri(EndpointUrl), AuthKey);
        DocumentCollectionUri = UriFactory.CreateDocumentCollectionUri(DatabaseId, collId);
    }

}

}

Expected behavior Distinct product list with valid continuation token.

Actual behavior Products returned in feedResponse but fetching 'ResponseContinuation' throws exception with message

" Distict query requires a matching order by in order to return a continuation token. If you would like to serve this query through continuation tokens, then please rewrite the query in the form 'SELECT DISTINCT VALUE c.blah FROM c ORDER BY c.blah' and please make sure that there is a range index on 'c.blah'. "

Environment summary

Used .netcore 2.1 console application

Additional context

If we remove the distinct in query then results are returned as expected with valid continuationToken

var query = "SELECT value c from c where c.type=\'electronic\' order by c.orderDate desc";

j82w commented 5 years ago

@dasari8192 continuation tokens are not supported for distinct queries without an order by. The reason it requires an order by is it only requires the continuation token to have the last document id stored in the continuation token so it can tell where the last result stopped. Without an order by it would require every single document id that was returned to be stored in the continuation token which causes the continuation token to be to large.

dasari8192 commented 5 years ago

@j82w If I run the same query in Data explorer or throuh REST api (using POSTMAN) the query gets executed without any error and received products with continuation token. Also I can continue with this continuation token.

j82w commented 5 years ago

Cosmos db only guarantees distinct items within a request. Its possible for the same item to be returned in both the first request and the 2nd request. The SDK keeps a list of items in memory and removes the duplicates if Cosmos DB returns any.

dasari8192 commented 5 years ago

@j82w Thanks for the reply. I have two questions here. 1) Why the behavior for same query execution via DocumentClient, DataExpolorer/REST(api) is different? (I expect same query produces same result irrespective of where we execute the query) 2) You mentioned "Its possible for the same item to be returned in both the first request and the 2nd request.", this means distinct is applied only in one page and not against the whole collection. Do you suggest any idea how can I prove this.

Thanks in advance

j82w commented 5 years ago
  1. You can see the source code for query in v3 SDK. The client handles a lot of the query logic. In this case the SDK was built to handle the duplicates and ensures the user gets a distinct list. I would strongly recommend using the SDK as it has built in retry logic and direct mode for better performance.

  2. I don't have a repo of this right now. The problem is you can insert and remove items between each page read. The Cosmos query engine is stateless and relies on the continuation token for the state. If an item that had a distinct value was removed and reinserted within a new object it's possible to get that distinct value again.

Nikolay-Uvarov commented 4 years ago

We have the same issue in the powerapps. We are using COSMOS DB Query documents V3(V2) action in the power automate(flow).

wgrzesiak147 commented 3 years ago

Hey, any news on this one?. We face the same problem

SELECT DISTINCT c.id FROM Products c join ident in c.identifiers ORDER BY c.id DESC;

Throws the same exception. The same query on cosmos DB explorer works fine

ssmsexe commented 3 years ago

@wgrzesiak147 have you tried to build the same query with subqueries? I think it'd probably solve your problem.

wgrzesiak147 commented 3 years ago

@wgrzesiak147 have you tried to build the same query with subqueries? I think it'd probably solve your problem.

Yes, I tried, it works but it's much slower.