aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
http://aws.amazon.com/sdkfornet/
Apache License 2.0
2.05k stars 853 forks source link

'DynamoDbContext.LoadAsync' does not respect 'DynamoDBOperationConfig.IndexName' #1748

Open julealgon opened 3 years ago

julealgon commented 3 years ago

Description

When using the LoadAsync method on IDynamoDbContext to load an entity using a secondary global index, I'm getting unexpected behavior: an InvalidCastException is being thrown.

Switching the exact same call to a QueryAsync fixes the problem.

I believe LoadAsync is not respecting the IndexName property on the configuration object and is using the primary index for all searches. Either the API is incredibly misleading (as it allows you to pass this unused value) or there is a bug there.

Reproduction Steps

  1. Create a table with a primary and a secondary index
  2. Map the primary key to a Guid property
  3. Map the secondary key to a string property
    [DynamoDBTable("Somethings")]
    internal sealed class Something
    {
        [DynamoDBHashKey]
        public Guid? Id { get; set; }

        [DynamoDBGlobalSecondaryIndexHashKey]
        public string? Name { get; set; }
    }
  1. Try to use LoadAsync with a string key and pass new DynamoDBOperationConfig { IndexName = <Name_Of_Secondary_Index> } as options
await dynamoDBContext.LoadAsync<Something>("someName", new DynamoDBOperationConfig { IndexName = "Name-index" });
  1. An InvalidCastException is thrown
  2. Replace the LoadAsync call with QueryAsync, and it works as expected:
var query = dynamoDBContext.QueryAsync<Something>("someName", new DynamoDBOperationConfig { IndexName = "Name-index" });
await query.GetNextSetAsync();

Logs

System.InvalidCastException: Unable to cast object of type 'System.String' to type 'System.Guid'. at Amazon.DynamoDBv2.Converter`1.TryTo(Object value, Primitive& p) at Amazon.DynamoDBv2.Converter.TryToEntry(Object value, DynamoDBEntry& entry) at Amazon.DynamoDBv2.Converter.ToEntry(Object value) at Amazon.DynamoDBv2.DynamoDBEntryConversion.ConvertToEntry(Type inputType, Object value) at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ToDynamoDBEntry(SimplePropertyStorage propertyStorage, Object value, DynamoDBFlatConfig flatConfig, Boolean canReturnScalarInsteadOfList) at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ToDynamoDBEntry(SimplePropertyStorage propertyStorage, Object value, DynamoDBFlatConfig flatConfig) at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ValueToDynamoDBEntry(PropertyStorage propertyStorage, Object value, DynamoDBFlatConfig flatConfig) at Amazon.DynamoDBv2.DataModel.DynamoDBContext.MakeKey(Object hashKey, Object rangeKey, ItemStorageConfig storageConfig, DynamoDBFlatConfig flatConfig) at Amazon.DynamoDBv2.DataModel.DynamoDBContext.LoadHelperAsync[T](Object hashKey, Object rangeKey, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken) at Amazon.DynamoDBv2.DataModel.DynamoDBContext.LoadAsync[T](Object hashKey, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)

Environment

Resolution

I belive LoadAsync should be changed to respect the index being passed as a parameter and use the object hashKey against that.


This is a :bug: bug-report

ashishdhingra commented 3 years ago

Hi @julealgon,

Good afternoon.

Please refer to the page .NET: Object Persistence Model -> Supported Data Types for the list of supported data types. Looks like Guid is not the supported data type. Refer to page Mapping Arbitrary Data with DynamoDB Using the AWS SDK for .NET Object Persistence Model for example on how support arbitrary data types. Basically it involves developing custom TypeCpnverter.

As a workaround, you may use string instead of Guid type for Id property. For using Guid type, modifying your code to below works fine:

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;
using Amazon.DynamoDBv2.DocumentModel;
using System;

namespace DynamoDbSecondaryIndexTest1748
{
    class Program
    {
        static void Main(string[] args)
        {
            AmazonDynamoDBClient amazonDynamoDBClient = new AmazonDynamoDBClient();
            DynamoDBContext dynamoDBContext = new DynamoDBContext(amazonDynamoDBClient);

            var loadResult = dynamoDBContext.LoadAsync<Something>("75f05566-2209-4f3d-ba5a-cfd9da5eab62", new DynamoDBOperationConfig { IndexName = "Name-index" }).GetAwaiter().GetResult();

            Console.WriteLine("Loaded Result: " + loadResult);
        }
    }

    [DynamoDBTable("TestTable")]
    internal sealed class Something
    {
        [DynamoDBHashKey]
        [DynamoDBProperty(typeof(GuidTypeConverter))]
        public Guid Id { get; set; }

        [DynamoDBGlobalSecondaryIndexHashKey]
        public string Name { get; set; }

        public override string ToString()
        {
            return $"[Id: {Id}, Name: {Name}]";
        }
    }

    public class GuidTypeConverter : IPropertyConverter
    {
        public DynamoDBEntry ToEntry(object value)
        {
            if(value == null || !(value is Guid)) throw new ArgumentOutOfRangeException();

            DynamoDBEntry entry = new Primitive
            {
                Value = ((Guid)value).ToString()
            };

            return entry;
        }

        public object FromEntry(DynamoDBEntry entry)
        {
            Primitive primitive = entry as Primitive;
            Guid guidValue;
            if (primitive == null || !(primitive.Value is String) || string.IsNullOrEmpty((string)primitive.Value) || !Guid.TryParse((string)primitive.Value, out guidValue))
                throw new ArgumentOutOfRangeException();

            return guidValue;
        }
    }
}

NOTE: The above code should be used only for reference. Kindly validate and test the GuidTypeConverter logic before trying to use the Guid type converter. We are not responsible for any data loss or production issues.

Please let me know if this helps and if this issue could be closed.

Thanks, Ashish

julealgon commented 3 years ago

@ashishdhingra I believe you might have misunderstood my request here. Please note that Guid is working fine when loading and saving entities. I have both the primary key, as well as another property on my model, mapped as Guid.

The problem here happens with the global secondary index, which has a string mapped property. Once I try to use LoadAsync passing a string value and the name of the secondary index, I get the exception above which is attempting to convert my string argument into Guid (which does not make sense since I'm not querying on the primary index).

I kindly request that you take another look at this. Even if the page you provided doesn't mention Guid as a supported type, I believe the issue here is not directly related to that.

julealgon commented 3 years ago

Also @ashishdhingra , I'd suggest to avoid providing example source code where you block on async code, as that is a bad practice that can lead to hard to debug problems in .Net. In your sample Main, you could just make Main itself async.

People not aware of the issues might end up naively copying those code sections into their code.

ashishdhingra commented 3 years ago

Hi @julealgon,

Good morning.

Thanks for your inputs. I used your code exactly where it tries to use the secondary index, to reproduce the issue. When not using the type converter, the QueryAsync call didn't throw any exception, but it also didn't returned any result. The LoadAsync call threw the type cast exception mentioned. After the use of type converter, the LoadAsync call (with the secondary index as 2nd parameter), worked properly. I will try to pinpoint the exact code in SDK which throws the error.

In the mean while, in order to get proper understanding of the issue, please share the following:

  1. Exact DynamoDB column configuration in AWS Console and the sample code which is working fine for you when loading and saving entities, without use of a type converter.

  2. Reconfirm what you meant by The problem here happens with the global secondary index, which has a string mapped property. Once I try to use LoadAsync passing a string value and the name of the secondary index,. I guess you are referring to the code below (which is what I used exactly):

dynamoDBContext.LoadAsync<Something>("75f05566-2209-4f3d-ba5a-cfd9da5eab62", new DynamoDBOperationConfig { IndexName = "Name-index" })

Regarding the example code where I'm blocking async call, that is just for reference purposes as mentioned in the note. I agree with your assertion that we should not block async calls in live production code.

Thanks, Ashish

julealgon commented 3 years ago

@ashishdhingra , can you clarify on this?

After the use of type converter, the LoadAsync call (with the secondary index as 2nd parameter), worked properly.

Do you mean you can use LoadAsync with the secondary index and actually return the data, or is just not returning anything? If it is not returning anything even though there is data with that value in the secondary index, then it confirms my suspicion that it is actually ignoring the IndexName setting and actually searching on the primary index instead.


In the mean while, in order to get proper understanding of the issue, please share the following:

  1. Exact DynamoDB column configuration in AWS Console and the sample code which is working fine for you when loading and saving entities, without use of a type converter.

Primary index:

Global Secondary index:

This configuration allows me to:

  1. Use LoadAsync with the primary key (.Net Guid): objects are fetched properly
  2. Use Save with my entity: entity is persisted properly, including Guid ID which I generate before the call
  3. Use QueryAsync().GetNextSetAsync().SingleOrDefault() with the secondary key + index name to fetch the entity using the secondary index

I cannot currently use LoadAsync to fecth from the secondary index.

  1. Reconfirm what you meant by The problem here happens with the global secondary index, which has a string mapped property. Once I try to use LoadAsync passing a string value and the name of the secondary index,. I guess you are referring to the code below (which is what I used exactly):

Affirmative. That call produces InvalidCastException which I believe is incorrect behavior: it should not be attempting to cast the string into a Guid since the secondary index property that is marked in my model is not a Guid, but a string property:

   ...
        [DynamoDBHashKey]
        public Guid Id { get; set; }

        [DynamoDBGlobalSecondaryIndexHashKey]
        public string? Name { get; set; }
   ...

nor should it require the creation of a converter from string to Guid as Guid works out of the box.

julealgon commented 3 years ago

I performed another test on my end which I figured could potentially fix the issue.

I realized that the DynamoDBGlobalSecondaryIndexHashKey attribute has an index name property, so I changed my model to:

        [DynamoDBHashKey]
        public Guid Id { get; set; }

        [DynamoDBGlobalSecondaryIndexHashKey("Name-index")]
        public string? Name { get; set; }

However, I still get the same error when attempting to use LoadAsync.

ashishdhingra commented 3 years ago

Hi @julealgon,

Upon further investigating and using the code below (NOTE: To keep things simple, I used .GetAwaiter().GetResult()):

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;
using Amazon.DynamoDBv2.DocumentModel;
using System;

namespace DynamoDbSecondaryIndexTest1748
{
    class Program
    {
        static void Main(string[] args)
        {
            AmazonDynamoDBClient amazonDynamoDBClient = new AmazonDynamoDBClient();
            DynamoDBContext dynamoDBContext = new DynamoDBContext(amazonDynamoDBClient);

            var loadResultSecondary = dynamoDBContext.LoadAsync<Something>("87a8f81a-9fe9-48cb-a7c0-a7a6788d91d3", new DynamoDBOperationConfig { IndexName = "Name-index" }).GetAwaiter().GetResult();
            Console.WriteLine("Loaded Result: " + loadResultSecondary);
        }
    }

    [DynamoDBTable("TestTable")]
    internal sealed class Something
    {
        [DynamoDBHashKey]
        public Guid? Id { get; set; }

        [DynamoDBGlobalSecondaryIndexHashKey]
        public string Name { get; set; }

        public override string ToString()
        {
            return $"[Id: {Id}, Name: {Name}]";
        }
    }
}

it gave the mentioned error, whether I used dynamoDBContext.LoadAsync<Something>("<some_guid_string_value>", new DynamoDBOperationConfig { IndexName = "Name-index" })or dynamoDBContext.LoadAsync<Something>("<some_guid_string_value>"). Using the below stack trace:

System.InvalidCastException
  HResult=0x80004002
  Message=Unable to cast object of type 'System.String' to type 'System.Guid'.
  Source=AWSSDK.DynamoDBv2
  StackTrace:
   at Amazon.DynamoDBv2.Converter`1.TryTo(Object value, Primitive& p)
   at Amazon.DynamoDBv2.Converter.TryToEntry(Object value, DynamoDBEntry& entry)
   at Amazon.DynamoDBv2.Converter.ToEntry(Object value)
   at Amazon.DynamoDBv2.DynamoDBEntryConversion.ConvertToEntry(Type inputType, Object value)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ToDynamoDBEntry(SimplePropertyStorage propertyStorage, Object value, DynamoDBFlatConfig flatConfig, Boolean canReturnScalarInsteadOfList)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ToDynamoDBEntry(SimplePropertyStorage propertyStorage, Object value, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ValueToDynamoDBEntry(PropertyStorage propertyStorage, Object value, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.MakeKey(Object hashKey, Object rangeKey, ItemStorageConfig storageConfig, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.LoadHelperAsync[T](Object hashKey, Object rangeKey, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.LoadAsync[T](Object hashKey, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
   at DynamoDbSecondaryIndexTest1748.Program.Main(String[] args) in C:\Users\ashdhin\source\repos\DynamoDbSecondaryIndexTest1748\DynamoDbSecondaryIndexTest1748\Program.cs:line 19

I was able to trace the call to TryTo. Basically when you pass string value, it tries to use (Guid)<string_value> to get the Guid value type. This call fails giving error Cannot convert type 'string' to 'System.Guid'. You may try to validate this by executing (Guid)"87a8f81a-9fe9-48cb-a7c0-a7a6788d91d3" in Visual Studio immediate window, you will get the same error.

The workaround here is to execute the call dynamoDBContext.LoadAsync<Something>(Guid.Parse("87a8f81a-9fe9-48cb-a7c0-a7a6788d91d3"), new DynamoDBOperationConfig { IndexName = "Name-index" }) instead. Notice the use of Guid.Parse(). This works successfully without using type converter.

Please let me know if this works for you and let me know if this issue could be closed.

Thanks, Ashish

julealgon commented 3 years ago

@ashishdhingra

Basically when you pass string value, it tries to use (Guid) to get the Guid value type.

Why though? It shouldn't be trying to do that, since the index I'm querying does not have a Guid as the hashkey: it is a string.

The workaround here is to execute the call dynamoDBContext.LoadAsync(Guid.Parse("87a8f81a-9fe9-48cb-a7c0-a7a6788d91d3"), new DynamoDBOperationConfig { IndexName = "Name-index" }) instead. Notice the use of Guid.Parse(). This works successfully without using type converter.

This is not a workaround, this is a different query altogether. Please note that the string Name property which is the hash key for the secondary index, is not a guid: it is actually textual data entered into the system by users, so it is mapped as string both on Dynamo as well as on my C# model class. This is contrary to the primary index hash key, which is a Guid and is mapped in Dynamo as string, and in code as Guid. The primary index Load is working fine, we are talking about the secondary index here.

I hope that clarifies the issue. Let me know if you need additional info.

ashishdhingra commented 3 years ago

Hi @julealgon,

The LoadAsync is part of DynamoDB Object Persistence model (similar to ORM frameworks for relational databases). For the mentioned problem, you have used Guild at C# object level instead of string at DynamoDb level. DynamoDB Object Persistence model can map arbitrary data types to DynamoDB properties via type converters. Also notice that first parameter of LoadAsync is of type object. Hence before executing the query, it tries to map the object type value to proper object (Something) property type (Guid) value. The cast (T)<some_value> is a valid call since DynamoDb out of box only supports simple types as mentioned earlier. Hope this gives some insights into 1st question. Kindly note that Object Persistence Model is designed to work with arbitrary types. Feel feel to explore SDK code to find more details of Object Persistence design.

For the 2nd question, DynamoDB SDK's LoadAsync method calls GetItem service API operation. If you refer Service API reference for GetItem, it expects Key request parameter which is a A map of attribute names to AttributeValue objects, representing the primary key of the item to retrieve.. Hence in your case, it should be the Guid object property. The .NET API reference for DynamoDBContext.LoadAsync also specifies the first parameter as Hash key element of the target item.. I'm not sure on how using 2nd parameter with secondary index name affects the GetItem service operation.

I'm not sure if for your use case (per your 2nd question), you intend to use Query operation.

Hope this helps.

Thanks, Ashish

julealgon commented 3 years ago

The LoadAsync is part of DynamoDB Object Persistence model (similar to ORM frameworks for relational databases). For the mentioned problem, you have used Guild at C# object level instead of string at DynamoDb level.

Incorrect. The problem I'm having is not with the primary key (which is mapped as Guid in C# code) but with the hash key for the secondary index, which is string on both ends.

DynamoDB Object Persistence model can map arbitrary data types to DynamoDB properties via type converters.

I understand that, however the Guid primary key is working fine for me without having to create any converters. I think you should probably update your documentation to clarify the behavior around Guids in the Object Persistence Model, since they appear supported even though they are not listed there.

Also notice that first parameter of LoadAsync is of type object. Hence before executing the query, it tries to map the object type value to proper object (Something) property type (Guid) value.

I understand the error, but I don't understand why it is attempting to perform that conversion in the first place. The type it should be attempting to cast to for the secondary index is string, not Guid: I'm not performing an operation on the primary index here when specifying IndexName.

For the 2nd question, DynamoDB SDK's LoadAsync method calls GetItem service API operation. If you refer Service API reference for GetItem, it expects Key request parameter which is a A map of attribute names to AttributeValue objects, representing the primary key of the item to retrieve.

Now this is where it gets incredibly misleading then. The LoadAsync API names the first parameter hashKey, not primaryKey. Following that nomenclature, one would understand it is "the hashkey for the index being searched", since:

If it truly is referring to the primary hashkey, then the API is incredibly misleading an needs work, in my opinion. Especially when directly compared to "sibling" APIs like QueryAsync, which takes the same parameters, but actually works on the index you specify and not only on the primary index.

I'm not sure if for your use case (per your 2nd question), you intend to use Query operation.

From the API surface, it would appear that LoadAsync should work on my scenario, but it doesn't. As I mentioned in the OP, I think either the API should be changed to support the scenario, or it should prohibit you from passing an index name into the method. The former is preferred in my view for API consistency (working similarly to QueryAsync).

ashishdhingra commented 3 years ago

Hi @julealgon,

With all above inputs I will work with developer to have a look at it, even though it might be a .NET API documentation update.

Kindly notice that the GetItem service API call returns single item, as opposed to Query service API call which returns multiple items. Hence GetItem should really work on primary key which is unique for all items. Please validate if this explains the behaviour of LoadAsync.

NOTE:

I will add bug label back to get the traction.

Thanks, Ashish

julealgon commented 3 years ago

...even though it might be a .NET API documentation update.

Again, while changing the documentation to make this behavior explicit is a valid "fix", I personally think it is a bad option in this case as the APIs will "appear" to behave the same but work differently at the end of the day, and this is bad for consumers. Obviously, this is up to you, just wanted to make my position clear here.

Kindly notice that the GetItem service API call returns single item, as opposed to Query service API call which returns multiple items. Hence GetItem should really work on primary key which is unique for all items. Please validate if this explains the behaviour of LoadAsync.

Perhaps we are confusing "primary/secondary" with "hash/sort" keys here? My understanding was that secondary indexes also had unique items under their respective hash keys, so the approach shouldn't change regardless if we are talking about single-result Load vs multi-result Query: using Load with a hash key from a secondary index should have similar behavior than performing the same call using the hash key for a primary index, i.e. it should always have "at most 1 result" in both cases.

Please do correct me if I'm wrong here as I just started interfacing with Dynamo.

API documentation for DynamoDBContext.Load (object) call which states Loads an object from DynamoDB for the given hash-and-range primary key and using the given config..

I never noticed the XML docs for the non-async call as I was always using the async version by default. This documentation does seem to suggest it only works on the primary index (having the primary key mentioned there), which would then result in a very misleading API as I said earlier since you can pass any other index name that is not the primary one as part of the call.

If the documentation is really correct here and this is not a bug, then I would strongly recommend removing the capability of passing the index name in the Load/LoadAsync calls. At least that will move developers to the pit of success faster.

...I will work with developer to have a look at it,... I will add bug label back to get the traction.

I appreciate your support on this Ashish.

normj commented 3 years ago

@julealgon Adding more context here. As @ashishdhingra mentioned LoadAsync is basically a wrapper around the underlying GetItem Service API. Basically it calls that and takes care of converting the low level service client response into the POCO. All of the operations on the DynamoDBContext have an override that take in the same DynamoDBOperationConfig type. I admit not my favorite design as it caused DynamoDBOperationConfig to be a union of all possible config settings. That means some settings like IndexName won't always be applicable. At this point changing this would be a serious breaking change which we are not ready to make.

I agree the parameter name on the LoadAsync would be clearer if it was called primary key. For background when this code was first written DynamoDB didn't have indexes yet and this just wasn't something we thought about going back and renaming.

As for the idea of making LoadAsync work like QueryAsync except for a single object I think that could also cause serious confusion. A hash and range key for an index is not guaranteed to be unique in DynamoDB. So if we got multiple items back we would have to either throw an error which could be very unexpected or return back the first item which would be unpredictable.

At this point the best we can do is improve the documentation. Anything else causes serious breaking changes.

julealgon commented 3 years ago

Thank you @normj for the information.

At this point changing this would be a serious breaking change which we are not ready to make. At this point the best we can do is improve the documentation. Anything else causes serious breaking changes.

If you are not willing to change the behavior of the API, perhaps you could introduce new overloads, with a tighter config model (containing only the settings that make sense for any given operation) while retaining the old overload as-is, but marking it as deprecated. That would allow you to slowly phase it out and replace with the newer, more robust one.

Another potential option would be to actually throw an exception at runtime when an invalid setting is passed into the call. I'd rather get an exception that tells me why that setting cannot be used, than get "wrong results" as is happening right now.

Anyways, very unfortunate to have such a misleading API stuck in place. I'll probably need a significant comment block in my code to explain this anomaly and justify using the Query syntax in my repository implementation. I hope you can look back at this at some point and rethink the strategy (maybe implement an EF provider instead of a custom model, like CosmosDB does and some paid providers provide). Clearly, the "wheel reimplementation" here created unnecessary problems for consumers.

nkoudelia commented 2 years ago

Just spent a day banging my head into the wall trying to LoadAsync() from a GSI having only a hash (but no sort) key which means there must be exactly one result matching the key. Having to use QueryAsync().GetRemainingAsync().Single() in such case is not very intuitive.

github-actions[bot] commented 1 year ago

We have noticed this issue has not received attention in 1 year. We will close this issue for now. If you think this is in error, please feel free to comment and reopen the issue.

julealgon commented 1 year ago

We have noticed this issue has not received attention in 1 year. We will close this issue for now. If you think this is in error, please feel free to comment and reopen the issue.

From my perspective, this is still valid, even if its a documentation-only change.

ashishdhingra commented 1 year ago

Needs review on how we could implement this documentation change.

devnull commented 4 months ago

Wow, I just spent hours trying to work out what I was doing wrong! This really needs to be included in the documentation if it's not going to be fixed. It would've saved me so much headache!

ashovlin commented 3 months ago

For the short term, we've clarified the Load/LoadAsync documentation starting in DynamoDBv2 v3.7.303.9 via #3320 to explain that 1) it calls GetItem under the hood, which relies on the primary key and 2) IndexName on the config does not influence which item is loaded.

Longer term, we would like to make the change @julealgon suggested above - move away from the "shared" DynamoDBOperationConfig towards operation-specific configs that only contain applicable settings for each. You may have noticed the v4-development branch, this is on our list of desired DynamoDB changes along with the interface/mockability issues.

ashovlin commented 1 month ago

We've merged #3421 into the v4-development branch, which includes a new set of operation-specific config objects LoadConfig, SaveConfig, QueryConfig, etc. This should help avoid confusion about IndexName and other properties that only apply to a subset of operations moving forward.

We've marked the methods that take DynamoDBOperationConfig as deprecated in V4, but are not planning to remove them entirely yet. We may do so in a future major version.

We're still working on getting preview packages published, you can track overall V4 progress in #3362. We'll leave this issue open until at least V4 hits GA.