aws-amplify / amplify-cli

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development.
Apache License 2.0
2.81k stars 821 forks

Question about the ID type #2925

Closed Ricardo1980 closed 4 years ago

Ricardo1980 commented 4 years ago

Hello,

I read that in GraphQL it is better to have a globally unique ID, for caching reasons (https://graphql.org/learn/caching/). When using AWS Aurora, what column type should I select to use that feature? Perhaps VARCHAR(64)? Or what type should I use so that Amplify generates ID! instead of Int!?

The only thing I found is $util.autoId(), used inside the resolver here: https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-rds-resolvers.html. But nothing about the column type I have to use in AWS Aurora.

NOTE: Here https://docs.aws.amazon.com/appsync/latest/devguide/resolver-util-reference.html I found "$util.autoId() : String. Returns a 128-bit randomly generated UUID." Does Amplify detect that? But here https://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/ I see 36 characters instead of 16 (128/8). What should I do?
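The 128-bit vs 36-character discrepancy is just binary size vs canonical text form: a UUID is 16 raw bytes, which render as 32 hex digits plus 4 hyphens. A quick illustrative sketch in Python:

```python
import uuid

# Generate a random (version 4) UUID, like $util.autoId() does.
u = uuid.uuid4()

print(len(u.bytes))  # 16 raw bytes = 128 bits
print(len(str(u)))   # 36 chars: 32 hex digits + 4 hyphens
print(len(u.hex))    # 32 chars once the hyphens are stripped
```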

Thanks a lot for suggestions.

Which Category is your question related to? code generation

Amplify CLI Version 4.2.0

What AWS Services are you utilizing? AWS Aurora, AppSync, Amplify

rrrix commented 4 years ago

Hi @Ricardo1980,

I did some digging into the source code regarding your question, and it doesn't seem like the Amplify CLI supports using autogenerated GUIDs (from $util.autoId()) in the AppSync resolvers for Aurora data sources. However, I believe that if you can set up your Aurora/MySQL tables to automatically generate string GUIDs when new records are inserted, Amplify will support that. I haven't tested this myself, but from reading the source it seems like it will work.
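An untested sketch of that setup: on MySQL 8.0.13+ you can give the key column an expression default so the database generates the string GUID itself (Aurora's MySQL 5.7-compatible engine lacks expression defaults, so there you would need a BEFORE INSERT trigger instead). Table and column names here are hypothetical:

```sql
-- Untested sketch (requires MySQL 8.0.13+ for expression defaults):
-- the database generates a 36-char string GUID on INSERT.
CREATE TABLE Users (
  id   CHAR(36)     NOT NULL DEFAULT (UUID()),
  name VARCHAR(255),
  PRIMARY KEY (id)
);
```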

Regarding the "ideal" MySQL column type for GUIDs at scale, it's actually BINARY(16). Here's a detailed article on why: http://mysql.rjweb.org/doc.php/uuid

Finally, as to why it's VARCHAR(36) rather than VARCHAR(16): the 16 refers to raw bytes, while the canonical text form of a UUID is 36 characters (32 hex digits plus 4 hyphens). On top of that, strings in MySQL can be stored in multi-byte character sets such as UTF-8 or UCS-2 (vs. ASCII, which is single-byte), so a character on disk may take more than one byte. See mysql.com/..unicode-ucs2.html
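For completeness, a sketch of the BINARY(16) approach using MySQL 8.0's built-in conversion helpers (UUID_TO_BIN/BIN_TO_UUID; on Aurora's MySQL 5.7-compatible engine these don't exist, so you'd use UNHEX(REPLACE(UUID(), '-', '')) instead). Names are hypothetical:

```sql
-- Sketch only: compact UUID storage as BINARY(16) (MySQL 8.0+).
CREATE TABLE Item (
  id   BINARY(16)   NOT NULL PRIMARY KEY,
  name VARCHAR(255)
);

INSERT INTO Item (id, name)
VALUES (UUID_TO_BIN(UUID()), 'example');

-- Convert back to the 36-char text form when reading.
SELECT BIN_TO_UUID(id) AS id, name FROM Item;
```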

Ricardo1980 commented 4 years ago

Hello @rrrix , thanks for your reply.

In your link I see: "most INSERTs involve a disk hit. Even on a beefy system, this limits you to a few hundred INSERTs per second." That is not good. Perhaps I am overcomplicating this. Maybe we should use UUIDs when working with DynamoDB (NoSQL), and auto-incrementing integer IDs when using Aurora (SQL). What do you think?

Last question: I see this in the GraphQL schema documentation (https://graphql.org/learn/schema/): "ID: The ID scalar type represents a unique identifier, often used to refetch an object or as the key for a cache. The ID type is serialized in the same way as a String; however, defining it as an ID signifies that it is not intended to be human‐readable." Does that mean I should not use ID! in the schema definition when I use auto-increment integers as identifiers in my tables (because they are not strings)?
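For reference, the two shapes under discussion look like this in SDL (type names are hypothetical); ID is serialized as a String, so an auto-increment column maps more naturally to Int:

```graphql
# UUID-keyed table (e.g. DynamoDB, or BINARY(16) in Aurora):
type Post {
  id: ID!
  title: String!
}

# Auto-increment-keyed Aurora table:
type Comment {
  id: Int!
  body: String!
}
```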

BTW, seems there is an error here: https://github.com/aws-samples/aws-appsync-rds-aurora-sample/blob/c1c444f7dafb9ec55e763fecbd2295e2169228e8/src/lamdaresolver/index.js#L55 and https://github.com/aws-samples/aws-appsync-rds-aurora-sample/blob/c1c444f7dafb9ec55e763fecbd2295e2169228e8/src/lamdaresolver/index.js#L42

Thanks a lot.

nikhname commented 4 years ago

Hi @Ricardo1980, you should consider that the disk hits will happen only once the index grows beyond what can be cached. Using UUIDs with DynamoDB and auto-incrementing integer IDs with Aurora is a reasonable approach. And yes, you should use type Int in your schema, since integer columns in Aurora are not mapped to ID by the transformer.

Ricardo1980 commented 4 years ago

Thanks @nikhname

I solved the ID issue and it is working fine. I am using BINARY(16) in the database, and in the resolvers I use HEX and UNHEX to convert between IDs and binary. I also added an extra line in the resolvers to remove the 4 hyphens. Everything seems to work perfectly (I am still developing the system, so I haven't tested it at scale).

Here an example:

#set($id = $util.autoId().replace("-",""))
{
  "version": "2018-05-29",
  "statements": ["INSERT INTO UserChangeRequest (userChangeRequestId, creationDate) VALUES (UNHEX('$id'), CURDATE())", 
  "SELECT HEX(userChangeRequestId) as userChangeRequestId,  creationDate, FROM UserChangeRequest WHERE userChangeRequestId=UNHEX('$id')"]
}

(the alias is important, otherwise, it does not work)
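The hex round trip in that template can be sanity-checked outside of VTL; here's a small Python sketch mimicking $util.autoId().replace("-", "") and MySQL's UNHEX/HEX:

```python
import uuid

# Mimic $util.autoId().replace("-", "") from the request template.
id_text = str(uuid.uuid4()).replace("-", "")
print(len(id_text))    # 32 hex characters

# UNHEX('...') in MySQL stores these as 16 raw bytes in a BINARY(16) column.
id_binary = bytes.fromhex(id_text)
print(len(id_binary))  # 16 bytes

# HEX(...) on SELECT converts back; MySQL returns uppercase hex,
# so compare case-insensitively.
round_trip = id_binary.hex().upper()
print(round_trip == id_text.upper())  # True: lossless round trip
```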

One of the reasons I do this (perhaps I am wrong) is that it seems to make integration with Cognito and security easier. Let me explain. My database has a User table with a lot of relationships to other tables. If the Cognito ID and the User table ID are exactly the same, everything seems much easier. For example, in my database a user can send a like to another user, which is just another table with 2 user IDs (liker and liked). I can use $ctx.identity.cognitoIdentityId as the user that sends the like in the database. Doing that, I make sure no user can send likes on behalf of another one. I can also make sure the current user can modify only their own profile, not other people's profiles. Without that trick, imagine I use auto-increment IDs in the User table and save the Cognito ID in an extra field: SELECTs and INSERTs become more complex, and I also have to retrieve the Cognito ID with another SELECT all the time inside the resolvers. Am I right? How do you usually solve this problem? Thanks a lot for suggestions.
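A minimal sketch of that pattern, in the same request-template style as the example above (table and column names hypothetical): the resolver takes the caller's identity from the request context, so the client can never supply the liker's ID itself.

```vtl
## Sketch only: the liker is always the authenticated caller.
#set($liker = $ctx.identity.cognitoIdentityId)
{
  "version": "2018-05-29",
  "statements": [
    ## NOTE: as in the example above, values are interpolated directly;
    ## real resolvers should validate/escape $ctx.args inputs.
    "INSERT INTO UserLike (likerId, likedId) VALUES ('$liker', '$ctx.args.likedId')"
  ]
}
```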

EDIT: I found another approach that allows having auto IDs and the Cognito ID in a separate field, but I am not convinced. They use pipeline resolvers to check on every request whether the current user is an admin. Do I have to do that on every single request? That does not seem normal to me. https://miro.medium.com/max/1416/1*ob6pvG-o8IDSVto1XdheIA.jpeg https://medium.com/@dabit3/intro-to-aws-appsync-pipeline-functions-3df87ceddac1

nikhname commented 4 years ago

@Ricardo1980 I don't see anything wrong with your approach. Using pipeline resolvers is a valid way to approach this as well. It is not bad practice to use the pipeline resolver to check if the user is admin on every request.

Ricardo1980 commented 4 years ago

Thanks a lot @nikhname

github-actions[bot] commented 3 years ago

This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs.

Looking for a help forum? We recommend joining the Amplify Community Discord server *-help channels for those types of questions.