I've been thinking about this one as it's an area of our product development we are now getting to, so I'd like to warm this up with a few options that hopefully might lead to something. Like a lot of others, I'd just assumed this would be trivial; a fault of not understanding document databases well enough, I guess. It's a bit of a trap to fall into when you're so far down the line of developing. Try telling your client/manager that you need 5 days to get out the total records for something... it's not a good look.
One issue with this whole thread seems to be the assumption that whatever the solution is, it has to cope with zillions of records, when in fact a lot of apps might only be counting in the 100's or 1000's in some tables, so I don't think there needs to be a one-size-fits-all solution. I get that some of those solutions could cause some big issues if misused, but with some knowledge of any constraints or suggested working parameters that should be safely navigated.
I'll start off with the smallest (and some might say naive) solution first and build up to the bigger ones. Remember, there isn't a one size fits all solution. My focus is on trying to make it easy to add this functionality with minimal hassle from the "developer building out the schema" point of view.
If you don't have many records (< 1000 or so) you might be able to get away with using the scannedCount field as described here: https://www.riccardosirigu.com/blog/aws-amplify-dynamodb-total-count-graphql-queries/. It can "top out" when it hits the 1MB result set limit, though, and so report a false total. Ideally it would be nice to know if there are more records that could have been returned or if that is the total amount, i.e. was there another page of results possible? That would at least give you the option of saying "record 1 of at least 246+" or something if we can't get an exact amount back.
Another option is to do a full paged scan if I opt into it, so I can get a "true total". This would loop through all the pages of results for a scan and return the total. I keep hearing that scans are expensive, but if I've only got a relatively small number of records that might be a hit I'm happy to take for now, pushing off doing it "properly for epic scale" until it becomes a happy problem (i.e. we've grown so much we need to change it).
I can see this being a new directive that could be added to a model, something like:
type Posts @model {
title: String!
content: String!
author: String!
published: Boolean!
totalPosts: Int! @scannedCount
}
This new @scannedCount directive would automatically do a scan of all the records in the Posts table, paging through all the results until it finishes (thus masking from us the issues with the 1MB returned data limit). Used as shown above it would just give the total of all records for that table.
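To make the idea concrete, here is a rough sketch (not an actual implementation; the function and parameter names are made up) of what the Lambda behind such a directive might do: page through a DynamoDB Scan using Select: "COUNT" and LastEvaluatedKey, summing the counts as it goes, with an optional cap on how many pages it will follow. The returned complete flag is what would let the API say "at least N" rather than pretending the number is exact.

import { DocumentClient } from "aws-sdk/clients/dynamodb";

const db = new DocumentClient();

// Hypothetical helper: page through a Scan, summing counts, stopping after maxPageDepth pages.
export async function scannedCount(
  tableName: string,
  maxPageDepth = Infinity
): Promise<{ count: number; complete: boolean }> {
  let count = 0;
  let pages = 0;
  let lastKey: DocumentClient.Key | undefined;

  do {
    const page = await db
      .scan({
        TableName: tableName,
        Select: "COUNT",            // only return counts, not the items themselves
        ExclusiveStartKey: lastKey, // continue from where the previous 1MB page stopped
      })
      .promise();

    count += page.Count ?? 0;
    lastKey = page.LastEvaluatedKey;
    pages += 1;
  } while (lastKey && pages < maxPageDepth);

  // complete === false means we gave up at maxPageDepth, so this is an "at least" number
  return { count, complete: !lastKey };
}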
Now if we ignore the obvious performance traps that this directive sets up for us (we will discuss those next), I think we can all agree that from a user's point of view that looks like a simple and clean way of getting what we want. It's a super quick way to get a total value out; problem solved from the user's point of view, initially. For fast prototyping or "good enough" features that might do me.
So now on to the problems this will cause or that need solving. Luckily, now that we have a directive, we can add attributes to it for finer-grained control, or indeed create multiple directives, to give us the following functionality.
A quick note on Lambda functions: we are going to need some generating. Would it be worthwhile if these are generated and stored in the /backoffice/functions/ folder as user-generated functions are? That way, should a user want to extend a function with additional logic, they can. The functions this directive generates should be nicely stubbed out with pointers on where and how to add additional logic, and would need pushing to deploy like any other user-generated function. This directive shouldn't hide all the magic in the cloud and VTL templates; it should do the minimum to make the common case go away and then give the user an extension point to add additional logic as their needs demand. This might mean that removing the directive from the schema won't delete the resources from the cloud automagically if you've edited them. I'd be happy with a warning like "Some functions appear to have been edited so you will need to manually delete the following, or (A) delete all anyway".
If left unchecked this could get out of hand, with the number of scanned pages growing over time, so as a fail-safe we should have a default number of pages that this will loop over before returning. The default can be agreed later, but if the user wants to opt in to setting it then they could in the directive like so:
totalPosts: Int! @scannedCount( maxPageDepth: 5 )
The maxPageDepth would be an optional param to set how many pages you are happy to scan to get that total, to stop it all grinding to a halt in the future (but hey, we opted into this solution; it's not ideal, it's pragmatic, and we know the trade-offs).
The trouble with the above is the same as the scannedCount solution from the blog post: it doesn't tell you if it reached the end of the pages or not, so you've no way of knowing if the reported number of results is the total amount or just the amount it got before hitting maxPageDepth. But it might be good enough for the right-sized app and it sure is simple from a user's point of view. As long as the limitations are documented, users should be able to make an informed choice.
Additionally, maybe a CloudWatch alarm could be created as part of using this directive to ping a notification when a field is taking too long to rescan? A nice bit of developer experience and another way to introduce a new feature to those of us still learning our way around what AWS has to offer.
What if I want something other than the total count, say a count for some filter or other? This one's a bit trickier, but could we allow filters in the directive?
totalPosts: Int! @scannedCount
totalPublishedPosts: Int! @scannedCount( filter: "published=true") #something like that anyway
That would work but it's not dynamic; I can't pass in a filter at runtime, but it would be good for various counters/stats that could be handy to have. I wonder if we could generate a GraphQL query that would allow you to pass a filter through to this directive/resolver (the lines become blurred as to which this now is at this point; guessing it's a resolver under the hood, but then I don't know enough to know what a resolver does under the hood as yet, so my terms might be wrong here, feel free to enlighten me).
Sort keys could maybe be passed through too, to give you a total for a subset in a similar manner. It's not my strong point, so this part of the idea is hazy on purpose, but it's here to get you thinking.
Another issue that @houmark raises repeatedly is that the number of results that match a filter can only be worked out if we scan all the records, and also that it doesn't work if we have a limit. This @scannedCount would handle both, I think, in its own brute-force way.
Do we really want or need to look this total up every time? If it's a small number of records then why not; it wouldn't be all that slow nor costly (again, we are talking about a relatively small number of records here, not eBay-products or Uber-rides sized apps), and we could cache it on the frontend if needed.
But if you want to opt in to caching we could do it one of two ways that could be set via the directive at the expense of another table and a function.
You could do this with a simple boolean or enum (to allow for more options in the future):
totalPosts: Int! @scannedCount( cached: true )
I'm not 100% sure of the internals here, so I'm going to propose what I think it could do with my understanding of the options; there might be other, better ways, but this should give the gist. We need to store the value somewhere and then pull it back out.
One way would be to do this under the hood: create a "totals cache" table to store the value in (or, if another @scannedCount directive has already created this table, we can just re-use it). Along with storing the total we would also store some other fields like model name, target field, createdAt and updatedAt. This version uses a Lambda trigger function using DynamoDB Streams (one of the options when creating a function via the CLI, so a known thing) that hooks into this model's table updates and will rescan the table whenever a record is added/removed. This is a bit brute force but it would work.
The trouble with this method is it could cause needless thrashing of the tables when doing an import; we don't need that number regenerated on every entry if we are adding 100's or 1000's of records. It might be that you can live with the hit if the numbers are small, but an option to disable the caching while doing an import, yet still trigger a final count at the end, would be good. I sadly don't know enough of how to do that nicely. You could add a "disabled" boolean attribute to the directive maybe, but that would need another push to get it up to disable it, and then how do you get it to recalculate after enabling it again without just "touching" an existing record afterwards? That might be enough?
totalPosts: Int! @scannedCount( invalidOnCRUD: true ) # open to suggestions on any of these names btw
Another option is similar to the above, except we don't do the rescan when the record changes, we just null the value in the cache table. Then our resolver that gets the total for that field will do the rescan for us if the value is null. Depending on the data model and workflow of the app this might be a better fit.
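As a rough sketch of that read path (the totals table name, key shape and recount callback are all hypothetical), the resolver-side Lambda would look something like this: return the cached value if it is still there, otherwise recount and write the fresh value back.

import { DocumentClient } from "aws-sdk/clients/dynamodb";

const db = new DocumentClient();
const TOTALS_TABLE = process.env.TOTALS_TABLE!; // hypothetical "cache" table

// recount() would be something like the paged-scan helper sketched earlier
export async function getTotal(cacheKey: string, recount: () => Promise<number>): Promise<number> {
  const cached = await db.get({ TableName: TOTALS_TABLE, Key: { id: cacheKey } }).promise();

  // The stream-triggered function nulls "total" whenever the model table changes
  if (cached.Item && cached.Item.total != null) {
    return cached.Item.total as number;
  }

  const total = await recount(); // rebuild the cache lazily, on read
  await db
    .put({
      TableName: TOTALS_TABLE,
      Item: { id: cacheKey, total, updatedAt: new Date().toISOString() },
    })
    .promise();

  return total;
}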
My only worry here is that we split the two bits of cache work into two separate places: the function that clears the cache and the resolver that rebuilds it.
An alternative, if we are OK with the totals potentially not being 100% accurate at any given time, is to have a timed function do the rescan repeatedly after a set timespan. Again this could be set via the directive:
totalPosts: Int! @scannedCount( refreshCountInSeconds: 300 ) # regenerate every 5 minutes
I don't know the best practice for passing timespans around, so that is up for debate, but hopefully you get the idea.
How we get that cached value out into the model, though, I'm hazy on. I guess we'd just have a resolver generated which would go and get the value from the "cache table", but better minds than mine will know how best to do that.
Rather than having these fields on the model themselves, do we need a PostTotals model generated instead that can store all these top-line numbers, possibly pulling them from the cache table via a resolver or when generated by amplify codegen etc.? In that case does the directive move up to the whole table rather than the field level?
type Posts
@model
@scannedCount( fieldName: "totalPosts" ) {
title: String!
content: String!
author: String!
published: Boolean!
}
What if we outgrow the @scannedCount directive above, what options do we have when scanning is too slow due to the number of records we've got? This might be where a @runningTotal directive is the answer.
It has been suggested above that you can keep a rolling count of various totals by hooking into the "DynamoDB stream", which will fire an event every time you add/update/delete a record in DynamoDB. We could then use these events to increment or decrement a totals counter.
This solution though has some instant problems:
I'd like to see a directive that could wire all this up for me from the schema. This would spin up the functions needed (stored locally for editing) and store the values in a separate table, similar to the cache table mentioned earlier (if not the same table but different records, so we don't clash?). Lambda trigger functions hooked into DynamoDB Streams would do the adding/subtracting. As these are built to run all the time, we won't have the same potential scaling/performance problems we had with the @scannedCount directive.
type Posts
@model
@runningTotal( fieldName: "totalPosts" ) {
title: String!
content: String!
author: String!
published: Boolean!
}
Now what about that idea to have multiple counts on a table? Could we do this:
type Posts
@model
@runningTotal( fieldName: "totalPosts" )
@runningTotal( fieldName: "totalPublishedPosts", filter: "published=true" ) {
title: String!
content: String!
author: String!
published: Boolean!
}
I don't see why not. Need more power? Then you should be able to go edit the function code, or copy it, and now you've got a great starting point for adding a few more lines to get you more counters as needed. Ease of development continues, and I think we are still using all the tech that Amplify already has and uses, so not a massive stretch of knowledge or comfort zone?
Getting into wish-list land with this last one; I suggest the above gets done before this one gets looked at, but...
The final option is something similar to what @RossWilliams suggested above: "going large" and streaming the table data from Dynamo into a relational database so you can do full-on SQL magic on that. This one currently is beyond my pay grade. I laughed when I first read Ross' comment as it sounded crazy complicated, but after a bit of reading I can see how it all might stitch together... but again, should someone have to do all that to get some stats or counters out of their data? The answer might simply be "you are using the wrong database in the first place!" and I should have gone with a relational database from the off... but that isn't how Amplify is marketed or sold to me!
The basic idea is that by hooking into a DynamoDB Stream we can push record changes to a relational database, and from there we can pull out whatever data SQL will allow. Sounds excessive, but if it's what we've got then let's use it. Could another directive handle this one?
type Posts
@model
@streamToSQL { # Guess this would be RedShift?
title: String!
content: String!
author: String!
published: Boolean!
}
This would push any CRUD operations into a relational database. I guess you would need to add this database first as a resource via the CLI (amplify add sql). It would create a table named the same as the model (the directive could allow you to override this).
What to do about relations and @connections, though? In theory it could include those, and I guess that would be the dream.
This does raise the question: do we need an @sql resolver, similar to the @http resolver, so we could quickly get the results of a SQL query into our GraphQL? I'll leave that one for another issue I think ;)
@houmark raises the tricky point about authentication earlier and, to be honest, I don't know. I'm not all that sure if this is an issue with passing @auth to Elasticsearch or just within the API. I'm hoping it's an Elasticsearch thing, which would mean that the above solutions should work, with the possible exception of the relational database idea; I can see that needing a lot more thought.
It should be possible to apply @auth to the model or the individual fields like normal, which should cover it for the @scannedCount and @runningTotal directives. Alternatively, if we use a separate table like PostTotals as discussed above, then @auth could maybe be applied to that? That should re-use the logic already in place, I think.
Sorry this is so long; it wasn't meant to be. It's a tough problem and I've been giving it a lot of thought, as I hope you can tell. I'm hampered, though, by only knowing some of what is needed to know the best route, but hopefully this adds to the conversation and brings a user's point of view on how I'd personally like it to roughly work and what constraints I might be happy living with to get some of the functionality I and others say we need.
I've tried really hard to leverage what Amplify already has or uses so that it shouldn't be a massive stretch of the imagination or skillset to get it to do what it needs to.
Let's keep the conversation going. What have the Amplify devs come up with in the time this has been open, where might it go, and how can we get there (and ideally an idea of when)?
tl;dr: You should create a Lambda resolver for this field and share your code for others to start from. I don't see an automated solution fitting into the project without significant effort and problems to overcome.
@PeteDuncanson you are an excellent writer and I thoroughly enjoyed reading this. My thoughts and responses are a lot more scattered, but let me try to explain my crazy ideas and somewhat address your proposals. I agree the proposals I have made in previous comments are not ideal, but still believe they are the best we have given the path Amplify has taken.
At an even higher level, imagine a developer (e.g. me, or a start-up; this is the target audience of AWS Amplify, right?) who creates an app which is basically a social network, where people can follow each other, create posts, and like and comment on those posts. This is a pretty basic scenario, and for this kind of app displaying follower, like and comment counts is quite crucial. Of course, at launch the numbers are not so big and fall under 1MB, but with growth the more users there are, the bigger those numbers get. The developer would like to choose AWS Amplify because of the infrastructure, ease, integrity and velocity, which are so needed at the early stages of new projects (the Free Tier is also much appreciated). But suddenly he has a dilemma: why doesn't such a robust tool support such a basic feature? Should he fight these limitations (and then maintain the workarounds) or just search for a tool which supports it, or at least where it could be done without hacking?
@ilyagru this scenario is easily solved with a pipeline resolver or DynamoDB stream. After adding a like, update the like count on the post. Where it gets tricky is paginated (and even more difficult, filtered) results.
Does anyone on the AWS team even read this? So far I am missing an official statement on this topic. We have currently spent a lot of money on development and we are still faced with such a trivial problem.
I think the Amplify team only has two options. Either solve the problem or find an alternative to DynamoDB. It's not ok that nothing has happened on this subject for 13 months. Not even a responsible person has been assigned to this issue.
@Amplify Team, @aws-amplify-ops Please give us a statement.
@andre-scheffer (and others) this is good feedback, and @kaustavghosh06 and I have responded on this thread several times. We are both members of the Amplify team. The problem is non-trivial, as others have noted in their descriptions above of potential implementations, all of which we are looking into in the future, as was noted earlier. As such we'll keep this thread open to continue to measure interest as well as potential thoughts on solutions such as those outlined by @PeteDuncanson, @duwerq, and @RossWilliams. Further open dialogue here on requirements, DX options, and suggestions for implementation is welcome.
@undefobj, thank you for your answer.
The problem is non-trivial
I definitely believe you, as far as your part is concerned.
As such we'll keep this thread open to continue to measure interest as well as potential thoughts on solutions such as those outlined by @PeteDuncanson, @duwerq , and @RossWilliams.
and suggestions for implementation are welcome.
Did I get something wrong, or is Amplify just an open source project? We as paying customers of AWS should only have to supply the requirements. Sorry if that sounds arrogant, but this is an Amazon project and not some little community project.
So I have to ask again. What is your timeline on this issue? What is the Roadmap? Why is nobody assigned to this issue in GitHub?
Did I get something wrong, or is Amplify just an open source project? We as paying customers of AWS should only have to supply the requirements. Sorry if that sounds arrogant, but this is an Amazon project and not some little community project.
It is open source, but it is also quite a small team. Each team in Amazon is different and runs their own way. Let's be grateful we have a team on this who are engaging and open to ideas, and not tar them with the "big evil rich company" brush.
Yes, at the end of the day this tech exists to get us to sign up and spend money with them, but then so do the apps we build with it. In the meantime they are trying to build the best stack they can, but there are pressures in doing that that need some consideration, no doubt.
For my part I'm slowly getting up to speed on the problem some more and seeing if I can't come up with a @runningTotal directive of my own; it's a steep learning curve though. Hoping to get this up and running so I can dog food it a bit then offer it up here for discussion.
It is open source but it is also quite a small team
But someone at AWS needs to be responsible for this project, right? This someone needs to address issues like this. I don't understand why such fundamental problems have existed for 16 months. It's not good to see such a long feedback loop without a final result like 'Yes, we will fix that' or 'No, we won't implement it'.
open to ideas and not tar them with the "big evil rich company" brush
I fully agree, but it's still AWS, with tons of resources. But yeah, I'll stop complaining about it and try to find another solution, like using an extra mirror database to do the math.
Hoping to get this up and running so I can dog food it a bit then offer it up here for discussion.
That would be awesome.
Found aws-amplify/amplify-cli#2901 referenced in the docs (https://docs.amplify.aws/lib/graphqlapi/query-data/q/platform/js#frequently-asked-questions) so thought I'd link them together for anyone following the breadcrumbs
The thing I don't understand here is: if you don't like DynamoDB or its lack of counts, you have the option to use another database. Nader is about to put out a video on using a Postgres DB. I highly recommend it for anyone not interested in DynamoDB: https://twitter.com/dabit3/status/1328710735619268608?s=21
+1 @duwerq, there are definitely other things I'd rather have prioritized on their roadmap over this. There are plenty of other good DB options.
@RossWilliams could you give an example of how to do a table stream lambda to implement the count? I am very new to Amplify but really need this feature. Thanks.
@snahammond Here are the steps. Send me an email, LAST_NAME.FIRST_NAME@gmail.com if you get stuck.
At a high level, you are creating a lambda, connecting a lambda to the table change stream, adding permissions to allow the lambda to connect to the stream, and writing a lambda to handle create, update, and delete events. You are also going to create environment variables so you have the table names you need inside the lambda.
Create a new lambda function. You can use the CLI to create it. Have the function depend on your GraphQL API. This means the backend-config.json entry for this lambda should have the following in its "dependsOn":
{
"category": "api",
"resourceName": "XXX_APP_NAME",
"attributes": ["GraphQLAPIIdOutput", "GraphQLAPIEndpointOutput"]
},
Open the function's audit-cloudformation-template.json file and add an item to "Resources" to attach the lambda to the DynamoDB table:
"DynamoTableXXX_TABLE_NAMEStream": {
"Type": "AWS::Lambda::EventSourceMapping",
"Properties": {
"BatchSize": 20,
"MaximumBatchingWindowInSeconds": 100,
"BisectBatchOnFunctionError": true,
"StartingPosition": "LATEST",
"Enabled": true,
"EventSourceArn": {
"Fn::ImportValue": {
"Fn::Join": [
":",
[
{
"Ref": "apiXXX_APP_NAMEGraphQLAPIIdOutput"
},
"GetAtt",
"XXX_TABLENAME",
"StreamArn"
]
]
}
},
"FunctionName": {
"Fn::GetAtt": [
"LambdaFunction",
"Arn"
]
}
},
"DependsOn": [
"LambdaExecutionRole"
]
},
Your lambda function policy needs the following statement for permissions. You will need to add permissions for any additional operations you use. Keep the existing permissions for log groups that Amplify generates:
{
"Effect": "Allow",
"Action": [
"dynamodb:DescribeStream",
"dynamodb:GetRecords",
"dynamodb:GetShardIterator",
"dynamodb:ListStreams"
],
"Resource": {
"Fn::Sub": [
"arn:aws:dynamodb:${region}:${account}:table/XXX_TABLENAME-${apiId}",
{
"region": {
"Ref": "AWS::Region"
},
"account": {
"Ref": "AWS::AccountId"
},
"apiId": {
"Ref": "apiXXX_APP_NAMEGraphQLAPIIdOutput"
}
}
]
}
}
You can adjust the batch size and batch window I provided above.
Add an environment variable for any tables you need to have access to inside the Lambda:
"LambdaFunction": {
"Type": "AWS::Lambda::Function",
"OtherItems": "existing items here",
"Properties": {
"OtherItems": "existing items here",
"Environment": {
"Variables": {
"ENV": {
"Ref": "env"
},
"REGION": {
"Ref": "AWS::Region"
},
"XXX_DYNAMO_TABLE_FOR_UPDATES": {
"Fn::Join": [
"-",
[
"XXX_DYNAMO_TABLE_NAME",
{
"Ref": "apiXXX_APP_IDGraphQLAPIIdOutput"
},
{
"Ref": "env"
}
]
]
            }
          }
        }
      }
    }
Now you have a lambda that can connect to the table stream. Here is a very minimal lambda function (forgive any mistakes, I'm writing in GitHub) to read the stream information and update an item (this requires adding "dynamodb:UpdateItem" to the permissions above):
import { DynamoDBStreamHandler, DynamoDBStreamEvent } from "aws-lambda";
import DynamoDB, { DocumentClient } from "aws-sdk/clients/dynamodb";

const documentClient = new DocumentClient();
const TABLE_NAME = process.env.XXX_DYNAMO_TABLE_FOR_UPDATES;

export const handler: DynamoDBStreamHandler = async (event: DynamoDBStreamEvent) => {
  for (const record of event.Records) {
    if (record.dynamodb.NewImage && !record.dynamodb.OldImage) {
      // New image only: a record has been inserted
      const newItem = DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
      if (newItem.XXX_INTERESTING_FIELD) {
        await documentClient.update({
          TableName: TABLE_NAME,
          Key: {
            table_pk_attribute: XXX_A_PARTITION_KEY,
            table_sk_attribute_if_exists: XXX_AN_OPTIONAL_SORT_KEY,
          },
          UpdateExpression: "SET #xxx_my_counter_item = :xxx_my_counter_increase",
          ExpressionAttributeNames: { "#xxx_my_counter_item": "xxx_my_counter_item" },
          ExpressionAttributeValues: { ":xxx_my_counter_increase": newItem.XXX_INTERESTING_FIELD },
        }).promise(); // TODO: handle failures
      }
    } else if (record.dynamodb.NewImage && record.dynamodb.OldImage) {
      // Old and new image: the record has been updated. Check if you need to change a count or not
    } else {
      // No new image: the record has been deleted. You may need to decrement a count
    }
  }
};
My use case for this is to show the total number of records on that particular day.
My current idiomatic way of dealing with the limitation is to show a + icon beside the number if nextToken exists.
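Purely as an illustration of that convention (the function name is made up), the frontend logic is just:

// Show "20+" when another page exists, otherwise the exact number
function formatCount(count: number, nextToken?: string | null): string {
  return nextToken ? `${count}+` : `${count}`;
}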
Just adding another voice of support for this. Would love a way to reduce my Appsync bill 🗡️ ❤️
EDIT: It looks like someone has found an override that might be easy to implement:
https://www.riccardosirigu.com/blog/aws-amplify-dynamodb-total-count-graphql-queries/
Can someone from the Amplify team confirm this?
Just wanted to update with another method we've found of late. As usual it might not be suitable for everyone, depending on how many records you have, but if you enable DataStore (which by default can store/handle 10k records locally) then you can get the count of various queries, totals etc. by checking the total results returned when you query it:
// Gets ALL Todos that we have stored in DataStore
const allTodos = await DataStore.query(Todo);
// Get all active Todos
const allActiveTodos = (await DataStore.query(Todo)).filter( todo => todo.status === "active" );
console.info( "You've got " + allActiveTodos.length + " active todos out of " + allTodos.length + " todos total" );
It might be a bit wasteful, and it might not be enough for what you want to do, but for us it has made this problem go away for now and we can rework it when we have more time and if it becomes a problem. For now though it's workable! Hope that helps someone.
You can enable DataStore via the CLI with amplify update api, but be careful if you have existing data in there; be aware of the need to back-fill some of the new fields it adds: aws-amplify/amplify-category-api#227
See https://docs.amplify.aws/lib/datastore/data-access/q/platform/js#query-data for more details.
Has anyone used TransactWriteItems to keep track of counts? Say your app lets people create blog posts and you want to keep track of posts per user. You create a custom resolver that uses TransactWriteItems to 1. create a blog post and 2. add 1 to the user's total post count atomically, and this returns success only if both actions go through, thanks to the transaction's all-or-none design. This seems to me a better alternative than aggregating counts from a DynamoDB stream.
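For anyone who prefers to do this from a Lambda rather than a VTL resolver, here is a rough sketch of the same idea with the DocumentClient's transactWrite (table names, key shapes and the postsCount attribute are all made up): the post is created and the author's counter is bumped in one all-or-nothing transaction.

import { DocumentClient } from "aws-sdk/clients/dynamodb";

const db = new DocumentClient();

export async function createPostWithCount(post: { id: string; authorId: string; content: string }) {
  await db
    .transactWrite({
      TransactItems: [
        {
          Put: {
            TableName: process.env.POST_TABLE!, // hypothetical
            Item: { ...post, createdAt: new Date().toISOString() },
            ConditionExpression: "attribute_not_exists(id)", // don't silently overwrite a post
          },
        },
        {
          Update: {
            TableName: process.env.USER_TABLE!, // hypothetical
            Key: { id: post.authorId },
            UpdateExpression: "SET postsCount = if_not_exists(postsCount, :zero) + :one",
            ExpressionAttributeValues: { ":zero": 0, ":one": 1 },
          },
        },
      ],
    })
    .promise();
  // Either both writes happen or neither does, so the counter can't drift from the data
}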
Hi @xitanggg - I am actually implementing it the same way too. Some issues I encountered are:
Anyway, a snapshot of what I did in the request template is:
{
"version": "2018-05-29",
"operation": "TransactWriteItems",
"transactItems": [
{
"table": "Todo-jwr---ygm4-dev",
"operation": "UpdateItem",
"key": {
"id": {
"S": $util.toJson($context.args.input.todoID)
}
},
"update": {
"expression": "",
"
},
},
{
"table": "Comment-jwr---clygm4-dev",
"operation": "PutItem",
"key": #if( $modelObjectKey ) $util.toJson($modelObjectKey) #else { "id": $util.dynamodb.toDynamoDBJson($ctx.args.input.id) } #end,
"attributeValues": $util.dynamodb.toMapValuesJson($context.args.input),
"condition": $util.toJson($condition)
}
]
}
Do you have any suggested solutions?
Edit 2021-01-18
I just checked with the amplify-cli, and it seems TransactWriteItems is not supported in the CLI yet...
I implemented TransactWriteItems and it works very well for my purpose. Under CustomResource.json, I basically created a customCreateBlogResolver that is attached to a CustomDataSource data source, which has a CustomDataSourceIAMRole that gives it access to the different tables it needs.
In regards to your questions:
1. I declare the variables at the top of the resolver template (the same applies for batch):
#set($AppSyncApiId="xxxxx")
#set($currentEnv="dev")
{ ... "transactItems": [ { "table": "Todo-${AppSyncApiId}-${currentEnv}", ... }, { "table": "Comment-${AppSyncApiId}-${currentEnv}" ... } ] }
2. The documentation is scattered around, but it is actually decent; you just have to dig into it. [This is the doc I found that has been particularly helpful.](https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html)
3. Yeah, I don't need to find a filtered set of results, so I don't have much to share. The thing about designing for Dynamo is that you are designing for the common access patterns due to the key-value nature of NoSQL. You need to spend some time thinking about it before designing it. It is not as flexible to query as SQL.
4. When you say `cli`, you mean `mock api`? If so, yes, `transact` is not supported by the simulator, and neither is `batch`. It's no big deal though; just push it to the cloud and you can test it in the AppSync console. You can edit the resolver templates on the console as well, so you don't have to make changes locally and push to the cloud (just beware that sometimes there is latency before the template gets updated once you click save). Also, debugging VTL can be painful because there is no `console.log`. I sometimes `$util.toJson(xxx)` the result and set it into a GraphQL return variable to see it, instead of using CloudWatch. Hope this helps.
@xitanggg - Have you taken a look at amplify/backend/tags.json? I am wondering if that would serve as a place where you can declare some variables. I am still not familiar with the whole CloudFormation setup, but would it be a direction to put ${currentEnv} and ${AppSyncApiId} for you?
As mentioned, I put it in the resolver template, i.e. the .vtl file (as shown below; I also updated the code in my comment above to help illustrate). I haven't looked into tags.json, but it wouldn't be the place. Think of a resolver template like a lambda function, except it is written in VTL. It is a piece of function that doesn't have the concept of variables outside of its scope. And unlike a lambda function, I don't think AppSync supports inserting environment variables through CloudFormation yet.
.
└── amplify/
└── backend/
└── api/
└── xxxGraphQL/
└── resolvers/
└── Mutation.customCreateBlog.req.vtl (This file)
Similar to other languages, VTL can also do an import or include, e.g. #include( "myVariables.txt" ), so potentially there could be a centralized file to store variables that other VTL files just include. Yet unfortunately, AppSync doesn't seem to support it at the moment: https://github.com/aws/aws-appsync-community/issues/101. That leaves the option of including those variables in every .vtl file you need them in.
I'd like to chime in with support for having some kind of solution for this in Amplify. In my particular use case, I have a multi-tenant setup where I need to keep track of counts of variable numbers of things per tenant. My current solution took quite a bit of research to arrive at; here is what I am doing:
Stats table - this is a DynamoDB table just for storing accumulated statistics. There is one entry per tenant, with an id that can be used to retrieve that record. New entries are created when new tenants are created and all the stats are initialized to 0.
I added a custom mutation (I found this somewhere, I didn't create it, and this was the hard part for me) for this table with the following resolver template, which atomically increments or decrements one of the fields in the stats table for a specific record:
#set( $expNames = {} )
#set( $expValues = {} )
#set( $expSet = {} )
#set( $expAdd = {} )
#set( $expRemove = [] )
#set( $expNumbers = [] )
#if( $modelObjectKey )
#set( $keyFields = [] )
#foreach( $entry in $modelObjectKey.entrySet() )
$util.qr($keyFields.add("$entry.key"))
#end
#else
#set( $keyFields = ["id"] )
#end
#foreach( $entry in $util.map.copyAndRemoveAllKeys($context.args.input, $keyFields).entrySet() )
#if( !$util.isNull($dynamodbNameOverrideMap) && $dynamodbNameOverrideMap.containsKey("$entry.key") )
#set( $entryKeyAttributeName = $dynamodbNameOverrideMap.get("$entry.key") )
#else
#set( $entryKeyAttributeName = $entry.key )
#end
#if( $util.isNull($entry.value) )
#set( $discard = $expRemove.add("#$entryKeyAttributeName") )
$util.qr($expNames.put("#$entryKeyAttributeName", "$entry.key"))
#else
$util.qr($expSet.put("#$entryKeyAttributeName", ":$entryKeyAttributeName"))
$util.qr($expNames.put("#$entryKeyAttributeName", "$entry.key"))
#if ( $util.isNumber($entry.value) )
$util.qr($expNumbers.add("#$entryKeyAttributeName"))
#end
$util.qr($expValues.put(":$entryKeyAttributeName", $util.dynamodb.toDynamoDB($entry.value)))
#end
#end
#set( $expression = "" )
#if( !$expSet.isEmpty() )
#set( $expression = "SET" )
#foreach( $entry in $expSet.entrySet() )
#if ( $expNumbers.contains($entry.key) )
#set( $expression = "$expression $entry.key = $entry.key + $entry.value" )
#else
#set( $expression = "$expression $entry.key = $entry.value" )
#end
#if( $foreach.hasNext() )
#set( $expression = "$expression," )
#end
#end
#end
#if( !$expAdd.isEmpty() )
#set( $expression = "$expression ADD" )
#foreach( $entry in $expAdd.entrySet() )
#set( $expression = "$expression $entry.key $entry.value" )
#if( $foreach.hasNext() )
#set( $expression = "$expression," )
#end
#end
#end
#if( !$expRemove.isEmpty() )
#set( $expression = "$expression REMOVE" )
#foreach( $entry in $expRemove )
#set( $expression = "$expression $entry" )
#if( $foreach.hasNext() )
#set( $expression = "$expression," )
#end
#end
#end
#set( $update = {} )
$util.qr($update.put("expression", "$expression"))
#if( !$expNames.isEmpty() )
$util.qr($update.put("expressionNames", $expNames))
#end
#if( !$expValues.isEmpty() )
$util.qr($update.put("expressionValues", $expValues))
#end
{
"version": "2017-02-28",
"operation": "UpdateItem",
"key": #if( $modelObjectKey ) $util.toJson($modelObjectKey) #else {
"id": {
"S": $util.toJson($context.args.input.id)
}
} #end,
"update": $util.toJson($update),
"condition": $util.toJson($condition)
}
I then added a new function, triggered by DynamoDB streams, which checks for INSERT/MODIFY, and set it up to watch the tables I was interested in tracking stats on. If in these tables there is a change in the field I am interested in (I have a status field that changes and I want to track how many items are in each status for each tenant), then I invoke the GraphQL mutation to update the statistics with the field I want to update and the amount to update it by (usually 1 or -1).
A side effect of this is that I can use a subscription on the stats table with a filter to the current tenant to get live updates for the counts of things I'm interested in.
My current limitation and problem is that I need to know ahead of time what stats I have to track so the schema knows about the columns to track. The new requirements for the platform are to have a tenant-configurable set of things that need tracking. I have no idea what I can do with VTL in this regard; I'm hoping I can store an AWSJSON field, parse it, update a key inside it, then serialize it back, all atomically, so I don't miss any updates, although I am not holding my breath on this :)
Any feedback on this approach would be appreciated.
I understand how complicated an integrated solution would be; I'm wondering if documented patterns would be enough to help people solve specific use cases.
Why doesn't someone just make a VTL resolver that runs the list query to completion (including however many pages there are and whatever filters there are) and then just returns a number? Sure, you have to scan the whole table, and it might be slow or costly etc., but add that information to the docs and let people have a way to get a total number of rows without having to return all the data. On the frontend, make a standard naming scheme and let it take the same params as the query:
ListXxxxxCount ListXxxxxQuery
What about using aggregationQueries?
Contributing this in case it's helpful to anyone. I implemented totals for my solution some time back. I did it with ElasticSearch and it was quite easy. Early on I discovered that there are a lot of things that are very difficult with DynamoDB alone, but with ElasticSearch you can get to where you need to be without too much drama. I've created highly customized ElasticSearch resolvers that give me powerful searches, so I'd recommend ElasticSearch be part of a standard landscape.
To start with, you need to modify the response template as in the example below. My response also includes geodistance (https://docs.amplify.aws/cli/graphql-transformer/resolvers#add-a-custom-geolocation-search-resolver-that-targets-an-elasticsearch-domain-created-by-searchable) to support geospatial queries. I also modified it to include _version for when you are using conflict detection for DataStore.
#set( $items = [] )
#set( $nextToken = [] )
#set( $geodistance = [] )
#foreach( $entry in $context.result.hits.hits )
#if( !$foreach.hasNext )
$util.qr($nextToken.add($entry.get("sort")))
#end
#set( $row = $entry.get("_source") )
$util.qr($row.put("_version", $entry.get("_version")))
$util.qr($items.add($row))
$util.qr($geodistance.add($entry.get("sort")))
#end
$util.toJson({
"items": $items,
"total": $ctx.result.hits.total,
"nextToken": $nextToken,
"geodistance": $geodistance
})
Then for the searchable connection you need to modify the expected response type to include total and geodistance:
type Searchable<Model>Connection {
items: [<Model>]
nextToken: String
geodistance: [String]
total: Int
}
Then the GraphQL query is simply
query getTotal{
search<Model>s(filter:{
shortDescription:{eq: "MOTOR"}
})
{total}
}
Which returns the following response
{
"data": {
"search<Model>s": {
"total": 39
}
}
}
@sacrampton good feedback, but your implementation seems to suffer from the same issue every other similar approach has. If you return 39 results from ES but the user doesn't have access to some of them due to access configurations in AppSync / DynamoDB, then your count will be wrong.
Hi @houmark - for the standard AppSync resolvers that is correct. However, in my customized resolvers I've implemented the security access in the request resolver to ensure that the query only gets access to the records you have access to, and thus returns the correct number of records.
My application requires dynamic authorization. Each record has a group field and each user has access to certain groups. However, dynamic @auth seems to create all sorts of issues and is not supported in a lot of instances (subscriptions, etc.), so I implement that in the request resolver, building my ElasticSearch query to only include the groups we should have access to.
This is a different approach to the DynamoDB queries: there the security filtering happens in the response; here I do it in the request.
If I may add my 2 cents to this discussion:
There's an argument to be made that the difficulty of retrieving DynamoDB record counts should be viewed primarily as an inherent/essential component of the design problem, rather than as an incidental/accidental component that can be 'hacked' away or abstracted away (incidental = 'programmer's fault', essential = 'not programmer's fault').
The NoSQL database paradigm (of which DynamoDB is an instance) makes some things (like scaling horizontally) very easy, and some things (like counting/querying) more difficult.
Sure, from the perspective of the DynamoDB team, you could argue this is a problem worth attacking since it is very common and it does have viable (yet non-trivial) technical solutions (e.g. using hybrid NoSQL/relational approaches) -- but certainly from the perspective of Amplify users, for the foreseeable future this is just an inherent part of the design problem that must be contended with and solved for.
Luckily, there are solutions to many common instances of the counting problem in the NoSQL world, but they have to be explicitly designed for (e.g. counting likes/comments by incrementing/decrementing a counter on entry insertion/deletion, as has been suggested elsewhere in this thread).
I think what is fair to ask of the Amplify team -- especially since this functionality really is very basic in many real-world contexts -- is to make solving this problem as easy as possible given the fundamental design constraints, and remove those obstacles which are incidental.
For starters, the problem + common solutions should be documented in the official amplify docs. It's too important a problem to relegate to the 80-comment discussion thread on a 2 year old GitHub issue.
But more importantly, amplify should make implementing the viable solutions to this problem a lot easier.
For the most part, there are 2 bottom-up solutions to this problem, both already referenced in this thread:
As a bonus, these 2 capabilities would carry over to lots of other contexts that have nothing to do with counting.
An official short-term plan on this from the Amplify team would be much appreciated, given that the problem is crucial to many folks using Amplify for real-world (rather than demo-app) purposes, and that the underlying complexities are not likely to change any time soon.
And while I'm at it, to add my own ad-hoc documentation to this issue: if you would like to include the relevant counters as fields on your standard @model GraphQL types (rather than in some dedicated stats table), here's the approach I have arrived at:
It's using 2 3rd-party Amplify/AppSync directives:
@auto: https://github.com/hirochachacha/graphql-auto-transformer
@default: https://github.com/trek10inc/graphql-default-value-transformer
type Post
@model(subscriptions: null)
# ...
{
# ...
likes: [PostLike!]! @connection(keyName: "byPostLiked", fields: ["id"])
likesCount: Int! @auto @default(value: "0")
@auth(rules: [
# include rules to make sure this parameter cannot be changed by any public API
])
}
We want this feature. Any update on this from the AWS Amplify developer team? Thanks.
This is almost exactly like @RossWilliams' example, but I pulled it from a project I implemented it on too and thought I'd share.
For anyone looking to add an increment/decrement count off a DB stream: first, create a lambda with the CLI and name it something like updateCounts. Then drop this into the lambda:
const aws = require("aws-sdk");
const db = new aws.DynamoDB.DocumentClient();
const updateCounts = async ({ increase }) => {
const updateParams = {
TableName: process.env.TABLENAME,
Key: {
id: "PK of counts record",
},
UpdateExpression: `SET #counts = if_not_exists(#counts, :defaultCount) ${
increase ? "+" : "-"
} :inc`,
ExpressionAttributeNames: {
"#counts": "counts",
},
ExpressionAttributeValues: {
":inc": 1,
":defaultCount": 0,
},
};
try {
const Items = await db.update(updateParams).promise();
} catch (error) {
console.log("do some error handling")
}
};
exports.handler = async (event, context, callback) => {
console.log("event record", JSON.stringify({ event }));
await Promise.all(
event.Records.map(async (record) => {
if (record.eventName === "INSERT") {
const dbRecord = aws.DynamoDB.Converter.unmarshall(
record.dynamodb.NewImage
);
if (dbRecord.__typeName === "typeName of record that should update count record") {
await updateCounts({ increase: true });
}
}
if (record.eventName === "REMOVE") {
const dbRecord = aws.DynamoDB.Converter.unmarshall(
record.dynamodb.OldImage
);
if (dbRecord.__typeName === "typeName of record that should update count record") {
await updateCounts({ increase: false });
}
}
})
);
};
Then go to amplify/functions/updateCounts/updateCounts-cloudformation-template.json
and add the following
"Resources": {
....
"DynamoDBStreamAccess": {
"DependsOn": [
"LambdaExecutionRole"
],
"Type": "AWS::IAM::Policy",
"Properties": {
"PolicyName": "Lambda-DynamoDBStreamAccess-Policy",
"Roles": [
{
"Ref": "LambdaExecutionRole"
}
],
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:DescribeStream",
"dynamodb:GetRecords",
"dynamodb:GetShardIterator",
"dynamodb:ListStreams"
],
"Resource": [
"arn:aws:dynamodb:<region>:<your-account-id>:table/<TableName>/stream/2020-10-18T22:08:08.367" // Insert your dynamodb stream arn (can be found in DynamoDB console)
]
}
]
}
}
},
"DynamodbstreamEventSourceMappingCounts": {
"Type": "AWS::Lambda::EventSourceMapping",
"DependsOn": "DynamoDBStreamAccess",
"Properties": {
"BatchSize": 10,
"EventSourceArn": "arn:aws:dynamodb:<region>:<your-account-id>:table/<TableName>/stream/2020-10-18T22:08:08.367", // Insert your dynamodb stream arn (same as above)
"FunctionName": {
"Fn::GetAtt": [
"LambdaFunction",
"Arn"
]
},
"StartingPosition": "TRIM_HORIZON",
"Enabled": "True"
}
},
}
Run amplify push and you should be good to go. You can test by adding a record in DynamoDB in the console. It will trigger the lambda; from there you can console.log inside the lambda and check CloudWatch to see any values you need.
On a side note, make sure you've created the counts record in DDB so you can reference it in the updateCounts function.
Careful with lambdas that update the count. I currently do this, but you run into problems at scale. Specifically, these lambdas that count are not idempotent; every time they run they increase or decrease the count (unless your logic handles that).
The problem is that lambdas are not guaranteed to run exactly once per record change. If a table has a bunch of updates on the same or different records, you can get multiple batches of events sent to the lambda containing the same Dynamo change, meaning you could increment the count too many times.
My approach to dealing with this has been to add a LambdaLocks table that I check to see if I have seen an event record with this ID before. If I have, I stop work on this record; otherwise, I update LambdaLocks with the function name and record ID and then do my counting.
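A minimal sketch of that dedupe check (the LambdaLocks table name and key shape are hypothetical): claim the stream record's eventID with a conditional put before doing any counting, and skip the record if the claim fails.

import { DocumentClient } from "aws-sdk/clients/dynamodb";

const db = new DocumentClient();
const LOCKS_TABLE = process.env.LAMBDA_LOCKS_TABLE!; // hypothetical

// Returns true the first time a given stream record is seen by this function, false otherwise
export async function claimRecord(functionName: string, eventId: string): Promise<boolean> {
  try {
    await db
      .put({
        TableName: LOCKS_TABLE,
        Item: { id: `${functionName}#${eventId}`, claimedAt: new Date().toISOString() },
        ConditionExpression: "attribute_not_exists(id)", // fails if this event was already processed
      })
      .promise();
    return true; // safe to count this record
  } catch (err: any) {
    if (err.code === "ConditionalCheckFailedException") {
      return false; // already counted, skip it
    }
    throw err;
  }
}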
@zkwentz but if you rely on INSERT or REMOVE and not MODIFY, this shouldn't be a problem, correct?
@duwerq unfortunately no, even inserts and removes can be sent multiple times in my experience. I think it has something to do with the event batching and when a table is under heavy load? I'd love to know better from the Amplify / Dynamo team if that is the case.
from the Docs:
"DynamoDB Streams helps ensure the following:
Each stream record appears exactly once in the stream.
For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item. "
More here https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
I can't speak to your experience but it sounds like that shouldn't be possible
I can't recall specifically why this was happening, but my understanding was that it was occurring on insert and remove.
With no ability to reproduce the load that caused it now, I'll hold off on pushing this point further and defer to the docs, adding my own note, which is: "ensure you don't have idempotency requirements when counting in a lambda."
The Lambda function reading from the stream needs to send a response back to the stream to acknowledge the batch was processed. If the function returns an error, the items in the batch will be processed a second time (and potentially by a new instance of the function). See this guide from AWS: https://aws.amazon.com/blogs/compute/optimizing-batch-processing-with-custom-checkpoints-in-aws-lambda/
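One way to limit the damage (sketch only; it assumes ReportBatchItemFailures is enabled on the event source mapping) is to report just the failed records back to the stream, so a single bad item doesn't cause the whole batch, including already-counted records, to be retried:

import { DynamoDBStreamEvent } from "aws-lambda";

export const handler = async (event: DynamoDBStreamEvent) => {
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      // ... increment/decrement your counter for this record ...
    } catch (err) {
      // Report only this record; already-processed records in the batch won't be replayed
      if (record.dynamodb?.SequenceNumber) {
        batchItemFailures.push({ itemIdentifier: record.dynamodb.SequenceNumber });
      }
    }
  }

  return { batchItemFailures };
};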
While I appreciate all the alternative approaches here, I don't see a solution without a lot of complexity that will allow returning counts for filtered queries (including when the query is sent to Elasticsearch). That is really what's needed in order to show a total result count, even if only listing 20 items in a result set, for example. A global count is not really useful in many cases, and attempting to pre-count the number of results expected for a certain filtered query is going to require quite a lot of work.
The AWS Amplify team has created an RFC in relation to @searchable, and in a comment I requested that counts be included in those improvements. If you support that, please voice your opinion here: https://github.com/aws-amplify/amplify-cli/issues/7546#issuecomment-873368552
Hi everyone - whilst I agree that totals should indeed be something out of the box, our solution is built around ElasticSearch and I thought I'd share what we did, because custom searchable resolvers have been really important for us to flexibly query and totals are just one benefit.
Firstly, we implement the security model in the resolver itself. For example we use dynamic auth, but do it in the resolver. In the resolver below, the user is given a group in Cognito equal to the Plant ID. The SHOULD query at the bottom means that whatever people query will have an additional filter to exclude anything they don't have access to. By including the security model in the query resolver, the TOTAL we get from the query will be accurate and only reflect values that we are authorized to see.
But in the truncated resolver below you will see we include a huge number of query options - MUST MATCH, MUST MATCH KEYWORD, MUST MATCH QUERY STRING, MUST EXISTS - then we have "must not" versions of those, and "or" / "should" versions of all of that. Also the ability to query ranges (i.e. date/number range greater than, date/number range between values) and not/or versions of those, plus geospatial queries (i.e. find all assets within 100m of this current asset). The majority of these queries are not even remotely possible with standard DynamoDB, and any counter built into a DynamoDB table would not be able to produce these values dynamically. We also use searchAfter to paginate really large data sets, which from/size does not handle.
Once you have built the resolver, the totals are dynamically output and reflect really powerful / flexible queries that are not possible elsewhere. The current searchable resolvers are really bare-bones, minimal capability, and our experience is that replacing most of the content of the existing resolvers is required. It has been really worthwhile to build out that capability for our solution. Totals just happen to be a nice by-product of powerful/flexible queries.
Just including this to share in more detail how we went about it.
## The resolver is proprietary to our data model, and is very large, so I have stripped out a lot of the content to give you an idea of the overall logic that we are using to build custom resolvers
## Get variables input from query and count everything that was submitted.
#set( $indexPath = "/asset/doc/_search" )
{ "version": "2017-02-28", "operation": "GET", "path": "$indexPath","params": {"body": {
"search_after": [
#set( $saloop = 0 )
#foreach( $userSearchAfter in $userSearchAfters )
"${userSearchAfter}"
#set( $saloop = $saloop + 1 )
#if( $saloop < $numberOfSearchAfters ) , #end
#end
],
#else
"from": #if( $context.args.from ) $from #else 0 #end,
#end
"size": #if( $context.args.limit ) $size #else 10 #end,
"sort": #if( $numberOfGeoFilters > 0 )
[{ "_geo_distance": {
"order": "asc",
"unit": "${unit}",
"mode": "min",
"distance_type": "arc",
"geolocation": $util.toJson($ctx.args.geoFilter.location)
}}],
#else
[
#if( $context.args.sort )
#foreach( $userSort in $userSorts )
#if( !$util.isNullOrEmpty($userSort.field) && !$util.isNullOrEmpty($userSort.direction) ) {
#if( $userSort.string == "keyword" ) "${userSort.field}.keyword"
#else "${userSort.field}"
#end : { "order": "$userSort.direction" } },
#end
#end
#end
{"id.keyword" : { "order": "asc" } }],
#end
"version": true,
"query":
## Enter MUSTMATCH Values
#if( $numberOfMustMatch > 0 )
#foreach( $userMustMatch in $userMustMatchs )
,{ "match" : { "${userMustMatch.field}": "${userMustMatch.query}"}}
#end
#end
## Enter MUSTMATCHKEYWORD Values
## Enter MUSTEXISTS Values
## Enter RANGESINGLE Values
## Enter RANGEDUAL Values
## Enter MUSTQUERYSTRING Values
## --------------------------- OR QUERIES
## Enter ORMUSTMATCH Values
## Enter ORMUSTMATCHKEYWORD Values
## Enter ORMUSTEXISTS Values
## Enter ORRANGESINGLE Values
## Enter ORRANGEDUAL Values
## Enter ORMUSTQUERYSTRING Values
## --------------------------- MUST NOT MATCH ------------
## Enter MUSTNOTMATCH Values
## Enter MUSTNOTMATCHKEYWORD Values
## Enter MUSTNOTEXISTS Values
## Enter MUSTNOTQUERYSTRING Values
## ---------OR MUST NOT QUERIES -------------------------------------------
## Enter ORMUSTNOTMATCH Values
## Enter ORMUSTNOTMATCHKEYWORD Values
## Enter ORMUSTNOTEXISTS Values
## Enter ORMUSTNOTQUERYSTRING Values
## ---------------------------------------------------
## Enter GEOFILTER values
#if( $numberOfGeoFilters > 0 )
,{"bool": {"filter": {
"geo_distance": {
"distance": "${distance}${unit}",
"geolocation": $util.toJson($ctx.args.geoFilter.location)
}}}}
#end
## Enter SHOULD values for GROUPS
#if( $XXXXGroup == false )
#if( $numberOfGroups > 0 )
,{"bool": { "minimum_should_match": 1, "should" :
#end
#if( $numberOfGroups == 1 )
{"match": {"assetPlantId.keyword": "${userGroup}"}}
#end
#if( $numberOfGroups > 1 ) [
#set ( $i = 0 )
#foreach( $userGroup in $userGroups )
#if( $i > 0 ) , #end
#set ( $i = $i + 1 )
{"match": {"assetPlantId.keyword": "${userGroup}"}}
#end ]
#end }}
#end
#end
]
}
}
}
}
}
@sacrampton Now that looks like something useful and also something that has the potential to make it into the Amplify core. Thanks a lot for sharing!
Amplify could support this by automatically adding a count field whenever any list is defined, and internally updating it whenever an element is inserted / deleted.
It's been a long discussion here, but why is this feature not officially there yet?
So is every Amplify user expected to do without this or a GoToLast button? It should be an official feature.
If the total count is a heavy load, how about calculateTotalCount: Boolean as an option with the default value false?
This is very frustrating. A lot of our clients complain about having to use the next button multiple times to reach their page. I cannot implement "go to page" without knowing the page number. Any update on this?
Since there is obviously no real support from DynamoDB here, which is fine, you should generally take a different approach. We have the @searchable directive; what if there were also an @sql or @athena directive from Amplify?
My current setup: DynamoDB is crawled twice a day (by AWS Glue) and the new entries are stored in an SQL-capable copy in S3 in Parquet format.
So for tables I can get the additional information like counts, or "jump to page" would also work.
The current problem I am facing is search + pagination. A user can search by title_start_with + age_between + status. Now I have to make my model @searchable to jump to specific pages, plus I have to calculate the total count with a different query, which amounts to implementing DB search from scratch. What if a user can search by a combination of 5 fields?
If I have to manually implement this, there is no merit in using @model, which is almost the whole reason I chose Amplify over appSync + CDK + serverless mysql.
If you have records that are in the 15,000 or less range, a potential solution would be to grab the first set of results and do a "pagination" query on a separate GSI for the whole set of records
Won't that Query for the whole set of records be expensive?
No, it doesn't have to be. In the new GSI you'll use the KEYS_ONLY attribute projection, i.e. only the PK/SK gets stored.
Then do 1 query for all the results. 1 record would only be about 70 bytes and 1 query can return up to 1MB.
1MB / 70 bytes is about 14-15K records. Now you have the whole set of keys, in order, and when you want a specific page, just grab the PK/SK at that page's index from your "pagination" query.
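A sketch of that approach (index and attribute names are invented): query just the keys for the partition, use the length as the total, and slice out the keys for whichever page was requested before fetching the full items.

import { DocumentClient } from "aws-sdk/clients/dynamodb";

const db = new DocumentClient();

export async function getPageKeys(tenantId: string, page: number, pageSize: number) {
  const res = await db
    .query({
      TableName: process.env.TABLE_NAME!,
      IndexName: "byTenantKeysOnly",           // a KEYS_ONLY GSI, so each item is only a few dozen bytes
      KeyConditionExpression: "tenantId = :t",
      ExpressionAttributeValues: { ":t": tenantId },
    })
    .promise();

  const keys = res.Items ?? [];                // ~14-15k keys fit in a single 1MB response
  const totalCount = keys.length;
  const pageKeys = keys.slice(page * pageSize, (page + 1) * pageSize);

  // Follow up with GetItem/BatchGet on pageKeys to load the full records for that page
  return { totalCount, pageKeys };
}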
@naseer036 could this work for you?
The current problem I am facing is search + pagination. A user can search by title_start_with + age_between + status. Now I have to make my model @searchable to jump to specific pages, plus I have to calculate the total count with a different query, which amounts to implementing DB search from scratch. What if a user can search by a combination of 5 fields?
If I have to manually implement this, there is no merit in using @model, which is almost the whole reason I chose Amplify over appSync + CDK + serverless mysql.
I could be mistaken, but I thought Elasticsearch returns hits.total for any query. Couldn't you use that? This is an old resolver template from Amplify's early days:
#if (! $util.isNull($context.args.paging))
#set($Integer = 0)
#set($from = $util.defaultIfNull($Integer.parseInt($context.args.paging.nextToken), 0))
#else
#set($from = 0)
#end
#set($result = { "items": [] })
#foreach($entry in $context.result.hits.hits)
$util.qr($result.items.add($entry.get("_source")))
#end
#set($last = $from + $result.items.size())
#if($context.result.hits.total > $last)
$util.qr($result.put("nextToken", $last))
#end
$util.qr($result.put("totalCount", $context.result.hits.total))
$util.toJson($result)
Yes @duwerq - it does and that's what I've been using for a couple of years - works brilliantly. But as I said before, you need to implement your security (auth) model in the request resolver to ensure your response resolver is giving accurate totals.
Okay, so you're saying if you rely on the auth to filter records in the VTL request, it will give an inaccurate total. Are you suggesting any auth params also be passed into the must match of Elasticsearch, @sacrampton?
Is your feature request related to a problem? Please describe.
I think it's way past due that Amplify supports total counts in GraphQL queries. Other GraphQL based platforms have this built in by simply adding totalCount (or similar) to the query, and no matter the limit, they'll get back the total count in addition to the (filtered) data.
Describe the solution you'd like
This should at least work for DynamoDB backed models and of course also for search based queries that pass by ElasticSearch.
Describe alternatives you've considered
Making a Lambda function that is a field in each model using the @function directive, but since we are using both listItems and searchItems with filters added, the implementation is not simple, as we have to reapply those filters in the Lambda function to get the correct count. Making custom resolvers seems like another not very fun route and not very scalable or maintainable, and once again, this should be an "out of the box one-liner" to have available as a developer. With either a Lambda or some other custom resolver I'm looking at hours or days of development.
Additional context
This is a must-have feature and there's not really any workaround for displaying total counts for systems with many items, at least that I know of. I read several bug reports, but none of them seems to have a simple solution. That it has not yet been developed by AWS is beyond my understanding, as pulling out counts is one of the most common things to do when developing web apps.