aws / aws-appsync-community

Feature Request: new RDS util for converting ctx.args to comma separated list of SQL update values #60

Open ffxsam opened 4 years ago

ffxsam commented 4 years ago

I have an update mutation resolver with the following as its request template:

#set ($updates = "")

#if ($ctx.args.name)
    #set ($updates = "$updates, name = '$ctx.args.name'")
#end

#if ($ctx.args.duration)
    #set ($updates = "$updates, duration = $ctx.args.duration")
#end

{
    "version": "2018-05-29",
    "statements": [
        "UPDATE tracks SET $updates.substring(1) WHERE id = $ctx.args.id AND owner_id = '$ctx.identity.username' RETURNING *"
    ]
}

This works fine, but it's a little clunky, and could really get out of hand if I were updating 10+ columns. It would be nice to have a built-in util to convert specific args into a list of column/value pairs for use with SQL's UPDATE, e.g.:

#set ($updates = $util.rds.argsToValuePairs($ctx.args, ["name", "duration"]))

{
    "version": "2018-05-29",
    "statements": [
        "UPDATE tracks SET $updates WHERE id = $ctx.args.id AND owner_id = '$ctx.identity.username' RETURNING *"
    ]
}
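
For illustration only, here is a rough TypeScript sketch of what the proposed (hypothetical) argsToValuePairs helper could do; note that later comments in this thread flag building SQL by string concatenation as an injection risk:

// Hypothetical helper, sketched in TypeScript purely to illustrate the requested behavior.
// It builds SQL fragments by string concatenation, which later comments flag as unsafe.
function argsToValuePairs(args: Record<string, unknown>, fields: string[]): string {
  return fields
    .filter((field) => args[field] !== undefined && args[field] !== null)
    .map((field) => {
      const value = args[field];
      // Quote strings, leave numbers as-is
      return typeof value === 'string' ? `${field} = '${value}'` : `${field} = ${value}`;
    })
    .join(', ');
}

// argsToValuePairs({ name: 'Intro', duration: 240 }, ['name', 'duration'])
// => "name = 'Intro', duration = 240"
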
buggy commented 4 years ago

I know AWS has an example like this in the RDS Resolver Mapping Template Reference, but doesn't this open the door to SQL injection attacks? It feels like we should be strongly discouraging this syntax in favour of using statements + variableMap.

ffxsam commented 4 years ago

I don't think that's anything to be concerned about, since the client side uses GraphQL as its query language; it's not passing SQL statements to the back end. Also, note that my hypothetical function lets you specify the fields to pull from $ctx.args.

AaronHarris commented 4 years ago

Thanks for the request. We're looking at making update SQL requests more straightforward. One possible way, at least to make the VTL side easier, is to use a stored procedure in your database; then all you need to do is pass in the specific arguments and let the database handle the rest.

In the meantime, I do want to point out that @buggy is right: the method you're using above to construct your UPDATE statement with string concatenation does leave you open to SQL injection. You should be using the variableMap to place any user-supplied data into your request.

May I suggest the following instead:

#set ($updates = "")

#if ($ctx.args.name)
    #set ($updates = "$updates, name = :name")
#end

#if ($ctx.args.duration)
    #set ($updates = "$updates, duration = :duration")
#end

{
    "statements" : [
        "UPDATE tracks SET $updates.substring(1) WHERE id = :id AND owner_id = :username RETURNING *;"
    ],
    "variableMap": {
        ":name": $util.toJson($ctx.args.name.replaceAll("'", "''")),
        ":duration": $util.toJson($ctx.args.duration.replaceAll("'", "''")),
        ":id": $util.toJson($ctx.args.id.replaceAll("'", "''")),
        ":username": $util.toJson($ctx.identity.username.replaceAll("'", "''"))
    },
    "version" : "2018-05-29"
}
ffxsam commented 4 years ago

@AaronHarris Thanks! Yeah, I later realized this and decided to sanitize on the client side:

export default function sanitizeSql(sql: string): string {
  // eslint-disable-next-line no-control-regex
  return sql.replace(/[\0\x08\x09\x1a\n\r"'\\%]/g, char => {
    switch (char) {
      case '\0': // null
        return '';
      case '\x08': // backspace
        return '';
      case '\x09': // tab
        return '\\t';
      case '\x1a': // ^Z char
        return '';
      case '\n':
        return '\\n';
      case '\r':
        return '\\r';
      case "'":
        return "''";
      case '"':
      case '\\':
        return '\\' + char;
      default:
        return char;
    }
  });
}

It would be nice to see something like the function above turned into a new utility, e.g. $util.rds.sanitizeSql().

paya-cz commented 3 years ago

I was so shocked when I realized "variableMap" does nothing to protect against SQL injection, even though it is industry standard to use a separate variable map field for exactly that purpose (just look at what GraphQL does with variables!). Please add a flag to the RDS resolver that switches "variableMap" to behave like a proper "parameterMap" that passes the parameters to the database, without the need to escape the data in the first place. This looks like a major oversight to me.

y0rd4nis commented 2 years ago

This post is almost 2 years old. Are AppSync queries using variableMap still not sent as prepared or parameterized statements to RDS? I think this is a must-have feature in AppSync; the SDKs for RDS and RDS Data are already capable of sending prepared requests to RDS.

Sanitizing by escaping special characters is error-prone and not recommended.
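
For reference, a minimal TypeScript sketch of a parameterized statement sent through the RDS Data API with the AWS SDK for JavaScript v3; the ARNs, database name, and values below are placeholders:

import { RDSDataClient, ExecuteStatementCommand } from "@aws-sdk/client-rds-data";

const client = new RDSDataClient({});

// Values travel as typed parameters, not as text spliced into the SQL string.
await client.send(
  new ExecuteStatementCommand({
    resourceArn: "arn:aws:rds:us-east-1:123456789012:cluster:example", // placeholder
    secretArn: "arn:aws:secretsmanager:us-east-1:123456789012:secret:example", // placeholder
    database: "mydb", // placeholder
    sql: "UPDATE tracks SET name = :name WHERE id = :id AND owner_id = :username",
    parameters: [
      { name: "name", value: { stringValue: "New title" } },
      { name: "id", value: { stringValue: "track-123" } },
      { name: "username", value: { stringValue: "sam" } },
    ],
  })
);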

ffxsam commented 2 years ago

If this helps anyone else: lately I've just been using Lambda data sources to connect resolvers to Postgres, since using the VTL scripts is a real headache. Then I just use knex in my Node.js code, which automatically parameterizes all SQL input values.
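
A minimal sketch of that approach, assuming an AppSync Lambda resolver for the update mutation discussed in this thread (the connection config, event shape, and table/column names are assumptions):

import knex from "knex";

// Placeholder connection; in practice this would come from Secrets Manager or env vars.
const db = knex({ client: "pg", connection: process.env.DATABASE_URL });

interface UpdateTrackEvent {
  arguments: { id: string; name?: string; duration?: number };
  identity: { username: string };
}

export const handler = async (event: UpdateTrackEvent) => {
  const { id, name, duration } = event.arguments;

  // Only include the columns that were actually supplied.
  const updates: Record<string, unknown> = {};
  if (name !== undefined) updates.name = name;
  if (duration !== undefined) updates.duration = duration;
  if (Object.keys(updates).length === 0) {
    throw new Error("No fields to update");
  }

  // knex binds these values as query parameters rather than concatenating them into SQL.
  const [track] = await db("tracks")
    .where({ id, owner_id: event.identity.username })
    .update(updates)
    .returning("*");

  return track;
};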

paya-cz commented 2 years ago

Just like @ffxsam, I have personally decided to stop using VTL resolvers except for the simplest of cases, and I recommend others do the same. I use Node.js Lambdas; JavaScript/TypeScript is naturally suited to working with JSON objects, and I use data-api-client to avoid direct SQL connections, not to mention it supports properly parameterized queries.
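
A minimal sketch of that setup, assuming the data-api-client package; the ARNs, database name, and helper function are placeholders, and the named parameters are passed through the RDS Data API rather than interpolated into the SQL string:

// Sketch only; data-api-client wraps the RDS Data API and accepts named parameters.
import dataApiClient from "data-api-client";

const db = dataApiClient({
  secretArn: process.env.SECRET_ARN as string, // placeholder
  resourceArn: process.env.CLUSTER_ARN as string, // placeholder
  database: "mydb", // placeholder
});

export async function updateTrackName(id: string, name: string, username: string) {
  // :name, :id, and :username are bound as parameters by the Data API.
  const result = await db.query(
    "UPDATE tracks SET name = :name WHERE id = :id AND owner_id = :username RETURNING *",
    { name, id, username }
  );
  return result.records?.[0];
}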

My experience with VTL resolvers:

Ricardo1980 commented 2 years ago

@paya-cz I'm just curious, what about price? Isn't it much more expensive to use Lambdas than VTL, especially at scale? Thanks.

paya-cz commented 2 years ago

@Ricardo1980

Well, VTL resolvers are "free". Or to be more precise, their pricing is baked into the per-request cost and you pay for them even if you don't use them (such as when using direct lambda resolvers).

On the other hand, lambdas are billed separately.

Thus, mathematically speaking, Lambdas are infinitely more expensive than VTL. However, running a 128 MB Lambda for a few ms turns out to be very cheap, even for a significant number of requests.

For a million Lambda requests at 128 MB, each running 250 ms, you would pay a total of 0.72 USD. AppSync costs 4 USD for a million requests (just for relative comparison; you would pay both!).
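
For context, the 0.72 USD figure lines up with Lambda's public x86 pricing at the time (roughly 0.20 USD per million requests plus about 0.0000166667 USD per GB-second of compute):

1,000,000 requests x 0.25 s x 0.125 GB = 31,250 GB-seconds
31,250 GB-seconds x ~0.0000166667 USD/GB-second ≈ 0.52 USD (compute)
1,000,000 requests x ~0.20 USD per 1M requests = 0.20 USD (invocations)
Total ≈ 0.72 USD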

To me, that's not a significant enough cost increase to justify spending time on the vast majority of VTL resolvers. Your mileage may vary, especially if you are handling multiple orders of magnitude more volume, but even then you can reserve VTL for the most-hit endpoints.

Ricardo1980 commented 2 years ago

@paya-cz Thanks a lot for your detailed answer. It seems to me that for just accessing the database, you are right.

There are some potential edge cases, though. For example, in my case I use a lot of Lambda functions because when I write something (I also use data-api-client, super nice library), I also want to send an email and/or push notification, and that takes more than a few milliseconds, sometimes half a second or more. I believe that technically I could do that using the HTTP VTL templates to access AWS services like Pinpoint (I have never tried that) and pipeline resolvers to separate everything into different steps. Perhaps in that scenario, with millions of requests, it would be worth it. But even in that case, programming that with VTL seems like a pain in the ass, whereas using Lambdas and Node.js is super easy. Just some ideas.

onlybakam commented 7 months ago

We have released new JavaScript resolvers with RDS utilities that you can use to access your Aurora data source. See: https://docs.aws.amazon.com/appsync/latest/devguide/resolver-reference-rds-js.html
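
For anyone landing here, a minimal sketch of an update resolver using those utilities, loosely following the linked reference; the table name, columns, identity field, and result-shape handling are assumptions carried over from this thread:

import { util } from '@aws-appsync/utils';
import { update, createPgStatement, toJsonObject } from '@aws-appsync/utils/rds';

export function request(ctx) {
  const { id, ...values } = ctx.args;
  // Values and conditions are sent as parameters via the RDS Data API,
  // not concatenated into the SQL string.
  return createPgStatement(
    update({
      table: 'tracks',
      values,
      where: { id: { eq: id }, owner_id: { eq: ctx.identity.username } },
      returning: ['*'],
    })
  );
}

export function response(ctx) {
  const { error, result } = ctx;
  if (error) {
    return util.appendError(error.message, error.type, result);
  }
  // Assumes a single statement: first result set, first row.
  return toJsonObject(result)[0][0];
}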