jonmdev commented 3 days ago

Why prepared statements are needed

The documentation links to this practice document for SQL safety:

https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html

It states:

Defense Option 4: STRONGLY DISCOURAGED: Escaping All User-Supplied Input¶

In this approach, the developer will escape all user input before putting it in a query. It is very database specific in its implementation. This methodology is frail compared to other defenses and we CANNOT guarantee that this option will prevent all SQL injections in all situations.

If an application is built from scratch or requires low risk tolerance, it should be built or re-written using parameterized queries, stored procedures, or some kind of Object Relational Mapper (ORM) that builds your queries for you.

In other words, we absolutely need prepared statements, if users are writing their own inputs to our queries or inserts.

Postgres?

I thought this must be possible by Postgres connection as this allows prepared statements. I managed to get a Postgres connection working by Rust tonight. But then I came to understand, as per the documentation:

Particularly, ArcadeDB does only support "simple" query mode and does not support SSL!

Simple query mode is no different than HTTP query. For example, in Rust with Postgres this is the same format of an HTTP Request (plain string, which is unsafe when you may have user written inputs):

    let query_result = client.simple_query("SELECT * FROM Customer"); //hypothetically, just to show syntax

Other practices like whitelisting input are not possible when users can type their own text inputs such as free entry text fields. Limiting characters to a-z,A-Z, 0-9 is also not possible when you have global users inputting every variety of unicode or emojis or punctuation.

JDBC?

Is JDBC then the only way to perform prepared statements with Arcade? If so I think this should be better documented in the documentation.

Say from a Rust or Elixir server, which is interfacing to Arcade installed and running on a separate Ubuntu server - I can get a JDBC system working under my Rust server. Is that then my only choice to connect and query/insert to Arcade from there safely?

Whatever the answer, I think the options for prepared statements should be more clearly enumerated in the documentation, given how critical this is.

Are the options (whatever they are) ever likely to change as well? Thanks for any thoughts or clarification.

gramian commented 3 days ago

The HTTP API can pass parameters into queries and commands. Also, the Java API allows passing parameters. I guess, in both cases these parameterized queries can easily be encapsulated into functions (or objects) making them prepared statements.

jonmdev commented 3 days ago

The HTTP API can pass parameters into queries and commands.

I don't see any examples of anything that could protect against SQL injection there? My understanding is that to protect against SQL injection you need to prepare and finalize the query function the database will perform in terms of its operations and constraints (ie. INSERT INTO BUCKET:Customer_Europe CONTENT $1) separately from the variables ($1 = { firstName: 'Enzo', lastName: 'Ferrari' }) so that the variables that go into it can't alter the operations that can be performed (ie. even if the user sneaks in DROP with malicious character escaping, there will be no capacity for this operation to be performed, because it was not included in the preparation of the statement/function).

Also, the Java API allows passing parameters.

How does that work? I have read those Java snippets since a year ago and I still don't understand how they could be used.

If you have:

1) ArcadeDB installed on Ubuntu server 1. 2) Rust/Elixir installed on Ubuntu server 2. 3) Users connect to Rust/Elixir server 2.

How do you use the Java API? The only way I see is by the Rust server connecting to the Arcade server by JDBC. Which then again, means the only way to protect against SQL injection (which is really a critical thing we all need to do unless there is zero user written input) is by JDBC connection. And this is not commonly or clearly available.

Why not at least Postgres too? Given there is more availability? It took me only a few hours (yeah, "only" 😬) to get Rust connected by Postgres. I was going to add it to the examples, but it now seems like a pointless method as it is not remotely safe.

Or if we must do it by JDBC, can we get some better clarity on how?

I see for example many JDBC drivers:

https://www.codejava.net/java-se/jdbc/jdbc-driver-library-download

I can wrap one into Rust, for example with this: https://lib.rs/crates/jni

But I'm not sure what JDBC version I am supposed to be using. For something so critical, I don't think this is well explained. I don't get it. I used to think I just needed to come back to it later, and it would make sense eventually. But now I actually need it running. And now I have done HTTP requests in multiple languages (I posted several of the examples on the documentation), tonight I succeeded with Postgres requests. But I am still no closer.

Any further thoughts? @lvca any thoughts as well?

Thanks as always for any help. 👍

gramian commented 3 days ago

I don't think I can solve your problem(s), but here are some remarks that may hopefully help:

Above, I meant, for example, using a function to wrap the statement and after asserting the wrapping function's arguments, passing those as parameters into the wrapped statement.
With the query endpoint or method only read operations can be performed, altering operation are only admissible via the command endpoint or method.
The query language sql allows one and only one statement, so if you constrain your user to sql they cannot inject an additional statement.
~~There maybe the corner case of an injectible sub-query or local LET~~

lvca commented 2 days ago

Our postgres driver implementation supports prepared statements. Look at: https://github.com/ArcadeData/arcadedb/blob/13ccc3e750f505e65fc586d76e1803bffc935209/postgresw/src/main/java/com/arcadedb/postgres/PostgresNetworkExecutor.java#L580. Postgres protocol has the "BIND" message where parameters are passed and then executed with ArcadeDB in a safe manner..

@jonmdev have you tried using parameters with the Postgres driver?

jonmdev commented 2 days ago

Above, I meant, for example, using a function to wrap the statement and after asserting the wrapping function's arguments, passing those as parameters into the wrapped statement.

That was my original instinct as well. ie. Just do escaping and try to be safe. Probably you are right in many cases. But it seems like an game of Russian roulette. As I read more over the past few weeks, everything I have encountered has told me this approach is extremely dangerous. If you slip up or someone out-thinks you you can have a major data breach or the service could be wiped out. I am not so confident in myself in that manner.

If your system requires allowing users to input text and then have that written into the database, the only certainly safe way everyone says is using prepared statements.

For example, I asked ChatGPT something like if this was safe (eg. HTTP Request or Postgres Simple Query):

user_input = sanitize(user_input) //where sanitize is an external function to check for escaping or other problems
let query_result = client.simple_query("SELECT ${user_input} FROM Customer"); //hypothetically, just to show syntax

And it says over and over in different words same as every resource I see online including the official one above:

What to Avoid: String Interpolation in SQL Queries

If you attempt to insert user input directly into SQL queries via string interpolation (like this): let query_result = client.simple_query(SELECT * FROM Customer WHERE name = '${user_input}'); This approach is vulnerable to SQL injection. If an attacker enters malicious input such as '; DROP TABLE Customer; --, they could modify or delete your database. Never use string interpolation to construct SQL queries with user input directly.

They suggest we need prepared statements such as:

Safe String Substitution in SQL: When building SQL queries or similar strings, parameterized queries are the most reliable and secure method of including user input inside queries. This approach avoids SQL injection vulnerabilities and ensures input is properly escaped and handled.

Then they give examples of prepared (parameterized) queries like:

  const sanitized_input = sanitize(user_input);
  // Parameterized query
  const res = await client.query(
    'SELECT * FROM Customer WHERE name = $1', // $1 is a placeholder
    [sanitized_input]  // user input passed as parameter
  );

Even as I look at the API's for Postgres adapters they keep warning this constantly and every Stackoverflow thread I pull up says the same. So while it may not be so vulnerable as they say (based on your points), I think it is good enough for me now to read that enough that I am paranoid and do not wish to touch anything except by prepared/parameterized statements. Probably I suspect everyone should then just follow that approach as there is no harm and only safety and benefit.

It is just then a matter of how we do it then to figure out, but I think it definitely ought to be the default approach to be safe.

gramian commented 2 days ago

To my understanding this is not how parameters work, at least in ArcadeDB. A parameter is placed into the parameterized query or command as its type. For example SELECT FROM doc WHERE name = :myparam becomes SELECT FROM doc WHERE name = 'test' or SELECT FROM doc WHERE name = 4 depending of the type of the parameter myparam.

jonmdev commented 2 days ago

Our postgres driver implementation supports prepared statements.

That would be great @lvca. I am not sure how this would work, though. Because the documentation we have on Arcade says that Arcade only supports "simple" mode for Postgres.

All the documentation I have on "simple postgres mode" suggests this does not permit parameterized/prepared statements but rather just plain string ones like HTTP.

For example, the Postgres package I got working for simple mode with Rust looks like this in practice:

    let mut config = Config::new();
    config.host("localhost")
          .user("root")
          .password("password")
          .dbname("mydb");
    let result: Result<Client, postgres::Error> = config.connect(NoTls);
    let mut client = result.expect("FAILED TO UNWRAP CLIENT");  // This will panic if the connection fails

    let query_result = client.simple_query("SELECT * FROM Customer");

This package is the main one for Rust with >5 million downloads and is here:

https://crates.io/crates/postgres

You can see the documentation on Simple Queries here:

https://docs.rs/postgres/latest/postgres/enum.SimpleQueryMessage.html?search=simplequery

pub fn simple_query( &mut self, query: &str, ) -> Result<Vec<SimpleQueryMessage>, Error>

Executes a sequence of SQL statements using the simple query protocol.

Statements should be separated by semicolons. If an error occurs, execution of the sequence will stop at that point. The simple query protocol returns the values in rows as strings rather than in their binary encodings, so the associated row type doesn’t work with the FromSql trait. Rather than simply returning the rows, this method returns a sequence of an enum which indicates either the completion of one of the commands, or a row of data. This preserves the framing between the separate statements in the request.

Warning

Prepared statements should be used for any query which contains user-specified data, as they provided the functionality to safely embed that data in the request. Do not form statements via string concatenation and pass them to this method!

So they are again warning us not to use this method as it only takes a string input. The prepared version (correct query method) is query, which does clearly allow parameters in the arguments, but this is then not "simple" mode.

ChatGPT confirms this as well:

Question: does simple mode for postgres allow parameters?

In PostgreSQL, the simple_query method provided by the PostgreSQL client libraries (like pg for Node.js) is typically a low-level query execution function that does not support parameterized queries directly. Instead, you would typically use query or preparedStatement methods for executing queries with parameters safely.

The simple_query method is generally used for executing straightforward SQL queries. However, it does not support parameterized queries or placeholders (like $1, $2, etc.). It's more useful for quick, single-line queries or administrative commands that don't require dynamic user input.

This method just sends the query directly to PostgreSQL without any parameterization, so you cannot use placeholders or safely insert user input into the query.

query (Supports Parameterized Queries)

On the other hand, the query method is the one you would typically use if you need to pass parameters to your query safely. You can safely use parameterized queries with placeholders (like $1, $2, etc.) using query.

And their conclusion:

Why Use query over simple_query for Parameters?

simple_query doesn't support placeholders or parameterized queries, which means you must manually escape values or rely on external sanitization, which can be error-prone and insecure.

query with placeholders like $1, $2, etc., ensures safe and secure handling of dynamic values (like user input) and is the recommended method to avoid SQL injection vulnerabilities.

This then matches the Rust API. Thus if the Postgres system on your end allows parameterized queries, it does not require "simple" mode, or at least I believe these are two counteracting things.

Thanks for any further thoughts.

gramian commented 2 days ago

If this is a response to me: sorry, I was referring to the HTTP API here.

ArcadeData / arcadedb

What are the permitted ways to make prepared statement queries or inserts? Ie. To avoid SQL injection? Is it just JDBC? Postgres? #1829

Why prepared statements are needed

Postgres?

JDBC?

What to Avoid: String Interpolation in SQL Queries

Question: does simple mode for postgres allow parameters?

query (Supports Parameterized Queries)

Why Use query over simple_query for Parameters?