As an ETLHelper user, I want the generate_insert_sql() function to screen the table and column names used to generate the insert SQL so that it cannot be used for SQL injection.
Description
The generate_insert_sql() function dynamically generates an insert function based on the table name and column names using string formatting. If ETLHelper was used in a situation where it handled user-supplied data there is a possibility that this could be used to generate a malicious command.
The PostgreSQL documentation describes the valid characters that can be used in identifiers, e.g. table names. Updating the generate_insert_sql() function to ensure that table and column names contain only these characters and raising an Exception if not will protect users from potential SQL injection attacks.
Summary
As an ETLHelper user, I want the
generate_insert_sql()
function to screen the table and column names used to generate the insert SQL so that it cannot be used for SQL injection.Description
The
generate_insert_sql()
function dynamically generates an insert function based on the table name and column names using string formatting. If ETLHelper was used in a situation where it handled user-supplied data there is a possibility that this could be used to generate a malicious command.https://github.com/BritishGeologicalSurvey/etlhelper/blob/e8ae62c3ed92dbec5b306f9e60f8887828273887/etlhelper/etl.py#L532
The PostgreSQL documentation describes the valid characters that can be used in identifiers, e.g. table names. Updating the
generate_insert_sql()
function to ensure that table and column names contain only these characters and raising an Exception if not will protect users from potential SQL injection attacks.https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
There is the possibility that the names contain international characters. There is information on matching them with a regular expression here:
Note that parameter values in queries are passed as arguments to the corresponding DBAPI 2 functions and are sanitised at that point.
Acceptance criteria
generate_insert_sql()
rejects identifiers containing characters other than letters, numbers, _ or $.generate_insert_sql()
rejects identifiers beginning with numbers or $.