BritishGeologicalSurvey / etlhelper

ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
https://britishgeologicalsurvey.github.io/etlhelper/
GNU Lesser General Public License v3.0
104 stars 25 forks source link

Screen table and column names in `generate_insert_sql()` for SQL injection attempts #133

Closed volcan01010 closed 1 year ago

volcan01010 commented 1 year ago

Summary

As an ETLHelper user, I want the generate_insert_sql() function to screen the table and column names used to generate the insert SQL so that it cannot be used for SQL injection.

Description

The generate_insert_sql() function dynamically generates an insert function based on the table name and column names using string formatting. If ETLHelper was used in a situation where it handled user-supplied data there is a possibility that this could be used to generate a malicious command.

https://github.com/BritishGeologicalSurvey/etlhelper/blob/e8ae62c3ed92dbec5b306f9e60f8887828273887/etlhelper/etl.py#L532

The PostgreSQL documentation describes the valid characters that can be used in identifiers, e.g. table names. Updating the generate_insert_sql() function to ensure that table and column names contain only these characters and raising an Exception if not will protect users from potential SQL injection attacks.

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

There is the possibility that the names contain international characters. There is information on matching them with a regular expression here:

Note that parameter values in queries are passed as arguments to the corresponding DBAPI 2 functions and are sanitised at that point.

Acceptance criteria