Search SQL queries use string interpolation

codefordenver / partner-finder

Using an open dataset with registered colorado business to build a tool that manages outreach to potential CFD partners.

3 stars 14 forks source link

Search SQL queries use string interpolation #170

Closed galbwe closed 3 years ago

galbwe commented 3 years ago

There are several sql queries in leads.py that are similar to this:

            SELECT
                {columns}
            FROM leads
            WHERE to_tsvector(company_name) @@ to_tsquery('{search}')
            ORDER BY ts_rank(to_tsvector(company_name), '{search}')
            LIMIT :limit
            OFFSET :offset

They are susceptible to injection attacks because the search parameter comes from user input. Find a way to convert them to SQLAlchemy expression language, or try to use a placeholder for them, similar to :limit and :offset.

TueeNguyen commented 3 years ago

I'd like to try tackling this issue.

galbwe commented 3 years ago

@TueNguyen2911 I just assigned you

TueeNguyen commented 3 years ago

Thank you!

TueeNguyen commented 3 years ago

I'm learning Python along the way with this issue. Supposed I change the queries, do I need to fully set up the backend to test? And how would I test it?

Thanks!

galbwe commented 3 years ago

Yeah you'll need to run the backend using docker compose. See the README instructions on doing that.

We don't have automated unit tests at the moment. The best way to test right now is to use the swagger docs to make api calls. See these instructions for accessing the docs.

TueeNguyen commented 3 years ago

I have a few questions.

How is your opinion on this change? After looking sqlalchemy up, instead of adding search to the textual query string using format(), I changed '{search}' into bound parameter :search and added search to query_args to pass into execute() so that in case of injection, it will auto-escape.

query = text(
        """
        SELECT
            {columns}
        FROM leads
        WHERE to_tsvector(company_name) @@ to_tsquery(:search)
        ORDER BY ts_rank(to_tsvector(company_name), :search)
        LIMIT :limit
        OFFSET :offset
    """.format(
            columns=",".join(DEFAULT_LEAD_FIELDS),
        )
    )
    query_args = {"limit": limit, "offset": offset, "search": search}

I have E501 line too long errors, I commented linting options in .vscode/setting.json to bypass. What should I do with these errors? Will I be able to make a PR?

galbwe commented 3 years ago

Using bound parameters for search looks like it works great! Not sure why it wasn't there to begin with.

Check out the section of the readme on linting and formatting in the backend. The lint/backend job should pass if the backend/scripts/lint.sh script passes. Let me know if you run into further issues with this.

TueeNguyen commented 3 years ago

I ran backend/scripts/lint.sh and it didn't show any error with leads.py, that's good. Thank you for telling me.