STRIDES-Codes / subject-sample-search

MIT License
3 stars 2 forks source link

Multi line SQL queries from Python cause problems with GA4GH Search #3

Open ianfore opened 3 years ago

ianfore commented 3 years ago

In my BigQuery example I can write a SQL statement which extends over multiple lines e.g.

    searchClient = BigQuerySearchClient()
    query = """
        SELECT submitter_id, read_drs_id
        FROM `isbcgc-216220.onek_genomes.ssd_drs`
        where population = 'BEB'
        LIMIT 1"""
    res = searchClient.runQuery(query)

This is helpful for readability.

If I submit a query formatted this way to DiscoverySearchClient the GA4GH Search implementation gives me an error.

sdhutchins commented 3 years ago

I'm going to take a look at this!

ianfore commented 3 years ago

This is the error

{'error': 'Bad Request',
 'message': 'JSON parse error: Illegal unquoted character ((CTRL-CHAR, code '
            '10)): has to be escaped using backslash to be included in string '
            'value; nested exception is '
            'com.fasterxml.jackson.databind.JsonMappingException: Illegal '
            'unquoted character ((CTRL-CHAR, code 10)): has to be escaped '
            'using backslash to be included in string value\n'
            ' at [Source: (PushbackInputStream); line: 1, column: 10] (through '
            'reference chain: '
            'com.dnastack.ga4gh.search.adapter.presto.SearchRequest["query"])',
 'path': '/search',
 'status': 400,
 'timestamp': '2020-11-18T13:19:32.316+0000'}

The new line characters (ascii 10) that end up in the query are illegal json and are rejected by the parser on the server.

Using the following approach should have some portability across Windows and unix as new line characters vary across platforms. It works for me but I'm going on hearsay about the portability. query = query.replace("\n", " ").replace("\t", " ")

Note that I also had to replace the tab character that was ending up in the query string. Some will frown that I have python code indented with tabs, but as far as making the code robust goes, protecting against people like me is a good practice.

It makes best sense to handle this replacement in DiscoverySearchClient.

ianfore commented 3 years ago

Checked in a fix to DiscoverySearchClient. Keeping this open pending further comment or alternative solutions.