OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0
130 stars 169 forks source link

Import of estimation from v2.7.0 fails in v2.7.2 #1153

Closed anthonysena closed 5 years ago

anthonysena commented 5 years ago

Scenario

I've exported an estimation created in Atlas v2.7.0: ple-2_7_0.txt and am attempting to import this into a different target Atlas running v2.7.2. In my target environment, I have concept sets and cohort definitions with the same names as in the attached design.

Expected Result

The design imports without issue

Actual result

The design import fails with the following stack trace: ple-2_7_0_import_stack_trace.txt. For brevity, here is the relevant part of the error:

"message": "Violation of UNIQUE KEY constraint 'uq_cs_name'. Cannot insert duplicate key in object 'dbo.concept_set'. The duplicate key value is ([OHDSI EU 2019] Negative controls).",

anthonysena commented 5 years ago

Some additional information on this problem since it appears to be related to using SQL Server for the back-end. In the design mentioned in this issue, the concept set name "[OHDSI EU 2019] Negative controls" is causing an issue when using the Hibernate "startsWith" functionality to detect duplicate names.

SQL Server uses the [ ] characters as part of the Wildcard scheme for the SQL Like operator: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/wildcard-character-s-to-match-transact-sql?view=sql-server-2017. In the case of the concept set name [OHDSI EU 2019] Negative controls, the [ character is causing the underlying SQL query to miss the duplicated name and thus the unique constraint is violated.

I'm going to see if we can change the code to add the proper escape sequence but then this may cause issues on the other supported RDBMS' so it will need additional testing.

chrisknoll commented 5 years ago

Is it possible to leverage LEFT and LEN functions instead of using like? If you're looking for things that start with 'abcd' is it a matter of saying : WHERE left(name,len('abcd')) = 'abcd'?

pavgra commented 5 years ago

LIKE utilizes regular column indexes while I doubt that the left(name,len('abcd')) does until you explicitly define specific functional index. Why cannot you just escape [ with a backslash (\[)?

chrisknoll commented 5 years ago

That's true, the LEFT paramaters would be dynamic so while you could make a static index on a hard coded LEFT(col, {fixed number}), it wouldn't work for dynamic left paramaters.

Why cannot you just escape [ with a backslash ([)?

Might not behave the same way across all platforms. Using LEFT would work everywhere, but you do lose index performance.

Another way to solve this is to load all the distinct cohort names in memory for the duration of the transaction, and do in-memory lookups for duplicate names. It would side-step the whole 'cross database compatibility' issue all together.

pavgra commented 5 years ago

Might not behave the same way across all platforms.

Looking at the documentation, it works for MS SQL. I've checked PG - it works there too. Oracle's docs state that the syntax is also handled:

WITH data AS (
  SELECT '[abc] 123' a
)
SELECT *
FROM data
WHERE a LIKE '\[a%' ESCAPE '\'

Why would we need to re-invent the wheel then?

chrisknoll commented 5 years ago

Apologies for some ignorance on my part: it seems LIKE syntax does include features to handle escapes (and it seems consistent between oracle and tsql at least, do not know about PG):

match_expression [ NOT ] LIKE pattern ESCAPE escape_character

So, if we follow that syntax, we should be able to handle cross-platform escape character handling.

And, this isn't something that hast to be handed by all OHDSI platforms, jsut those platforms supported by WebAPI (oracle, postgresql, sql server). So, i think using the ESCAPE syntax is the answer here. I'm sorry if I missed this point at first, @pavgra.

pavgra commented 5 years ago

So, I believe, the recipe looks like changing e.g.

List<PathwayAnalysisEntity> findAllByNameStartsWith(String pattern)

to

@Query("SELECT pa FROM pathway_analysis pa WHERE pa.name LIKE ?1 ESCAPE '\'")
List<PathwayAnalysisEntity> findAllByNameStartsWith(String pattern)

And similarly in other repos

chrisknoll commented 5 years ago

And the parameter being passed into ?1 needs to have special characters escaped (ie: '[abc]' becomes '\[abc\]'). Correct? I'm assuming that the input parameter does not automatically get escaped.

pavgra commented 5 years ago

@chrisknoll, I think so too

anthonysena commented 5 years ago

And the parameter being passed into ?1 needs to have special characters escaped (ie: '[abc]' becomes '[abc]'). Correct? I'm assuming that the input parameter does not automatically get escaped.

@chrisknoll, I think so too

Confirmed that it does