Closed lawrenceadams closed 3 weeks ago
Interestingly the former has been solved upstream:
I definitely agree we should choose a consistent and deterministic way of doing this, and I like ROW_NUMBER() OVER() ... are you sure SQL Server doesn't support it? My Googling says it does, but I've never actually used SQL Server.
Sorry, I made my point badly. You're absolutely right that it does support it, but it does not support an empty OVER () clause (like my first example above); instead you need to provide an ORDER BY expression or use OVER (ORDER BY (SELECT NULL)) (which I think we should avoid).
https://stackoverflow.com/questions/44105691/row-number-without-order-by
Ahh OK! That makes more sense 🙃 and agree, we should choose some ordering key for consistency even if there are dupes in a table (which will happen in OMOP).
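To make the "choose some ordering key" idea concrete, here is a minimal sketch of what a deterministic version could look like. The source table name (patients) and the columns (city, state, zip) are assumptions for illustration only, not taken from the actual models; the point is just that the ORDER BY covers every selected column, so no two rows can tie.

```sql
-- Sketch only: "patients", "city", "state" and "zip" are hypothetical names.
-- The inner DISTINCT guarantees each (state, city, zip) combination appears once,
-- so ordering by all three columns leaves no ties and the numbering is
-- repeatable for the same input data.
select
    row_number() over (order by state, city, zip) as location_id,
    city,
    state,
    zip
from (
    select distinct
        city,
        state,
        zip
    from patients
) as distinct_locations
```

This form should be accepted by T-SQL as well as Postgres/DuckDB, since the OVER clause always has an explicit ORDER BY. If a table can legitimately contain duplicate rows, the ties between those duplicates would still be broken arbitrarily unless some extra tiebreaker column (e.g. an ID) is added to the ORDER BY.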
row_number is sprinkled throughout the repository; however, it is used in various ways that are likely to give unexpected/non-deterministic behaviour between runs. We will usually have a logical means of ordering the rows: if not a date, then an ID of some sort. It tends to be used as:
row_number() over ()
row_number() over (order by (select null))
The variation is likely to come from a mixture of copy-pasted sources (e.g. T-SQL doesn't allow ... OVER () but Postgres/DuckDB do). The main two offenders I can find so far:
https://github.com/OHDSI/dbt-synthea/blob/ae791145d50c9e0693880ff9ed37d60f7cc0195d/models/omop/location.sql#L2
https://github.com/OHDSI/dbt-synthea/blob/75a1e12ae218f1fbaa97cda301423f8113508b25/models/omop/provider.sql#L2
Although this probably has low impact downstream, it may be causing unexpected behaviour, e.g. #47.