EHDEN / CatalogueExport

Exports the data from the OMOP-CDM that is necessary for the EHDEN Database Catalogue
Apache License 2.0
9 stars 6 forks source link

org.postgresql.util.PSQLException: ERROR: relation "results.catalogue_results" does not exist #24

Closed filipmaljkovic closed 3 years ago

filipmaljkovic commented 3 years ago

I get an error message when running Catalog export. This is the code:

cdmDatabaseSchema <- "cdm"
resultsDatabaseSchema <- "results"
vocabDatabaseSchema <- cdmDatabaseSchema
numThreads <- 1
cdmSourceName <- "Heliant"
cdmVersion <- "5.3.1"

library(CatalogueExport)

CatalogueExport::catalogueExport(connectionDetails, cdmDatabaseSchema = cdmDatabaseSchema, resultsDatabaseSchema = resultsDatabaseSchema, vocabDatabaseSchema = vocabDatabaseSchema, numThreads = numThreads, sourceName = cdmSourceName, cdmVersion = cdmVersion)

Here's the console output: output.txt

And here's the final error report I get (all the intermediate ones do get overwritten for some reason): errorReportSql.txt

The same thing happens if I add the "createTable = TRUE" parameter to the call.

I really don't have a catalogue_report table, as can be seen in this screenshot: tables

PRijnbeek commented 3 years ago

I reran this on our postgresql box and did not have these issues. The fact you do not have the catalogue_results table is because some queries did not execute so these temp tables are not there (these are created per query to allow for parallel execution). This means that the catalogue_results table cannot be created out of all the temp tables.

You seem to have issue with queries that use the concept_ancestor table.

Can you run this query in your database and report back the result:

-- 430 Number of descendant condition occurrence records,by condition_concept_id

--HINT DISTRIBUTE_ON_KEY(stratum_1)

WITH CTE_CONDITION AS ( SELECT ca.ANCESTOR_CONCEPT_ID AS CONCEPT_ID, COUNT() AS DRC FROM cdm.CONDITION_OCCURRENCE co JOIN cdm.CONCEPT_ANCESTOR ca ON ca.DESCENDANT_CONCEPT_ID = co.CONDITION_CONCEPT_ID GROUP BY ca.ANCESTOR_CONCEPT_ID ) SELECT 430 as analysis_id, CAST(co.CONDITION_CONCEPT_ID AS VARCHAR(255)) AS stratum_1, cast(null as varchar(255)) AS stratum_2, cast(null as varchar(255)) as stratum_3, cast(null as varchar(255)) as stratum_4, cast(null as varchar(255)) as stratum_5, floor((c.DRC+99)/100)100 as count_value FROM cdm.CONDITION_OCCURRENCE co JOIN CTE_CONDITION c ON c.CONCEPT_ID = co.CONDITION_CONCEPT_ID GROUP BY co.CONDITION_CONCEPT_ID, c.DRC

filipmaljkovic commented 3 years ago

That query returns empty resultset. The CTE_CONDITION subquery / statement scoped view also returns empty set.

PRijnbeek commented 3 years ago

that is very strange, do you have data in your concept_ancestor table?

scossin commented 3 years ago

Maybe it's a coincidence but @filipmaljkovic got a _SQL:org.postgresql.util.PSQLException:ERROR:no schema has been selected to create__ error for queries: 430, 630, 730, 830, 1830, 2130 and we got a SQLSyntaxErrorException: ORA-00905: missing keyword for the exact same queries: https://github.com/EHDEN/CatalogueExport/issues/25

PRijnbeek commented 3 years ago

Yes I was thinking the same, but that would not explain why he does not get back the CTE in his own environment..

@scossin if you get it running till the end then I can create a new release with your fixes in there and we can test also on the Serbian dataset from @filipmaljkovic

filipmaljkovic commented 3 years ago

that is very strange, do you have data in your concept_ancestor table?

Yes, 122833 rows.

filipmaljkovic commented 3 years ago

Also, I didn't initially attach any of the intermediate errors that I got (because they get overwritten), but here's one. From different runs and different samples, they all look alike and have similar semantics -- insert statement is into something like stmpach***, that doesn't specify a schema, which then fails.

errorReportSql-intermediate.txt

scossin commented 3 years ago

@filipmaljkovic Yes, intermediate errors are overwritten, I will open an issue for that.
In your error file, I think the "into s_tmpach_830" line is causing the problem. By adding a ";" at the end of the template query 830 it would generate another SQL for postgresql that might work for you:

CREATE TABLE s_tmpach_830
AS
WITH CTE_procedure AS (
  SELECT ca.ancestor_concept_id AS concept_id, COUNT(*)
  AS DRC
  FROM cdm.observation co
  JOIN cdm.concept_ancestor ca
  ON ca.descendant_concept_id = co.observation_concept_id
  GROUP BY ca.ancestor_concept_id
)
SELECT  830 as analysis_id,
CAST(co.observation_concept_id AS VARCHAR(255)) AS stratum_1,
cast(null as varchar(255)) AS stratum_2,
cast(null as varchar(255)) as stratum_3,
cast(null as varchar(255)) as stratum_4,
cast(null as varchar(255)) as stratum_5,
floor((c.DRC+99)/100)*100 as count_value
FROM cdm.observation co
JOIN CTE_procedure c
ON c.concept_id = co.observation_concept_id
GROUP BY co.observation_concept_id, c.DRC
;
filipmaljkovic commented 3 years ago

Is there a way to edit template queries? I figured that the upstream library/package should be fixed.

I tried running this query, but of course I had to specify the results schema up front. The table was empty.

However, as per output.txt, there were in total 6 errors of that kind, only one of which is 830. So any fix should include other occurrences, I suppose.

scossin commented 3 years ago

@filipmaljkovic You can try to re-install the package from the dev branch (that contains a fix for these 6 errors) with this command:

devtools::install_github(repo = "EHDEN/CatalogueExport@dev")
filipmaljkovic commented 3 years ago

Thanks, that did the trick! I confirm successful generation of the csv.