OHDSI / ETL-CMS

Workproducts to ETL CMS datasets into OMOP Common Data Model
Apache License 2.0
94 stars 52 forks source link

Error in loading CDMv5 vocabularies into postgres #32

Closed hajarhomayouni closed 3 years ago

hajarhomayouni commented 7 years ago

Hi, I followed the steps in documentations to load data in postgres. but in part e of step 7, when I wanna load data using the sql file, I see the following error:

psql:load_CDMv5_vocabulary.sql:26: ERROR: extra data after last expected column CONTEXT: COPY drug_strength, line 2: "42478670 913782 2 8576 2 0150817 20991231

I got vocabularies from OMOP Vocabulary Web Site and did not change it. I couldn't find the problem. can any one tell me what I am doing wrong.

ChristopheLambert commented 7 years ago

Can you specify whether you are using postgreSQL on Windows or Linux or some other OS, and what version of postgreSQL you are using? Can you post the first 3 lines of your drug_strength file? It sounds like the drug_strength table does not have the same fields as the drug_strength file. A field, denominator_value, was added to the drug_strength table in 5.0.1 of the CDM. Is it possible your drug_strength table does not have that field? I think the latest vocabulary download will be 5.0.1. I also think that the code we provided to create the vocabulary tables was for 5.0.1, whereas we kept with version 5.0 for the cost tables. Did you use our create_CDMv5_tables.sql code to create the tables?

hajarhomayouni commented 7 years ago

I am using postgres on Linux and I use version (9.5.4). I use your create_CDMv5_tables.sql code and I received the download link for vocabularies from OMOP Vocabulary website.

I attached some first lines of DRUG_STRENGTH csv file.

1

ChristopheLambert commented 7 years ago

Your DRUG_STRENGTH.csv file has a column called BOX_SIZE which is not documented in either v5.0.0 or v5.0.1 of the CDM at this time. Assuming you got DRUG_STRENGTH.csv from the athena.ohdsi.org web site, it would appear they have made a change to the vocabulary. I would ask @Christian_Reich about this. In the meantime, you could strip out that column and try loading again, or add this field to the table creation code. There is a post about the box_size field on the OHDSI forums here: http://forums.ohdsi.org/t/run-the-vocabimport-script-fail-for-cmd-v5/1625/4, where I have also followed up.

hajarhomayouni commented 7 years ago

Thank you Christopher, I added BOX_SIZE to table DRUG_STRENGTH, and ran the following command again:

\COPY synpuf5.drug_strength FROM '/home/lambert/Dropbox/Research/vocab_download_v5/DRUG_STRENGTH.csv' WITH DELIMITER E' ' CSV HEADER QUOTE E'\b';

This time I see this error message:

sql:load_CDMv5_vocabulary.sql:17: ERROR: invalid input syntax for integer: "42478670 913782 2 8576 20150817 20991231 " CONTEXT: COPY drug_strength, line 2, column drug_concept_id: "42478670 913782 2 8576 20150817 20991231 "

I think something is wrong with the delimiter? What is the correct delimiter I should use?

ChristopheLambert commented 7 years ago

It should be tab delimited. You also need to set the path to where your file is stored.

ChristopheLambert commented 7 years ago

Have you resolved this issue, and can we close it out?

ChristopheLambert commented 3 years ago

Closing issue for lack of response.

cgreich commented 3 years ago

Is that a legitimate reason, @ChristopheLambert? :) Let me draw the attention of the Athena team to this.

Tuntufye4 commented 2 years ago

Have you resolved this issue, and can we close it out?

I am having the same issue