HazyResearch / deepdive

DeepDive
deepdive.stanford.edu

DataLoader interface problem for Greenplum #137

Closed: zhangce closed this issue 10 years ago

zhangce commented 10 years ago

It involves some code refactoring, unit tests, and a load function. I tested it under Postgres, and Sen tested it under Greenplum.

First, are you sure this runs on GP?

115 GPLOAD:
116     INPUT:
122 OUTPUT:

Why is OUTPUT at the same indentation level as GPLOAD? YAML is indentation-sensitive, so I would expect this line to give:

czhang@raiders4:~$ gpload -f gpload.yaml
2014-09-13 21:53:47|ERROR|unexpected key: "output"

Did you guys really run the code you wrote on GP? Or am I missing something?
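For reference, here is a minimal sketch of a control file with OUTPUT nested under GPLOAD as a sibling of INPUT. The object and function names (`GploadYamlSketch`, `gploadYaml`) and their parameters are hypothetical and not the code in this PR. The key layout follows our understanding of the gpload control-file format, in which INPUT and OUTPUT are both children of GPLOAD.

```scala
object GploadYamlSketch {
  // Hypothetical helper: renders a gpload control file as a string.
  // INPUT and OUTPUT are both indented under GPLOAD. Writing OUTPUT at
  // column 0 would make it a top-level key, which gpload rejects.
  def gploadYaml(db: String, user: String, host: String, port: Int,
                 table: String, file: String): String =
    s"""VERSION: 1.0.0.1
       |DATABASE: $db
       |USER: $user
       |HOST: $host
       |PORT: $port
       |GPLOAD:
       |  INPUT:
       |    - SOURCE:
       |        FILE:
       |          - $file
       |    - FORMAT: text
       |  OUTPUT:
       |    - TABLE: $table
       |    - MODE: INSERT
       |""".stripMargin
}
```

Generating the file from a single template keeps the nesting visible in one place, which makes an indentation slip like the one above easier to spot in review.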

111 DATABASE: ${dbSettings.dbname}
    USER: ${dbSettings.user}
    HOST: ${dbSettings.host}
    PORT: ${dbSettings.port}

You are missing the case where dbSettings.dbname is null; see line 68 of ExtractorRunner.scala. Also, shouldn't the password be here too?
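One way to cover both points, sketched under stated assumptions. The `DbSettings` case class below is a hypothetical stand-in for `dbSettings`. A null field is left out of the control file rather than rendered as the literal string `null`. The password is passed through the `PGPASSWORD` environment variable of the gpload process instead of being written to the YAML file on disk. We are assuming gpload honors `PGPASSWORD` the way libpq-based tools do. We also don't know exactly what line 68 of ExtractorRunner.scala does, so this is not a copy of that logic.

```scala
object GploadConnSketch {
  // Hypothetical stand-in for dbSettings; any field may be null.
  case class DbSettings(dbname: String, user: String, host: String,
                        port: String, password: String)

  // Emit only the connection keys whose values are non-null, so a missing
  // dbname never turns into "DATABASE: null" in the control file.
  def connectionYaml(s: DbSettings): String =
    Seq("DATABASE" -> s.dbname, "USER" -> s.user,
        "HOST" -> s.host, "PORT" -> s.port)
      .flatMap { case (key, value) => Option(value).map(v => s"$key: $v\n") }
      .mkString

  // Assumption: gpload reads PGPASSWORD from its environment. Returns the
  // extra environment entries for the child process (empty if no password).
  def gploadEnv(s: DbSettings): Seq[(String, String)] =
    Option(s.password).map(p => "PGPASSWORD" -> p).toSeq
}
```

The pairs returned by `gploadEnv` can be passed as the `extraEnv` varargs of `scala.sys.process.Process(command, cwd, extraEnv: _*)`, so the password never touches the filesystem.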

139 val sql = """COPY """ + s"${tablename} FROM STDIN"

You are assuming the DB client and server are on the same machine. On line 85 of the same file, you yourself use a different command: \COPY.
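Here is a sketch of the client-side variant. psql's `\COPY` meta-command reads the file on the machine running psql and streams it to the server, so the load works wherever the server lives. The object and function names and parameters are hypothetical. The helper only builds the argument list, and the naive quoting of `file` assumes the path contains no single quotes.

```scala
object ClientCopySketch {
  // Hypothetical helper: builds a psql invocation that loads `file` into
  // `table` via the client-side \COPY meta-command. psql -c accepts a single
  // backslash command, and \COPY reads the file on the client machine.
  def psqlCopyCommand(host: String, port: Int, db: String, user: String,
                      table: String, file: String): Seq[String] =
    Seq("psql", "-h", host, "-p", port.toString, "-U", user, "-d", db,
        "-c", s"\\COPY $table FROM '$file'")
}
```

Usage, assuming psql is on the PATH: `import scala.sys.process._` and then `Process(cmd).!` returns psql's exit code.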

128 val cmd = s"gpload -f ${loadyaml.getAbsolutePath()}"

Can we assume gpload is directly callable (i.e., on the PATH)? Shouldn't there be a command-line option specifying where the binary is?
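Here is a minimal sketch of making the binary location configurable, falling back to a bare `gpload` resolved through the PATH. `gploadPath` stands for a hypothetical setting, such as a config key or command-line option. Nothing like it exists in this PR. Building the command as a `Seq` rather than one interpolated string also keeps paths that contain spaces intact.

```scala
import java.io.File

object GploadBinarySketch {
  // gploadPath is a hypothetical user-supplied setting. When it is absent
  // or empty, fall back to "gpload" and rely on the PATH to resolve it.
  def gploadCommand(gploadPath: Option[String], yaml: File): Seq[String] = {
    val binary = gploadPath.filter(_.nonEmpty).getOrElse("gpload")
    Seq(binary, "-f", yaml.getAbsolutePath)
  }
}
```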

senwu commented 10 years ago

feiranwang commented 10 years ago

Please see my email for my answers.

rionda commented 10 years ago

I think this was fixed; can we please close it? @feiranwang @zhangce @SenWu

feiranwang commented 10 years ago

This should have been solved in September.