HazyResearch / deepdive

DeepDive
deepdive.stanford.edu

error in running smoke example #607

Open rudaoshi opened 7 years ago

rudaoshi commented 7 years ago

When running the "deepdive run" command, it reports:

2016-12-20 10:30:26.823709 ERROR: extra data after last expected column
2016-12-20 10:30:26.823776 CONTEXT: COPY person_has_cancer, line 1: "1 \N \N"

The schema of the table person_has_cancer is:

  Column   |  Type   | Modifiers | Storage | Stats target | Description
-----------+---------+-----------+---------+--------------+-------------
 person_id | bigint  |           | plain   |              |
 id        | bigint  |           | plain   |              |
 label     | boolean |           | plain   |              |
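PostgreSQL raises "extra data after last expected column" when an input row has more fields than the COPY target expects. A minimal sketch for spotting such rows before loading, assuming the .tsv files are tab-separated; the function name and the sample rows are hypothetical, and the expected column count has to come from whatever column list the COPY actually uses:

```python
def find_bad_rows(lines, expected_columns, delimiter="\t"):
    """Return (line_number, field_count) for rows whose field count differs."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip only the trailing newline so empty trailing fields are counted.
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != expected_columns:
            bad.append((number, len(fields)))
    return bad


# Hypothetical sample: the three-field row from the error, then a two-field row.
sample = ["1\t\\N\t\\N\n", "2\t\\N\n"]
print(find_bad_rows(sample, expected_columns=2))
```

To check a real file, pass an open file object instead of the sample list, for example `find_bad_rows(open("person_has_cancer.tsv"), 2)`.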

rudaoshi commented 7 years ago

I removed the last column from both person_has_cancer.tsv and person_smoke.tsv. The above error disappeared, but the following error occurs:

2016-12-20 11:29:24.266355 + sampler-dw gibbs -w /dev/fd/63 -v /dev/fd/62 -f /dev/fd/61 -m factorgraph/meta -o weights -l 0 -i 1000 --alpha 0.01
2016-12-20 11:29:24.266373 ++ find -L factorgraph/factors -type f -exec pbzip2 -c -d -k '{}' +
2016-12-20 11:29:24.282508 pbzip2: ERROR: File [factorgraph/variables/person_has_cancer/variables.part-2.bin.bz2] is NOT a valid bzip2! Skipping...
2016-12-20 11:29:24.282627 -------------------------------------------
2016-12-20 11:29:24.282648 pbzip2: ERROR: File [factorgraph/variables/person_has_cancer/variables.part-3.bin.bz2] is NOT a valid bzip2! Skipping...
2016-12-20 11:29:24.282666 -------------------------------------------
2016-12-20 11:29:24.282910 pbzip2: *ERROR: File [factorgraph/variables/person_smokes/variables.part-2.bin.bz2] is NOT a valid bzip2! Skipping...
2016-12-20 11:29:24.283018 -------------------------------------------
2016-12-20 11:29:24.283069 pbzip2: *ERROR: File [factorgraph/variables/person_smokes/variables.part-3.bin.bz2] is NOT a valid bzip2! Skipping...
2016-12-20 11:29:24.283109 -------------------------------------------
2016-12-20 11:29:24.291100 PARSE ERROR:
2016-12-20 11:29:24.291156 Required argument missing: n_samples_per_learning_epoch
2016-12-20 11:29:24.291170
2016-12-20 11:29:24.291187 Brief USAGE:
2016-12-20 11:29:24.291267 sampler-dw gibbs [--learn_non_evidence] ... [--sample_evidence] ...
2016-12-20 11:29:24.291354 [-q] ... [--regularization ] ... [-b
2016-12-20 11:29:24.291436 ] ... [-d ] ... [-p ] ...
2016-12-20 11:29:24.291527 [-a ] ... [-c ] ... [--burn_in ]
2016-12-20 11:29:24.291609 ... -i ... -s ... -l ... [-j
2016-12-20 11:29:24.291782 ] [-r ] [-o ] [-w ]
2016-12-20 11:29:24.291875 [-e ] [-f ] [-v ] [-m
2016-12-20 11:29:24.291955 ] [--] [--version] [-h]
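To see which files pbzip2 would reject, one option is to test each part file independently. The following is only a sketch using Python's standard bz2 module: the helper name is hypothetical, the glob pattern is inferred from the paths in the log above, and it says nothing about why a file is invalid.

```python
import bz2
import glob


def is_valid_bzip2(path):
    """Return True if the file has a bzip2 header and decompresses cleanly."""
    with open(path, "rb") as raw:
        # Every bzip2 stream starts with the magic bytes "BZh".
        if raw.read(3) != b"BZh":
            return False
    try:
        with bz2.open(path, "rb") as stream:
            # Read in 1 MiB chunks so large files are not held in memory.
            while stream.read(1 << 20):
                pass
    except (OSError, EOFError):
        # Corrupt or truncated stream.
        return False
    return True


if __name__ == "__main__":
    # Path pattern assumed from the log; adjust to the actual run directory.
    for path in sorted(glob.glob("factorgraph/variables/*/*.bz2")):
        print(path, "ok" if is_valid_bzip2(path) else "INVALID")
```

An empty file fails the header check, so zero-byte part files are reported as invalid.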

rudaoshi commented 7 years ago

After adding "--n_samples_per_learning_epoch 3" to the parameter list, the program runs smoothly.

The data files, config, and documentation may need updating.

alldefector commented 7 years ago

@rudaoshi The person_has_cancer.id column indicates that the table was created by an old version of DeepDive (say, v0.8), and that schema is no longer compatible with the examples in the latest git repo. We are about to release the next version, but until then you can build from git by running make build; make install. After that, running the examples should work. FYI, here is the schema with the latest build:

       Table "public.person_has_cancer"
    Column     |       Type       | Modifiers 
---------------+------------------+-----------
 person_id     | bigint           | 
 dd_label      | boolean          | 
 dd_truthiness | double precision |