commonsense / conceptnet5

Code for building ConceptNet from raw data.

Error in job load_db while creating output file data/psql/done #90

Closed Shawn-Guo-CN closed 7 years ago

Shawn-Guo-CN commented 7 years ago

When I build ConceptNet on our server, an error is raised. The traceback is as follows:

    pg8000.core.ProgrammingError: ('FATAL', '28P01', 'password authentication failed for user "shawnguo"', 'auth.c', '288', 'auth_failed', '', '')
    Error in job load_db while creating output file data/psql/done.
    RuleException:
    CalledProcessError in line 353 of /home/shawnguo/GitWS/conceptnet5/Snakefile:
    Command 'cn5-db load_data data/psql && touch data/psql/done' returned non-zero exit status 1
      File "/home/shawnguo/GitWS/conceptnet5/Snakefile", line 353, in __rule_load_db
      File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run

How does this happen, and how do I handle it?

rspeer commented 7 years ago

Set the environment variables CONCEPTNET_DB_USER and CONCEPTNET_DB_PASSWORD to the username and password that let you connect to your PostgreSQL database.
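
In a shell, these variables are typically set with export before the build is started. As a rough pre-flight check, the sketch below reports which of the two variables are still unset in the environment a Python process would see. The helper name missing_db_variables is hypothetical and not part of conceptnet5; how the build reads these variables internally is not shown in this thread.

```python
import os

# The two variable names come from the reply above.
REQUIRED_VARIABLES = ["CONCEPTNET_DB_USER", "CONCEPTNET_DB_PASSWORD"]


def missing_db_variables(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `environ` defaults to the
    current process environment.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_db_variables()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("Both database variables are set.")
```

If this prints a variable name, that variable was probably set in a different shell session from the one running the build.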

Shawn-Guo-CN commented 7 years ago

Thanks for the reply, but now I get another error. The full traceback is as follows:

    rule miniaturize:
        input: data/vectors/numberbatch.h5, data/vectors/w2v-google-news.h5
        output: data/vectors/mini.h5
        jobid: 3
        resources: ram=4

    Traceback (most recent call last):
      File "/home/shawnguo/.local/lib/python3.5/site-packages/wordfreq/__init__.py", line 273, in word_frequency
        return _wf_cache[args]
    KeyError: ('##', 'ja', 'combined', 0.0)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/shawnguo/.local/bin/cn5-vectors", line 11, in <module>
        load_entry_point('ConceptNet', 'console_scripts', 'cn5-vectors')()
      File "/home/shawnguo/.local/lib/python3.5/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/shawnguo/.local/lib/python3.5/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/home/shawnguo/GitWS/conceptnet5/conceptnet5/vectors/cli.py", line 168, in run_miniaturize
        mini = miniaturize(frame, other_vocab=othervocab, k=k)
      File "/home/shawnguo/GitWS/conceptnet5/conceptnet5/vectors/transforms.py", line 96, in miniaturize
        vocab1 = [term for term in frame.index if '_' not in term
      File "/home/shawnguo/GitWS/conceptnet5/conceptnet5/vectors/transforms.py", line 97, in <listcomp>
        and term.startswith(prefix) and term_freq(term) > 0.]
      File "/home/shawnguo/GitWS/conceptnet5/conceptnet5/vectors/transforms.py", line 85, in term_freq
        return wordfreq.word_frequency(term, lang)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/wordfreq/__init__.py", line 277, in word_frequency
        _wf_cache[args] = _word_frequency(*args)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/wordfreq/__init__.py", line 219, in _word_frequency
        tokens = tokenize(word, lang)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/wordfreq/tokens.py", line 314, in tokenize
        return tokenize_mecab_language(text, lang, include_punctuation)
      File "/home/shawnguo/.local/lib/python3.5/site-packages/wordfreq/tokens.py", line 153, in tokenize_mecab_language
        from wordfreq.mecab import mecab_tokenize
      File "/home/shawnguo/.local/lib/python3.5/site-packages/wordfreq/mecab.py", line 2, in <module>
        import MeCab
    ImportError: No module named 'MeCab'
    Error in job miniaturize while creating output file data/vectors/mini.h5.
    RuleException:
    CalledProcessError in line 540 of /home/shawnguo/GitWS/conceptnet5/Snakefile:
    Command 'cn5-vectors miniaturize data/vectors/numberbatch.h5 data/vectors/w2v-google-news.h5 data/vectors/mini.h5' returned non-zero exit status 1
      File "/home/shawnguo/GitWS/conceptnet5/Snakefile", line 540, in __rule_miniaturize
      File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    Will exit after finishing currently running jobs.
    Finished job 17.
    123 of 136 steps (90%) done
    Will exit after finishing currently running jobs.
    Exiting because a job execution failed. Look above for error message

What's wrong this time? I'd be really grateful if you could help.

Shawn-Guo-CN commented 7 years ago

How do I set the environment variables CONCEPTNET_DB_USER and CONCEPTNET_DB_PASSWORD? I didn't set a password for the db user "shawnguo". I can access the db using psql in my shell, and no password is required.

rspeer commented 7 years ago

The error you got above comes from trying to decide which Japanese words and phrases to put in the combined vector space, which involves dependencies that I forgot to describe. It needs MeCab, the Japanese tokenizer.

I've pushed an update that will get the Python side of those dependencies, and updated the https://github.com/commonsense/conceptnet5/wiki/Build-process page with what else you need to do. In particular, you need to install libmecab-dev and mecab-ipadic-utf8.

rspeer commented 7 years ago

As for access to the database: The psql command uses a kind of connection that Python can't use, and sometimes Postgres is configured to allow just that kind of connection without a password. I added a link to the wiki page that explains how to allow all local connections without a password: https://gist.github.com/p1nox/4953113

Shawn-Guo-CN commented 7 years ago

Thank you very much. I still have one question: do I really have to install the Python package MeCab from source? I couldn't figure out how to install it via pip.

rspeer commented 7 years ago

Ah, the package is named 'mecab-python3', even though you import it as 'MeCab'. I pushed an update to the conceptnet5 repo that adds it to the dependencies used when you want to build vectors.
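
Because the install name and the import name differ, a quick check before re-running the build can save a long wait. The sketch below only tests whether the MeCab module can be found by the current interpreter; the function names are hypothetical, and it does not verify that the system libraries (libmecab-dev, mecab-ipadic-utf8) are installed.

```python
import importlib.util


def mecab_available():
    """Return True if the 'MeCab' module can be located for import."""
    return importlib.util.find_spec("MeCab") is not None


def mecab_status():
    """Return a short message; hypothetical helper for illustration."""
    if mecab_available():
        return "MeCab is importable."
    # The pip package name differs from the import name.
    return "MeCab not found; try: pip install mecab-python3"


if __name__ == "__main__":
    print(mecab_status())
```

Run it with the same Python interpreter that Snakemake uses for the build, since a package installed for a different interpreter will not be found.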