commonsense / conceptnet5

Code for building ConceptNet from raw data.

Issue running python api #85

Closed by EnricoBeltramo 7 years ago

EnricoBeltramo commented 7 years ago

Hello, I tried to install a local copy of ConceptNet (using Docker), and when I use the API I get this error:

Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from conceptnet5.db.query import AssertionFinder
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "conceptnet5/db/query.py", line 1, in <module>
    from .connection import get_db_connection
  File "conceptnet5/db/connection.py", line 35
    file=sys.stderr
        ^
SyntaxError: invalid syntax

Can somebody help me with this issue?

rspeer commented 7 years ago

What did you do with Docker?

The error you're showing is an error that results from running Python 3 code in Python 2. This is very strange, as Python 2 is not installed in the Docker container.

EnricoBeltramo commented 7 years ago

Thank you for the answer. Maybe my problem is that I'm not used to Docker and Postgres. Just to recap:

erber@erber-VirtualBox:~/conceptnet5$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from conceptnet5.db.query import AssertionFinder
>>> cnfinder = AssertionFinder()
>>> cnfinder.lookup('/c/en/example')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/erber/conceptnet5/conceptnet5/db/query.py", line 103, in lookup
    self.connection = get_db_connection(self.dbname)
  File "/home/erber/conceptnet5/conceptnet5/db/connection.py", line 20, in get_db_connection
    raise IOError("The ConceptNet database has not been built.")
OSError: The ConceptNet database has not been built.

I suppose the error occurs because the ConceptNet API doesn't see the database volume that lives inside Docker. What do I have to do in order to use the Docker image from the API on the host PC?

Thank you

rspeer commented 7 years ago

Ah. AssertionFinder is the Python code that directly accesses the database, not the API. It can't find the database because the database is inside the container.

The API runs over HTTP, and it's the interface that the container exposes. For example, to access this API:

curl http://api.localhost/c/en/example
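
The same request can be made from Python without touching `conceptnet5.db.query`. This is a minimal sketch, not code from the ConceptNet repository: the helper names are made up for illustration, and it assumes the response is JSON with an `edges` list whose items have `start`, `rel`, and `end` objects carrying a `label` field.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host taken from the curl example; change it if the container is mapped elsewhere.
API_ROOT = "http://api.localhost"


def concept_url(uri, api_root=API_ROOT):
    """Build the Web API URL for a ConceptNet URI such as '/c/en/example'."""
    # Percent-encode everything except the slashes that separate URI parts.
    return api_root.rstrip("/") + quote(uri, safe="/")


def summarize_edges(response):
    """Return (start, relation, end) label triples from a decoded response.

    Assumes each edge has 'start', 'rel' and 'end' objects with a 'label'.
    """
    return [
        (edge["start"]["label"], edge["rel"]["label"], edge["end"]["label"])
        for edge in response.get("edges", [])
    ]


def lookup(uri, api_root=API_ROOT, timeout=30):
    """Fetch one concept over HTTP and decode the JSON body."""
    with urlopen(concept_url(uri, api_root), timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the container running, `summarize_edges(lookup('/c/en/example'))` should list the assertions that the curl command returns as raw JSON.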

If you want to interact with the Python code, then you should build ConceptNet in your Python 3 environment without Docker, as described at https://github.com/commonsense/conceptnet5/wiki/Build-process . I ask people to try Docker first because it takes care of dependencies for you, as well as configuring Postgres and the Web server.

By the way, to talk about code or output on GitHub, you should surround it with triple-backticks so that it keeps the formatting:

```
Your code goes here
```
EnricoBeltramo commented 7 years ago

Thanks. I also tried api.localhost in the past, but unfortunately the interface is very slow, and I would like to use the Python API in order to have more direct access to the data.

I tried to build the data locally, but after a lot of activity I got the following error:

Traceback (most recent call last):
  File "/home/erber/.local/bin/cn5-vectors", line 9, in <module>
    load_entry_point('ConceptNet', 'console_scripts', 'cn5-vectors')()
  File "/usr/local/lib/python3.5/dist-packages/click-6.7-py3.5.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/click-6.7-py3.5.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/dist-packages/click-6.7-py3.5.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/dist-packages/click-6.7-py3.5.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/dist-packages/click-6.7-py3.5.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/erber/conceptnet5/conceptnet5/vectors/cli.py", line 76, in run_convert_word2vec
    convert_word2vec(word2vec_filename, output_filename, nrows)
  File "/home/erber/conceptnet5/conceptnet5/vectors/formats.py", line 115, in convert_word2vec
    w2v_std = standardize_row_labels(w2v_raw, forms=False, language=language)
  File "/home/erber/conceptnet5/conceptnet5/vectors/transforms.py", line 26, in standardize_row_labels
    relabeled = frame.mul(weights, axis='rows').sort_index().groupby(level=0).sum()
  File "/home/erber/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 3357, in sort_index
    convert=False, verify=False)
  File "/home/erber/.local/lib/python3.5/site-packages/pandas/core/internals.py", line 3964, in take
    axis=axis, allow_dups=True)
  File "/home/erber/.local/lib/python3.5/site-packages/pandas/core/internals.py", line 3850, in reindex_indexer
    for blk in self.blocks]
  File "/home/erber/.local/lib/python3.5/site-packages/pandas/core/internals.py", line 3850, in <listcomp>
    for blk in self.blocks]
  File "/home/erber/.local/lib/python3.5/site-packages/pandas/core/internals.py", line 1022, in take_nd
    allow_fill=True, fill_value=fill_value)
  File "/home/erber/.local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 1100, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError
Error in job convert_word2vec while creating output file data/vectors/w2v-google-news.h5.
RuleException:
CalledProcessError in line 476 of /home/erber/conceptnet5/Snakefile:
Command 'CONCEPTNET_DATA=data cn5-vectors convert_word2vec -n 1500000 data/raw/vectors/GoogleNews-vectors-negative300.bin.gz data/vectors/w2v-google-news.h5' returned non-zero exit status 1
  File "/home/erber/conceptnet5/Snakefile", line 476, in __rule_convert_word2vec
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
rspeer commented 7 years ago

The key part there is the MemoryError.

How many GB of RAM do you have available? If it's around 16 GB, then the ConceptNet 5.5.4 update I just pushed will help you -- it should do more to avoid running multiple memory-hungry processes at the same time. Running pip3 install --upgrade snakemake may also help, as older versions of Snakemake were ignoring those instructions.

If it's less than 16 GB, there's no hope of being able to build the vectors; sorry. The right plan may be to disable that part of the build and just use precomputed vectors.
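
A quick way to check this before starting a build is sketched below. It is only an illustration: the 16 GB figure comes from the comment above, while the function names and the 1 GiB slack (a machine sold as "16 GB" usually reports slightly less) are assumptions.

```python
import os

GIB = 1024 ** 3


def total_ram_bytes():
    """Return physical RAM in bytes (works on Linux and most Unix systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def can_build_vectors(ram_bytes, required_gib=16.0, slack_gib=1.0):
    """Return True if the machine has roughly `required_gib` of RAM.

    The slack is an assumption that absorbs memory reserved by firmware or
    the hypervisor; it is not a documented ConceptNet requirement.
    """
    return ram_bytes >= (required_gib - slack_gib) * GIB
```

Calling `can_build_vectors(total_ram_bytes())` inside the VirtualBox guest reports the memory assigned to the VM, which is what the build actually sees.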

mrmechko commented 7 years ago

Just leaving my docker workaround here in case it helps someone:

I exposed api.localhost on port 1337 for easy access to the api running within docker and the db on port 5432. With a small modification to the python code and a few environment variables, I can run the native python api while running conceptnet5 in docker. I took this approach because I'm running on OS X and many of the dependencies for the system require manual configuration which seemed unnecessary. Haven't run into any performance issues as of yet.

modified code is at https://github.com/mrmechko/conceptnet5/tree/version5.5
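
For readers who want the general shape of such a setup, here is a rough sketch. It is not the code from the fork above. The `CONCEPTNET_DB_*` variable names, the default database name `conceptnet5`, and the default user `postgres` are all assumptions made for illustration; only port 5432 comes from the comment.

```python
import os


def db_settings(environ=None):
    """Collect Postgres settings for a database published by the container.

    All variable names and defaults here are hypothetical, except port 5432,
    which is the port mentioned in the comment above.
    """
    env = os.environ if environ is None else environ
    return {
        "host": env.get("CONCEPTNET_DB_HOSTNAME", "localhost"),
        "port": int(env.get("CONCEPTNET_DB_PORT", "5432")),
        "dbname": env.get("CONCEPTNET_DB_NAME", "conceptnet5"),
        "user": env.get("CONCEPTNET_DB_USER", "postgres"),
    }


def connect(environ=None):
    """Open a connection using psycopg2, imported lazily.

    The lazy import keeps db_settings() usable without the third-party
    package installed (pip3 install psycopg2).
    """
    import psycopg2

    return psycopg2.connect(**db_settings(environ))
```

Reading the settings from the environment means the same code works both inside the container, where the defaults may already be right, and on the host, where the published port is used.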

rspeer commented 7 years ago

Nice. I think I would like to incorporate something like that. All I'd really need to do is move the port numbers to higher numbers that aren't likely to be contentious.

To be clear: when you set it up this way, you can use modules like conceptnet5.db.query outside of Docker, and have them use the DB that was built inside Docker, right?

EnricoBeltramo commented 7 years ago

Thank you, that looks like a good solution; I will try it immediately. In the meantime, because my PC doesn't seem powerful enough to run the whole build, I used a small workaround: I copied the full database and files from the Docker volumes to my local PC. This way, the API now works fine. Unfortunately it is still a bit too slow for my use. Is there a list of the available commands and arguments for the API? I took a look at the wiki, but I found only commands for the Web API.

Greets,

rspeer commented 7 years ago

As I was saying before, conceptnet5.db.query isn't really an API, it's just a part of the ConceptNet code. I didn't design it as a well-specified thing for other people to use. If I significantly change the underlying ConceptNet database, my aim will be to keep the Web API the same as much as possible, even though the entirety of conceptnet5.db might change, as it did in 5.2, 5.3, and 5.4.

What is the code doing for you that the API wasn't?

mrmechko commented 7 years ago

@rspeer that's exactly right. I can now use conceptnet5.db.query from outside Docker. Initially I actually couldn't figure out how to expose the web api through Docker (new to this stuff). I don't think it provides anything that the web api itself doesn't.

One small thing I've noticed is that there doesn't seem to be any pagination directly from python (or maybe I haven't inspected any large enough queries).
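
For comparison, the Web API does paginate its results. The sketch below walks through all pages of a query. It assumes each response carries an `edges` list and, while more results remain, a `view` object with a `nextPage` path; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def http_fetch(path, api_root="http://api.localhost"):
    """Fetch one page from the Web API and decode the JSON body."""
    with urlopen(api_root + path) as reply:
        return json.loads(reply.read().decode("utf-8"))


def iter_edges(fetch_page, first_path, max_pages=None):
    """Yield every edge, following 'nextPage' links until none remain.

    fetch_page is any callable that maps a path such as
    '/c/en/example?offset=0&limit=50' to a decoded JSON response.
    """
    path = first_path
    pages = 0
    while path is not None:
        response = fetch_page(path)
        yield from response.get("edges", [])
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        # A missing 'view' or 'nextPage' means this was the last page.
        path = response.get("view", {}).get("nextPage")
```

Passing the fetcher as an argument keeps the paging logic independent of HTTP, so `iter_edges(http_fetch, '/c/en/example?offset=0&limit=50')` and a test with canned responses share the same loop.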

EnricoBeltramo commented 7 years ago

@rspeer Sure: the web interface is fully complete. I only asked because I noticed that the queries are very slow on my PC, and I figured that using the API directly would give me a small speed improvement. Anyway, I'm already inspecting the API in order to understand all the database relations. Thank you very much for your support!

rspeer commented 7 years ago

I'd like to emphasize once again that conceptnet5.db.query is not an API, but I can see there are reasons why the Docker container should make the DB available on some port outside the container, so I'll implement that suggestion in a later release.