akutuzov / webvectors

Web-ify your word2vec: framework to serve distributional semantic models online
http://vectors.nlpl.eu/explore/embeddings/
GNU General Public License v3.0

How to run the project with a model? #48

Closed: ahvahsky2008 closed this issue 4 years ago

ahvahsky2008 commented 4 years ago

I downloaded ruwikiruscorpora_upos_skipgram_300_2_2019, extracted it, and tried to add it to WebVectors. [screenshots]

What's incorrect in my configs?

akutuzov commented 4 years ago

What's in your models.tsv?

ahvahsky2008 commented 4 years ago

[screenshot] @akutuzov

ahvahsky2008 commented 4 years ago

identifier	name	path	string	default	tags	algo	size
ruscorpora_upos_skipgram_300_5_2018	Russian National Corpus	/var/www/model/model.bin	similar4	False	True	word2vec	250000000
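A minimal Python sketch of how a models.tsv row like the one posted above could be sanity-checked before starting the server. It assumes the file is tab-separated and uses the column names from the header posted above; the helper name `check_models` is hypothetical and is not part of WebVectors.

```python
import csv
import io
import os

# Column names as they appear in the header posted above.
EXPECTED_COLUMNS = ["identifier", "name", "path", "string",
                    "default", "tags", "algo", "size"]


def check_models(tsv_text):
    """Parse models.tsv-style text and return a list of problems found.

    Hypothetical helper for illustration; WebVectors does not ship it.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append("unexpected header: %r" % (reader.fieldnames,))
        return problems
    for row in reader:
        # DictReader stores extra cells under the None key and fills
        # missing cells with None, so both cases are caught here.
        if None in row or None in row.values():
            problems.append("wrong number of cells in row %r" % row.get("identifier"))
            continue
        if not os.path.isfile(row["path"]):
            problems.append("model file not found: %s" % row["path"])
    return problems


sample = (
    "identifier\tname\tpath\tstring\tdefault\ttags\talgo\tsize\n"
    "ruscorpora_upos_skipgram_300_5_2018\tRussian National Corpus\t"
    "/var/www/model/model.bin\tsimilar4\tFalse\tTrue\tword2vec\t250000000\n"
)

for problem in check_models(sample):
    print(problem)
```

A check like this only catches layout and path mistakes (for example, spaces instead of tabs); it says nothing about whether the model file format itself is recognized.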

ahvahsky2008 commented 4 years ago

I tried models.bin and models.txt; same issue.

akutuzov commented 4 years ago

Ah, I see now. The model format was incorrectly recognized. Update your WebVectors version and try again, I've just fixed this in https://github.com/akutuzov/webvectors/commit/6a558ff0eb59093feb13833591a388af876f90a9

Please report whether the problem is gone.

ahvahsky2008 commented 4 years ago

@akutuzov thanks, one problem is solved. But the server does not start fully. [screenshots] When I open the URL http://xxx.xxx.xxx.xxx:8088/ it shows errors.

akutuzov commented 4 years ago

There seem to be two different issues. Are you getting errors right after you open the service in a web browser, even before you actually send a query word? What errors?

ahvahsky2008 commented 4 years ago

First I run the service with python3.7 word2vec_server.py. When I open the URL x.x.x.x:8088 in a browser, I see these errors in the console:

Model ruscorpora_upos_skipgram_300_5_2018 from file /var/www/model/model.bin loaded successfully.
Socket created
Socket bind complete
Socket now listening on port 8088
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "word2vec_server.py", line 22, in run
    clientthread(self.connect, self.address)
  File "word2vec_server.py", line 35, in clientthread
    query = json.loads(data.decode('utf-8'))
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

akutuzov commented 4 years ago

Why are you trying to send HTTP requests from a browser to the word2vec_server port? You should run a proper HTTP server (either apache or gunicorn), as described in the WebVectors readme ('Running WebVectors' section). We generally recommend gunicorn.
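The traceback above is what one would expect when a browser talks directly to a socket service that expects JSON: the first bytes it receives are an HTTP request line, not a JSON document. A small, self-contained Python illustration follows; the request text is a typical example rather than something captured from this server, and the JSON keys are made up.

```python
import json

# Roughly what a browser sends first when it opens http://host:8088/
http_request = b"GET / HTTP/1.1\r\nHost: x.x.x.x:8088\r\n\r\n"

try:
    json.loads(http_request.decode("utf-8"))
    message = None
except json.JSONDecodeError as err:
    message = str(err)

print(message)  # Expecting value: line 1 column 1 (char 0)

# A JSON payload decodes without errors. The key below is invented for
# illustration; the real query format is defined in word2vec_server.py.
payload = json.dumps({"example_key": "example_value"}).encode("utf-8")
decoded = json.loads(payload.decode("utf-8"))
print(decoded)
```

This is why the browser should talk to the HTTP server (gunicorn or Apache), which in turn sends properly formed queries to the word2vec_server port.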

ahvahsky2008 commented 4 years ago

@akutuzov sorry for being slow))

[screenshot]

When I click "Найти похожие слова" ("Find similar words"), it freezes and gunicorn shows an error. [screenshot]

The word2vec server works normally. [screenshot]

ahvahsky2008 commented 4 years ago

My configs are here: https://github.com/ahvahsky2008/test

ahvahsky2008 commented 4 years ago

[2020-02-27 08:31:34 +0100] [22399] [DEBUG] Current configuration:
  config: None
  bind: ['0.0.0.0:9999']
  backlog: 2048
  workers: 1
  worker_class: sync
  threads: 1
  worker_connections: 1000
  max_requests: 0
  max_requests_jitter: 0
  timeout: 30
  graceful_timeout: 30
  keepalive: 2
  limit_request_line: 4094
  limit_request_fields: 100
  limit_request_field_size: 8190
  reload: False
  reload_engine: auto
  reload_extra_files: []
  spew: False
  check_config: False
  preload_app: False
  sendfile: None
  reuse_port: False
  chdir: /var/www/webvectors
  daemon: False
  raw_env: []
  pidfile: None
  worker_tmp_dir: None
  user: 0
  group: 0
  umask: 0
  initgroups: False
  tmp_upload_dir: None
  secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
  forwarded_allow_ips: ['127.0.0.1']
  accesslog: gunicorn.log
  disable_redirect_access_to_syslog: False
  access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
  errorlog: gunicorn.error.log
  loglevel: debug
  capture_output: True
  logger_class: gunicorn.glogging.Logger
  logconfig: None
  logconfig_dict: {}
  syslog_addr: udp://localhost:514
  syslog: False
  syslog_prefix: None
  syslog_facility: user
  enable_stdio_inheritance: False
  statsd_host: None
  dogstatsd_tags: 
  statsd_prefix: 
  proc_name: None
  default_proc_name: run_syn:app_syn
  pythonpath: None
  paste: None
  on_starting: <function OnStarting.on_starting at 0x7ff4231860e0>
  on_reload: <function OnReload.on_reload at 0x7ff423186200>
  when_ready: <function WhenReady.when_ready at 0x7ff423186320>
  pre_fork: <function Prefork.pre_fork at 0x7ff423186440>
  post_fork: <function Postfork.post_fork at 0x7ff423186560>
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x7ff423186680>
  worker_int: <function WorkerInt.worker_int at 0x7ff4231867a0>
  worker_abort: <function WorkerAbort.worker_abort at 0x7ff4231868c0>
  pre_exec: <function PreExec.pre_exec at 0x7ff4231869e0>
  pre_request: <function PreRequest.pre_request at 0x7ff423186b00>
  post_request: <function PostRequest.post_request at 0x7ff423186b90>
  child_exit: <function ChildExit.child_exit at 0x7ff423186cb0>
  worker_exit: <function WorkerExit.worker_exit at 0x7ff423186dd0>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7ff423186ef0>
  on_exit: <function OnExit.on_exit at 0x7ff42318b050>
  proxy_protocol: False
  proxy_allow_ips: ['127.0.0.1']
  keyfile: None
  certfile: None
  ssl_version: 2
  cert_reqs: 0
  ca_certs: None
  suppress_ragged_eofs: True
  do_handshake_on_connect: False
  ciphers: None
  raw_paste_global_conf: []
  strip_header_spaces: False
[2020-02-27 08:31:34 +0100] [22399] [INFO] Starting gunicorn 20.0.4
[2020-02-27 08:31:34 +0100] [22399] [DEBUG] Arbiter booted
[2020-02-27 08:31:34 +0100] [22399] [INFO] Listening at: http://0.0.0.0:9999 (22399)
[2020-02-27 08:31:34 +0100] [22399] [INFO] Using worker: sync
[2020-02-27 08:31:34 +0100] [22402] [INFO] Booting worker with pid: 22402
[2020-02-27 08:31:34 +0100] [22399] [DEBUG] 1 workers
[2020-02-27 08:31:41 +0100] [22402] [DEBUG] GET /en/associates/
[2020-02-27 08:31:42 +0100] [22402] [DEBUG] GET /en/associates/YOUR_URL/example_vocab.json
[2020-02-27 08:31:44 +0100] [22402] [DEBUG] POST /en/associates/
[2020-02-27 08:32:14 +0100] [22399] [CRITICAL] WORKER TIMEOUT (pid:22402)
[2020-02-27 08:32:14 +0100] [22402] [INFO] Worker exiting (pid: 22402)
[2020-02-27 08:32:15 +0100] [22418] [INFO] Booting worker with pid: 22418
[2020-02-27 08:32:16 +0100] [22418] [DEBUG] POST /
[2020-02-27 08:32:16 +0100] [22418] [DEBUG] Ignoring EPIPE

ahvahsky2008 commented 4 years ago

Andrey, could you please help? I've been fighting with this for several days already.

akutuzov commented 4 years ago

> My configs are here: https://github.com/ahvahsky2008/test

I see that you changed detect_tag to True. If you want the service to automatically detect the part of speech of the query, you should make sure to run a properly configured UDPipe or Stanford CoreNLP server (more details in https://github.com/akutuzov/webvectors/blob/master/lemmatizer.py). It looks like you did not do this. That's the reason your WebVectors instance attempts to access the (non-existent) tagger service, and eventually times out.

If you don't want to set up a tagger service, simply return the detect_tag field in the config file to its default False state.
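The gunicorn log above fits this explanation: the POST arrives at 08:31:44 and the worker is killed at 08:32:14, exactly the configured `timeout: 30` seconds later. Before switching detect_tag to True, one could first verify that something is actually listening at the tagger address. The sketch below is a generic TCP check in Python; the function name, host and port are placeholders and are not taken from WebVectors or its config.

```python
import socket


def service_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder address: substitute wherever your tagger server listens.
    tagger_host, tagger_port = "localhost", 12345
    if service_reachable(tagger_host, tagger_port):
        print("Tagger port is open; enabling detect_tag should be safe to try.")
    else:
        print("Nothing is listening there; keep detect_tag set to False.")
```

An open port only shows that some process accepts connections; it does not prove that the process is a correctly configured tagger.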

ahvahsky2008 commented 4 years ago

[screenshots]

ahvahsky2008 commented 4 years ago

The request works, but it doesn't return data.

akutuzov commented 4 years ago

> The request works, but it doesn't return data.

It does return data: it says that the word "пёс" is not present in the model vocabulary. Which is entirely true, because the ruwikiruscorpora_upos_skipgram_300_2_2019 model contains words with PoS tags. Thus, you should query for "пёс_NOUN", not for simple "пёс".
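To make the point about tagged vocabularies concrete, here is a toy Python sketch. The vocabulary is a hand-made set standing in for a real model, and `make_key` is a hypothetical helper; with a real model loaded in gensim the membership test would be done against the model's own vocabulary instead.

```python
def make_key(lemma, pos):
    """Build a 'lemma_POS' key in the style used by PoS-tagged models."""
    return "%s_%s" % (lemma, pos)


# Toy stand-in for the vocabulary of a PoS-tagged model.
toy_vocab = {"пёс_NOUN", "собака_NOUN", "лаять_VERB"}

bare_found = "пёс" in toy_vocab
tagged_found = make_key("пёс", "NOUN") in toy_vocab

print(bare_found)    # False: the bare word is not a vocabulary entry
print(tagged_found)  # True: the tagged form is
```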

ahvahsky2008 commented 4 years ago

@akutuzov thanks!! Two more questions: 1) [screenshot] This functionality is not working with this model. 2) How can I enter words without the NOUN, VERB and other tags, so that tagging happens automatically?

akutuzov commented 4 years ago

1) All the WebVectors functions (including those in the Visualizations, Calculator and Miscellaneous tabs) work with any word embedding model. If something goes wrong for you, please report it with all the details (what you expect to see, what you actually see, whether there are any error messages), preferably in a separate issue.

2) If you prefer to use embedding models which feature words without PoS tags, then simply download such a model. For example, all our fastText models are trained on corpora without PoS tags. You can find many more models with or without tags in the NLPL Vector Repository.

ahvahsky2008 commented 4 years ago

1) Let's start with the first problem. [screenshots]

ahvahsky2008 commented 4 years ago

I simply want to make an analogue of https://rusvectores.org/

akutuzov commented 4 years ago

Your error message screenshot is incomplete (the bottom of the screen seems to be cropped). Please provide the complete error message. Aside from that, why are your queries accompanied by this strange _NONE tag? None of our models contain words with such a tag.

akutuzov commented 4 years ago

> I simply want to make an analogue of https://rusvectores.org/

If you really want a full analogue, you will have to install the UDPipe server to perform automatic PoS tagging of user queries.

ahvahsky2008 commented 4 years ago

[screenshot] Which algo file do I need to download for UDPipe?

akutuzov commented 4 years ago

UDPipe is a tagger, see https://ufal.mff.cuni.cz/udpipe. You can download the UDPipe models from there, or use our custom model (the link to it can be found in our tutorial). In general, I highly recommend going through the tutorial; it answers many of your questions.
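For orientation, UDPipe produces its analysis in the CoNLL-U format, where each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, and so on). The Python sketch below shows one way such output could be turned into `lemma_POS` query keys. It is an illustration only: `conllu_to_keys` is a hypothetical helper, the sample text is hand-written, and this is not the code WebVectors uses in lemmatizer.py.

```python
def conllu_to_keys(conllu_text):
    """Convert CoNLL-U text into a list of 'lemma_UPOS' strings."""
    keys = []
    for line in conllu_text.splitlines():
        # Skip blank separator lines and '# ...' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            continue  # not a well-formed token line
        token_id, lemma, upos = columns[0], columns[2], columns[3]
        # Skip multiword token ranges (e.g. '1-2') and empty nodes (e.g. '1.1').
        if "-" in token_id or "." in token_id:
            continue
        keys.append("%s_%s" % (lemma, upos))
    return keys


# Hand-written sample in CoNLL-U layout, not real tagger output.
sample_conllu = (
    "# text = пёс лает\n"
    "1\tпёс\tпёс\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tлает\tлаять\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

keys = conllu_to_keys(sample_conllu)
print(keys)  # ['пёс_NOUN', 'лаять_VERB']
```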

ahvahsky2008 commented 4 years ago

OK.
1) First I installed udpipe_server. [screenshot]
2) Then I changed the configs. [screenshot]
3) When I try to search for a word, it raises an error:

2020-02-29 07:11:10,965 : ERROR : Exception on /en/ [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.7/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/www/webvectors/webvectors.py", line 232, in home
    query = process_query(list_data)
  File "/var/www/webvectors/webvectors.py", line 153, in process_query
    poses = tagword(userquery)  # We tag using Stanford CoreNLP
  File "/var/www/webvectors/lemmatizer.py", line 43, in tagword
    tagged = json.loads(corenlp.decode('utf-8'), strict=False)
  File "/usr/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

akutuzov commented 4 years ago

Use the tag_ud() function, not tagword(). As can be seen in their respective docstrings, tagword() is for the Stanford CoreNLP tagger. With UDPipe, one should use tag_ud().

ahvahsky2008 commented 4 years ago

Could I message you on Telegram or somewhere else, please?