Closed ahvahsky2008 closed 4 years ago
What's in your models.tsv?
@akutuzov
identifier	name	path	string	default	tags	algo	size
ruscorpora_upos_skipgram_300_5_2018	Russian National Corpus	/var/www/model/model.bin	similar4	False	True	word2vec	250000000
I tried models.bin and models.txt, and got the same issue.
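For anyone checking their own models.tsv, here is a minimal sketch of reading such a file with Python's standard csv module. It assumes the file is tab-separated with a header row, as in the row quoted above; the `read_models` helper is hypothetical and is not part of WebVectors.

```python
import csv
import io

# Stand-in for the contents of models.tsv: the header and the single row
# quoted in this thread, joined with tabs.
MODELS_TSV = (
    "identifier\tname\tpath\tstring\tdefault\ttags\talgo\tsize\n"
    "ruscorpora_upos_skipgram_300_5_2018\tRussian National Corpus\t"
    "/var/www/model/model.bin\tsimilar4\tFalse\tTrue\tword2vec\t250000000\n"
)


def read_models(handle):
    """Return one dict per model row, keyed by the header names (hypothetical helper)."""
    return list(csv.DictReader(handle, delimiter="\t"))


models = read_models(io.StringIO(MODELS_TSV))
for row in models:
    # Print the fields that matter most when a model fails to load.
    print(row["identifier"], row["path"], row["algo"], row["tags"])
```

If a field is missing or a tab was replaced by spaces, the row will come back with fewer keys filled in, which is a quick way to spot a malformed line.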
Ah, I see now. The model format was incorrectly recognized. Update your WebVectors version and try again: I've just fixed this in https://github.com/akutuzov/webvectors/commit/6a558ff0eb59093feb13833591a388af876f90a9
Please report whether the problem is gone.
@akutuzov thanks, one problem is solved. But the server does not start fully: when I open the URL http://xxx.xxx.xxx.xxx:8088/ it shows errors.
There seem to be two different issues. Are you getting errors right after you open the service in a web browser, even before you actually send a query word? What errors?
First, I run the service:
python3.7 word2vec_server.py
When I open the URL x.x.x.x:8088 in a browser, I see errors in the console:
Model ruscorpora_upos_skipgram_300_5_2018 from file /var/www/model/model.bin loaded successfully.
Socket created
Socket bind complete
Socket now listening on port 8088
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "word2vec_server.py", line 22, in run
    clientthread(self.connect, self.address)
  File "word2vec_server.py", line 35, in clientthread
    query = json.loads(data.decode('utf-8'))
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Why are you trying to send HTTP requests from a browser to the word2vec_server port?
You should run a proper HTTP server (either apache or gunicorn), as described in the WebVectors readme ('Running WebVectors' section).
We generally recommend gunicorn.
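To illustrate why the browser request produces exactly that traceback: the server passes whatever bytes it receives to json.loads, and an HTTP request line is not JSON. The sketch below reproduces only that one call. The request bytes are a shortened approximation of what a browser sends, and the JSON payload is a hypothetical placeholder, not the real WebVectors query format.

```python
import json

# Roughly what a browser sends when it opens http://host:8088/ (headers shortened).
http_request = b"GET / HTTP/1.1\r\nHost: x.x.x.x:8088\r\n\r\n"


def try_parse(data):
    """Mimic the json.loads(data.decode('utf-8')) call seen in the traceback."""
    try:
        return json.loads(data.decode("utf-8"))
    except json.JSONDecodeError as err:
        return "JSONDecodeError: {}".format(err)


# The HTTP request fails at the very first character ("G" is not valid JSON).
print(try_parse(http_request))

# A JSON object parses fine; the key and value here are hypothetical.
print(try_parse(b'{"query": "example"}'))
```

This is why the socket server should only be reached through the WebVectors web application running under a proper HTTP server, not directly from a browser.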
@akutuzov sorry for being slow))
When I click "Найти похожие слова" ("Find similar words"), it freezes and gunicorn shows an error.
The word2vec server works normally.
My configs are here: https://github.com/ahvahsky2008/test
[2020-02-27 08:31:34 +0100] [22399] [DEBUG] Current configuration:
config: None
bind: ['0.0.0.0:9999']
backlog: 2048
workers: 1
worker_class: sync
threads: 1
worker_connections: 1000
max_requests: 0
max_requests_jitter: 0
timeout: 30
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /var/www/webvectors
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 0
group: 0
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: gunicorn.log
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: gunicorn.error.log
loglevel: debug
capture_output: True
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: run_syn:app_syn
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0x7ff4231860e0>
on_reload: <function OnReload.on_reload at 0x7ff423186200>
when_ready: <function WhenReady.when_ready at 0x7ff423186320>
pre_fork: <function Prefork.pre_fork at 0x7ff423186440>
post_fork: <function Postfork.post_fork at 0x7ff423186560>
post_worker_init: <function PostWorkerInit.post_worker_init at 0x7ff423186680>
worker_int: <function WorkerInt.worker_int at 0x7ff4231867a0>
worker_abort: <function WorkerAbort.worker_abort at 0x7ff4231868c0>
pre_exec: <function PreExec.pre_exec at 0x7ff4231869e0>
pre_request: <function PreRequest.pre_request at 0x7ff423186b00>
post_request: <function PostRequest.post_request at 0x7ff423186b90>
child_exit: <function ChildExit.child_exit at 0x7ff423186cb0>
worker_exit: <function WorkerExit.worker_exit at 0x7ff423186dd0>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7ff423186ef0>
on_exit: <function OnExit.on_exit at 0x7ff42318b050>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2020-02-27 08:31:34 +0100] [22399] [INFO] Starting gunicorn 20.0.4
[2020-02-27 08:31:34 +0100] [22399] [DEBUG] Arbiter booted
[2020-02-27 08:31:34 +0100] [22399] [INFO] Listening at: http://0.0.0.0:9999 (22399)
[2020-02-27 08:31:34 +0100] [22399] [INFO] Using worker: sync
[2020-02-27 08:31:34 +0100] [22402] [INFO] Booting worker with pid: 22402
[2020-02-27 08:31:34 +0100] [22399] [DEBUG] 1 workers
[2020-02-27 08:31:41 +0100] [22402] [DEBUG] GET /en/associates/
[2020-02-27 08:31:42 +0100] [22402] [DEBUG] GET /en/associates/YOUR_URL/example_vocab.json
[2020-02-27 08:31:44 +0100] [22402] [DEBUG] POST /en/associates/
[2020-02-27 08:32:14 +0100] [22399] [CRITICAL] WORKER TIMEOUT (pid:22402)
[2020-02-27 08:32:14 +0100] [22402] [INFO] Worker exiting (pid: 22402)
[2020-02-27 08:32:15 +0100] [22418] [INFO] Booting worker with pid: 22418
[2020-02-27 08:32:16 +0100] [22418] [DEBUG] POST /
[2020-02-27 08:32:16 +0100] [22418] [DEBUG] Ignoring EPIPE
Andrey, please help( I have been fighting with this for several days already.
I see that you changed detect_tag to True. If you want the service to automatically detect the query part of speech, you should make sure to run a properly configured UDPipe or Stanford CoreNLP server (more details in https://github.com/akutuzov/webvectors/blob/master/lemmatizer.py).
It looks like you did not do this. That's the reason your WebVectors instance attempts to access the (non-existent) tagger service, and eventually times out.
If you don't want to set up a tagger service, simply return the detect_tag field in the config file to its default False state.
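In the gunicorn log above, the POST arrives at 08:31:44 and the worker is killed at 08:32:14, which matches the timeout: 30 value in the configuration dump. A quick way to confirm that nothing is listening where the tagger is expected is a plain TCP check like the sketch below. The host and port are hypothetical placeholders, and `service_is_reachable` is not a WebVectors function.

```python
import socket


def service_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


# Hypothetical tagger address; use whatever your own configuration points to.
TAGGER_HOST, TAGGER_PORT = "127.0.0.1", 8081

if service_is_reachable(TAGGER_HOST, TAGGER_PORT):
    print("Something is listening on the tagger port.")
else:
    print("Nothing is listening there; leave detect_tag at False or start the tagger.")
```

A successful connection only shows that some process accepts connections on that port; it does not prove the tagger is configured correctly.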
The request works, but it does not return any data.
It does return data: it says that the word "пёс" is not present in the model vocabulary. Which is entirely true, because the ruwikiruscorpora_upos_skipgram_300_2_2019 model contains words with PoS tags. Thus, you should query for "пёс_NOUN", not for simple "пёс".
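A toy illustration of the point about tagged vocabularies: in a PoS-tagged model the keys are strings of the form lemma_TAG, so the bare lemma is simply a different key. The set below is a small stand-in, not the real model vocabulary, and `in_vocab` is a hypothetical helper.

```python
# Toy stand-in for the vocabulary of a PoS-tagged model; a real model has
# a far larger vocabulary.
vocab = {"пёс_NOUN", "собака_NOUN", "бежать_VERB"}


def in_vocab(word, pos=None):
    """Look up a word, optionally joined with its PoS tag as word_TAG."""
    key = word if pos is None else "{}_{}".format(word, pos)
    return key in vocab


print(in_vocab("пёс"))          # bare lemma, no tag
print(in_vocab("пёс", "NOUN"))  # lemma joined with its tag
```

With a real model the same check would be made against the model's own vocabulary rather than a hand-written set.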
@akutuzov thanks! One more question: 1) this functionality is not working with this model. 2) How can I enter words without the NOUN, VERB and other tags, so that the tags are added automatically?
1) All the WebVectors functions (including those in the Visualizations, Calculator and Miscellaneous tabs) work with any word embedding model. If something goes wrong for you, please report it with all the details (what do you expect to see, what you actually see, are there any error messages), preferably in a separate issue.
2) If you prefer to use embedding models which feature words without PoS tags, then simply download such a model. For example, all our fastText models are trained on corpora without PoS tags. You can find many more models with or without tags in the NLPL Vector Repository.
1) Let's take the first problem.
I simply want to make an analogue of https://rusvectores.org/
Your error message screenshot is not full (the bottom of the screen is seemingly cropped). Please provide the complete error message.
Aside from that, why are your queries accompanied by this strange _NONE tag? None of our models contain words with such a tag.
If you really want a full analogue, you will have to install the UDPipe server to perform automatic PoS tagging of user queries.
Which algo file do I need to download for UDPipe?
UDPipe is a tagger, see https://ufal.mff.cuni.cz/udpipe. You can download the UDPipe models from there, or use our custom model (the link to it can be found in our tutorial). In general, I highly recommend that you go through the tutorial; it has answers to many of your questions.
OK. 1) First, I installed udpipe_server. 2) I changed the configs. 3) When I try to search for a word, it raises an error:
2020-02-29 07:11:10,965 : ERROR : Exception on /en/ [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.7/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/www/webvectors/webvectors.py", line 232, in home
    query = process_query(list_data)
  File "/var/www/webvectors/webvectors.py", line 153, in process_query
    poses = tagword(userquery) # We tag using Stanford CoreNLP
  File "/var/www/webvectors/lemmatizer.py", line 43, in tagword
    tagged = json.loads(corenlp.decode('utf-8'), strict=False)
  File "/usr/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Use the tag_ud() function, not tagword().
As can be seen in their respective docstrings, tagword() is for the Stanford CoreNLP tagger.
With UDPipe, one should use tag_ud().
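For readers who want to see what talking to a UDPipe server involves, here is a rough standalone sketch. It is not the tag_ud() implementation from lemmatizer.py. It assumes the UDPipe REST server exposes a process endpoint that returns JSON with a result field holding CoNLL-U text, so please verify that against your server version. The host, port and both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def conllu_to_tagged(conllu):
    """Turn CoNLL-U text into a list of lemma_UPOS strings."""
    tagged = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        # Column 3 is LEMMA and column 4 is UPOS in CoNLL-U.
        tagged.append("{}_{}".format(cols[2], cols[3]))
    return tagged


def tag_with_udpipe(text, host="localhost", port=8080):
    """Ask a UDPipe REST server to tokenize and tag text (hypothetical address)."""
    params = urllib.parse.urlencode({"tokenizer": "", "tagger": "", "data": text})
    url = "http://{}:{}/process?{}".format(host, port, params)
    with urllib.request.urlopen(url, timeout=5) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return conllu_to_tagged(reply["result"])


# Offline example: a hand-written CoNLL-U fragment, so no server is needed.
SAMPLE = "# text = пёс\n1\tпёс\tпёс\tNOUN\t_\t_\t0\troot\t_\t_\n"
print(conllu_to_tagged(SAMPLE))
```

The parsing step is kept separate from the network call so that it can be checked without a running server. Pointing a CoreNLP-style parser at a UDPipe server, or the other way round, leads to the same kind of JSONDecodeError shown in the traceback above.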
Could I message you on Telegram or somewhere else, please(
I downloaded ruwikiruscorpora_upos_skipgram_300_2_2019, extracted it, and tried to add it to WebVectors.
What is incorrect in my configs?