dice-group / embeddings.cc

Universal Knowledge Graph Embeddings

elastic_transport.ConnectionTimeout #24

Closed. adibaba closed 2 years ago

adibaba commented 2 years ago
[2022-03-11 09:59:52,772] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.9/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/anaconda3/lib/python3.9/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/anaconda3/lib/python3.9/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/anaconda3/lib/python3.9/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/anaconda3/lib/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/anaconda3/lib/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./webservice_public/__init__.py", line 190, in index
    for tup in es.get_similar_embeddings(get_index(),
  File "./webservice_public/es.py", line 92, in get_similar_embeddings
    response = get_es().msearch(body=request)
  File "/opt/anaconda3/lib/python3.9/site-packages/elasticsearch/_sync/client/utils.py", line 404, in wrapped
    return api(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.9/site-packages/elasticsearch/_sync/client/__init__.py", line 2563, in msearch
    return self.perform_request(  # type: ignore[return-value]
  File "/opt/anaconda3/lib/python3.9/site-packages/elasticsearch/_sync/client/_base.py", line 286, in perform_request
    meta, resp_body = self.transport.perform_request(
  File "/opt/anaconda3/lib/python3.9/site-packages/elastic_transport/_transport.py", line 329, in perform_request
    meta, raw_data = node.perform_request(
  File "/opt/anaconda3/lib/python3.9/site-packages/elastic_transport/_node/_http_urllib3.py", line 177, in perform_request
    raise err from None
elastic_transport.ConnectionTimeout: Connection timed out
[pid: 545919|app: 0|req: 726/726] xxx () {58 vars in 971 bytes} [Fri Mar 11 09:59:42 2022] POST / => generated 290 bytes in 10025 msecs (HTTP/1.1 500) 2 headers in 99 bytes (1 switches on core 0)
adibaba commented 2 years ago

Test: increased the client timeout from 10 to 60 seconds (https://elasticsearch-py.readthedocs.io/en/v7.16.0/connection.html#elasticsearch.Urllib3HttpConnection) -> still not working.

adibaba commented 2 years ago

https://cloud.google.com/architecture/building-real-time-embeddings-similarity-matching-system

-> perform approximate similarity matching

adibaba commented 2 years ago

Script score query - dense_vector functions: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-script-score-query.html#_dense_vector_functions
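
A minimal sketch of the request body such a query would use, built as a plain Python dict. The field name `embeddings` and the helper name are assumptions; the Painless expression follows the `cosineSimilarity` form shown in the linked 7.17 docs, with `+ 1.0` added because scores must not be negative.

```python
import json
from typing import Sequence


def cosine_script_score_query(query_vector: Sequence[float],
                              field: str = "embeddings",  # assumed field name
                              size: int = 20) -> dict:
    """Build a script_score body that ranks all documents by cosine similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # match_all means every document in the index is scored.
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


body = cosine_script_score_query([0.1, -0.2, 0.3])
print(json.dumps(body, indent=2))
```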

adibaba commented 2 years ago

Text similarity search with vector fields https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch

To avoid scanning over all documents and to maintain fast performance, the match_all query can be replaced with a more selective query.

But: embeddings are probably the largest field.
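
A sketch of what "a more selective query" could look like: the inner `match_all` is swapped for a `bool` filter so that only matching documents are scored. The field `type` and its value are purely hypothetical; whether a useful filter exists for this index is an open question. The script id `cossim` is the one used in the curl call further down in this thread.

```python
from typing import Sequence


def filtered_script_score_query(query_vector: Sequence[float],
                                filter_clause: dict,
                                script_id: str = "cossim") -> dict:
    """Build a script_score body that scores only documents passing the filter."""
    return {
        "query": {
            "script_score": {
                # Filter context: no scoring here, it only narrows the candidates.
                "query": {"bool": {"filter": [filter_clause]}},
                "script": {
                    "id": script_id,
                    "params": {"query_vector": list(query_vector)},
                },
            }
        }
    }


# Hypothetical filter: restrict scoring to documents of one type.
body = filtered_script_score_query([0.1, 0.2], {"term": {"type": "Person"}})
print(body)
```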

adibaba commented 2 years ago

TODO: Check CPU and memory usage on that task

heindorf commented 2 years ago

Is there something like BallTree or KD-Tree available in Elasticsearch? (compare https://scikit-learn.org/stable/modules/neighbors.html?highlight=knn%20nearest%20neighbor)
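
For context on this question: a script_score query over `match_all` scores every document, i.e. it is an exact linear scan, which is what tree-based indexes such as BallTree or KD-Tree try to avoid. A toy sketch of that exact scan in plain Python, with made-up two-dimensional vectors and hypothetical helper names:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_exact(query: Sequence[float],
                vectors: Dict[str, Sequence[float]],
                k: int = 2) -> List[Tuple[str, float]]:
    """Score every vector against the query and return the k best matches."""
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data, not taken from the real index.
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
neighbours = top_k_exact([1.0, 0.0], toy, k=2)
print(neighbours)
```

The cost grows linearly with the number of vectors, which matches the intuition that query time here depends on how many documents each shard has to score.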

adibaba commented 2 years ago

Updated system config

https://www.elastic.co/guide/en/elasticsearch/reference/current/setting-system-settings.html

ulimit
 unlimited
-> ok

https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html

cat /proc/sys/vm/swappiness
 60
sudo nano /proc/sys/vm/swappiness
 1
sudo sysctl -p
adibaba commented 2 years ago

Very strange: CPU and memory stay at 0% during execution

adibaba commented 2 years ago

https://www.elastic.co/guide/en/elasticsearch/reference/7.16/advanced-configuration.html#set-jvm-heap-size
https://discuss.elastic.co/t/elasticsearch-timeout-issues/134011/2

sudo cat /etc/elasticsearch/jvm.options
## create a new file in the jvm.options.d directory containing these lines:
##
## -Xms4g
## -Xmx4g
sudo nano /etc/elasticsearch/jvm.options.d/jvm.options
-Xms8g
-Xmx8g

sudo nano /etc/elasticsearch/elasticsearch.yml
# commented out:
bootstrap.memory_lock: true

sudo systemctl restart elasticsearch.service

Tested several heap sizes; even 16g did not change performance. CPU and memory are still barely used according to top.

adibaba commented 2 years ago

Memo: single query takes 3.88 / 3.52 / 3.56 seconds.

Will try a reindex from 5 shards to more. Assuming query time scales linearly with the number of shards, getting under 1 second would take 20 shards, and 0.5 seconds would take 40 shards.
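
The arithmetic behind that estimate, as a small sketch. It takes the three timings from the memo and the linear-scaling assumption at face value; real scaling depends on available cores and I/O and is not guaranteed to be linear. The function name is hypothetical.

```python
from statistics import mean

# Single-query timings from the memo above, measured on the 5-shard index.
timings_s = [3.88, 3.52, 3.56]
current_shards = 5


def estimated_latency(baseline_s: float, baseline_shards: int, target_shards: int) -> float:
    """Estimate latency under the assumption that time is inversely proportional to shards."""
    return baseline_s * baseline_shards / target_shards


baseline = mean(timings_s)
for shards in (20, 40):
    estimate = estimated_latency(baseline, current_shards, shards)
    print(f"{shards} shards: ~{estimate:.2f} s")
```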

adibaba commented 2 years ago

Reindexed from 5 to 40 shards.

real    166m40,433s
user    0m0,265s
sys     0m0,050s

Result: Queries against the new index time out. Queries against the old index take about as long as before.

Memory commands:

free
ps -o pid,user,%mem,command ax | sort -b -k3 -r
sudo ps -o pid,user,%mem,command ax | sort -b -k3 -r

Memory check:

wilke@embeddings:/$ sudo systemctl stop elasticsearch.service
wilke@embeddings:/$ free 
               total        used        free      shared  buff/cache   available
Mem:            31Gi       359Mi       9,0Gi        25Mi        21Gi        30Gi

sudo systemctl start elasticsearch.service
wilke@embeddings:/$ free 
               total        used        free      shared  buff/cache   available
Mem:            31Gi       8,9Gi       281Mi        91Mi        22Gi        21Gi
Swap:          2,0Gi       0,0Ki       2,0Gi

Memory check during similarity calculation of old index:

ps -o pid,user,%mem,command ax | sort -b -k3 -r
    PID USER     %MEM COMMAND
   1582 wilke     0.1 uwsgi --plugin python3 -H /opt/anaconda3 --mount /=webservice_public/wsgi.py --socket /tmp/embeddingscc.sock --chmod-socket=666 --thunder-lock --enable-threads
  10710 wilke     0.1 /opt/anaconda3/envs/embeddings/bin/python3.10 /opt/anaconda3/envs/embeddings/bin/flask run --host=0.0.0.0
  39033 wilke     0.0 sort -b -k3 -r
   1561 wilke     0.0 SCREEN -S webservice-public
  10691 wilke     0.0 SCREEN -S webservice-index
  12526 wilke     0.0 SCREEN -S reindex
  38962 wilke     0.0 screen -r reindex
  39032 wilke     0.0 ps -o pid,user,%mem,command ax
   1427 wilke     0.0 /lib/systemd/systemd --user
   1562 wilke     0.0 /bin/bash
  12527 wilke     0.0 /bin/bash
  10692 wilke     0.0 /bin/bash
  36679 wilke     0.0 -bash
sudo ps -o pid,user,%mem,command ax | sort -b -k3 -r > /tmp/tmp_mem_aw.txt

    PID USER     %MEM COMMAND
  37210 elastic+ 81.7 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.locale.providers=SPI,COMPAT --add-opens=java.base/java.io=ALL-UNNAMED -XX:+UseG1GC -Djava.io.tmpdir=/tmp/elasticsearch-16131451313185387900 -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Xms8g -Xmx8g -XX:MaxDirectMemorySize=4294967296 -XX:InitiatingHeapOccupancyPercent=30 -XX:G1ReservePercent=25 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=deb -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
   1582 wilke     0.1 uwsgi --plugin python3 -H /opt/anaconda3 --mount /=webservice_public/wsgi.py --socket /tmp/embeddingscc.sock --chmod-socket=666 --thunder-lock --enable-threads
  10710 wilke     0.1 /opt/anaconda3/envs/embeddings/bin/python3.10 /opt/anaconda3/envs/embeddings/bin/flask run --host=0.0.0.0
    118 root      0.0 [zswap-shrink]
    284 root      0.0 [xprtiod]
     40 root      0.0 [writeback]
    177 root      0.0 [vmw_pvscsi_wq_0]
  37403 elastic+  0.0 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
    644 root      0.0 /usr/sbin/sssd -i --logger=journald
    634 root      0.0 /usr/sbin/rsyslogd -n -iNONE
    532 root      0.0 /usr/sbin/rpc.svcgssd
    536 root      0.0 /usr/sbin/rpc.gssd -R CS.UNI-PADERBORN.DE
    950 root      0.0 /usr/sbin/cron -f

(A question was posted in parallel at discuss.elastic.co.)

adibaba commented 2 years ago

https://www.elastic.co/guide/en/elasticsearch/reference/7.16/modules-scripting-using.html#script-stored-scripts

You can store and retrieve scripts from the cluster state using the stored script APIs. Stored scripts reduce compilation time and make searches faster.
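
A sketch of how a stored script and a query that references it fit together, as plain Python dicts. The stored-script body would be sent with `PUT _scripts/cossim`; the id `cossim` matches the curl call later in this thread, but the script source and the field name `embeddings` are assumptions, since the actual stored script is not shown here.

```python
import json
from typing import Sequence


def stored_cosine_script(field: str = "embeddings") -> dict:
    """Body for `PUT _scripts/<id>`; the source and field name are assumed."""
    return {
        "script": {
            "lang": "painless",
            "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
        }
    }


def script_reference(script_id: str, query_vector: Sequence[float]) -> dict:
    """The `script` part of a script_score query that points at a stored script."""
    return {"id": script_id, "params": {"query_vector": list(query_vector)}}


put_body = stored_cosine_script()
reference = script_reference("cossim", [0.1, 0.2, 0.3])
print(json.dumps(put_body, indent=2))
print(json.dumps(reference, indent=2))
```

Because the vector is passed through `params`, the same stored script can be reused for every query vector without being recompiled.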

adibaba commented 2 years ago

Interesting in general: https://www.elastic.co/de/blog/advanced-tuning-finding-and-fixing-slow-elasticsearch-queries

adibaba commented 2 years ago

Opened port inside network

sudo nano /etc/elasticsearch/elasticsearch.yml
network.host: 0.0.0.0
sudo systemctl restart elasticsearch.service

Executed query directly using ES API:

time curl -X GET "elastic:xxx@embeddings.cs.upb.de:9200/caligraph_dbpedia_procrustes/_search?from=40&size=20&pretty" -H 'Content-Type: application/json' -d'
{
"query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "id": "cossim",
                    "params": {
                        "query_vector": [0.030565933793138402, -0.028540779218609362, 0.006805077006915239, -0.017746525505774744, 0.0032132199308759525, 0.007233056779485464, -0.0042593651969159525, 0.017676684839347444, -0.044577697446058925, -0.0153459798752109, -0.03208148633981944, 0.017366536551927544, -0.0025245695212026235, -0.07278815030213133, -0.025120957783991316, -0.052534049720601515, 0.025179516784930155, -0.029468957839177076, -0.0004255946240163556, -0.036653691244030846, -0.030255760474281693, 0.026100778662148962, 0.022448519323101566, 0.0750670326965136, 0.0032200257378672825, 0.004937126625464454, 0.03441954030631157, -0.04137270440719711, -0.01317738283518366, 0.03473983161072258, -0.08286848785612906, -0.09534694419492687, 0.023494774632933545, 0.053981429907849814, 0.015004841704088374, -0.03199286803282822, 0.03494212683264794, 0.0021001529736614996, 0.0647847599338521, 0.02749638445895638, -0.02386000542759517, -0.025486053346738913, -0.012867585082780488, -0.014370990797724765, 0.000837698836000416, -0.10793377235015855, 0.06029804334224216, 0.07257384385901378, -0.0678651818861254, 0.025288142013945317, 0.059571115849516604, 0.03224866231840837, 0.06707322967076784, 0.0011991212236033034, 0.056477268170348136, -0.08612086113724891, -0.01856504618692556, -0.011137779402447775, -0.03326910765368229, 0.025021963553988287, -0.045822440508039554, -0.00993267781309086, 0.028020502536509482, -0.04411891308711386, 0.024263087180998247, -0.019958898254082036, -0.05898903808937916, -0.060567651427081565, -0.05579877930725936, 0.015926660786967334, -0.0386698009497002, -0.021426234404579525, -0.022689202321769936, -0.028494059963723443, 0.012249198509373393, 0.030790378443435653, 0.036889860504001525, 0.0034108190371623555, -0.004176869574504596, 0.05321481646708432, -0.04242943521836934, 0.036471177646095636, 0.02489357724827755, -0.13070149079395035, 0.117705470984372, -0.07861548807708928, -0.051910666434099625, -0.058893346337359105, 
0.031529140631085056, 0.008294941268410233, 0.01726949526821499, 0.03498714140557922, -0.0419981313879081, 0.010871909011955673, -0.03323831853895136, 0.0748601241572883, 0.09977403252574515, -0.015491757421946774, -0.05119102600249916, 0.02109258496608384, 0.014650276618505514, -0.01581384150394457, 0.020635808243209278, -0.04090617614824228, -0.011013568738329933, 0.03871483845647056, -0.10034426093875778, 0.01528462167802264, 0.013632339932641138, 0.03410941032115686, -0.08729659900171516, 0.08815148168250002, 0.014282778500935095, 0.0020219647635865805, -0.021093678626030364, -0.025107970167418437, 0.023827682832983966, -0.025280732270806402, -0.015367335277249225, -0.03275558632407132, -0.006322394111101852, 0.03598733229105779, -0.03917397511466195, -0.05313606851587216, 0.03911259949065498, -0.007094419878596561, -0.07429208438672455, -0.010446203312346955, 0.028398423807988234, -0.028774016594692232, 0.021396363488156524, -0.06669493900582336, -0.04881067547960094, 0.03626696993938988, 0.032702344322922926, 0.06095245711881393, -0.03396702833079929, -0.02485231202762308, -0.06663703330487397, 0.029590565937845953, 0.02663500845904449, -0.1130833333826694, 0.09705333888161276, 0.0004869698205548917, -0.04772092263313072, -0.04723647058986761, 0.020832563659551804, 0.04860841931636296, -0.009103753205760526, -0.01401525050774551, 0.03801047722042411, -0.0792461287307537, -0.038669981010073344, -0.05874385582891507, 0.058107545943389925, 0.07236431847970648, 0.06385640319235258, 0.04871036778057973, -0.01628747500432686, -0.07354711075313078, -0.04900139026734541, -0.08313176182421064, 0.06229144473362401, 0.02774686782738733, 0.0852572513436608, -0.013427046835029571, -0.008484722498232435, 0.02130842451539948, -0.04109610264527026, -0.002064874629749967, -0.028607845597563283, 0.011571817344463355, 0.009429216542950187, 0.026356324201115933, 0.0641602928118361, -0.01617792094179235, -0.13478169045973554, -0.01826710094848543, 0.03409956490916596, 
-0.0038776737990578736, 0.05353262702221881, -0.01074168233333827, 0.02044342657782347, -0.0513188869206169, 0.016082361673002584, -0.004330021332627535, -0.02312646087662061, 0.0347637702451236, 0.03066792237571489, 0.04870828038694547, 0.011522314739063032, 0.10554049289059303, -0.023189882326790496, 0.12250056613072244, 0.03806863836590828, -0.07678908921458417, 0.0056639027649749195, 0.005890780095331304, 0.010550176147336814, -0.08682980985573055]
                    }
                }
            }
        }
}
'

0,01s user 0,03s system 1% cpu 3,437 total

--> It is an ES issue, not a Flask or nginx issue.
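
The same isolation step can be repeated from Python by timing only the Elasticsearch call, outside of Flask and nginx. A minimal sketch with a hypothetical helper `time_calls`; the `time.sleep` lambda is a placeholder for the real search call.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call `fn` several times and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder workload; replace the lambda with the actual search request.
durations = time_calls(lambda: time.sleep(0.01), repeats=3)
print([f"{d:.3f} s" for d in durations])
```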

adibaba commented 2 years ago

Will be handled in https://github.com/dice-group/embeddings.cc/issues/29