Build a Scalable Question Answering System is an important tutorial, because it guides users through the transition from InMemoryDocumentStore to ElasticsearchDocumentStore for "production" use cases.
It runs fine on Colab, but when users try to apply it in other environments, they run into problems starting or connecting to Elasticsearch.
Below are some problems I have encountered myself (on Ubuntu 22.04).
Without Docker
chown -R daemon:daemon elasticsearch-7.9.2: can fail if you don't prepend sudo
The Elasticsearch version used in the tutorial (7.9.2) is not aligned with the one started by launch_es (currently 7.17.6)
sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch: fails if Java is not installed
(could not find java in bundled jdk at....)
Using Docker
launch_es(): fails if you are not a superuser;
fails on Windows (https://github.com/deepset-ai/haystack/issues/4949);
whatever the actual cause, it fails with the same misleading message: It is likely that there is already an existing Elasticsearch instance running.
(IMO, we should improve or remove this helper function...)
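As a quick way to tell whether that message is accurate, one can check whether anything is actually listening on the Elasticsearch port. The following is only a minimal sketch: `port_in_use` is a hypothetical helper (not part of Haystack), and it assumes the default port 9200 on localhost.

```python
import socket


def port_in_use(host: str = "localhost", port: int = 9200, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; 9200 is assumed to be the
    Elasticsearch HTTP port, as in the docker command in this issue.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use():
        print("Something is already listening on localhost:9200.")
    else:
        print("Nothing is listening on localhost:9200; the failure has another cause.")
```

If this reports that nothing is listening, the "existing instance" explanation does not apply, and the real cause is more likely permissions or a missing Docker installation.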
My personal ugly solution is running:
sudo docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1g -Xmx1g" docker.elastic.co/elasticsearch/elasticsearch:7.17.6
But we can't expect Haystack beginners to do this.
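After starting the container manually, Elasticsearch usually needs some time before it accepts connections, so connecting from Haystack right away can fail. Below is a rough sketch of a readiness check using only the standard library. `wait_for_elasticsearch` is a hypothetical helper, and it assumes an instance without authentication at http://localhost:9200 (as with the 7.17.6 command above).

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200", timeout=60.0, interval=2.0):
    """Poll the Elasticsearch root endpoint until it answers.

    Returns the reported version string, or None if the instance did not
    respond within `timeout` seconds. Hypothetical helper; the URL and the
    absence of authentication are assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                info = json.load(resp)
                # The root endpoint reports the version under version.number
                return info.get("version", {}).get("number")
        except (urllib.error.URLError, OSError, ValueError):
            # Not reachable yet (or not valid JSON yet): wait and retry
            time.sleep(interval)
    return None


if __name__ == "__main__":
    version = wait_for_elasticsearch()
    if version is None:
        print("Elasticsearch did not become reachable in time.")
    else:
        print(f"Elasticsearch {version} is up.")
```

Printing the version also makes it easy to spot a mismatch between the running instance and the version the tutorial expects.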
What we can do
This tutorial may be fine if limited to the Colab environment.
I would like to have a simple guide that shows users how to run their own Elasticsearch instance on Ubuntu, macOS and Windows...
(there is something similar in the docs, but I would make it more detailed and prominent.)
If we produce a guide like this, we can simply link it in the tutorial.
(FYI @bilgeyucel @Timoeller @masci)