commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Structure of ES clusters #43

Closed IvRRimum closed 8 years ago

IvRRimum commented 8 years ago

Hello,

Your architecture diagram shows 2 ES clusters: https://about.commonsearch.org/developer/architecture

But the Operations documentation says there is 1 cluster with 3 nodes, so which one is it?

Also, I successfully ran the ES CloudFormation create and it created 1 instance, but when I open the public link, there is no ES installed.

I am really confused. Can someone explain?

sylvinus commented 8 years ago

Hi @IvRRimum,

We do use 2 separate ES indices as drawn in the schema, but currently we make them share the same ES instance, for simplicity. It will stay this way in local development, but probably not in production.

If you wanted, you could already spin up 2 different ES clusters and point to them separately in config.py.

IvRRimum commented 8 years ago

How can I create those 2 ES instances?

By going to AWS -> Elasticsearch -> create new?

And then I point to the IP and it will work? Is that all that is needed?

Also, I don't see any examples of cosr-config.json, which is mentioned in https://github.com/commonsearch/cosr-back/blob/master/cosrlib/config.py.

sylvinus commented 8 years ago

Yes, any ES instance will work. cosr-ops contains tools to spin up your own cluster, but you could use the AWS hosted instances, or even https://www.elastic.co/cloud. Let me reiterate that you can reuse the same one for simplicity.

cosr-config.json is just a JSON file with the same keys as the config.py file (which only contains the default values).
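As an illustration of that description, here is a minimal sketch of how a JSON file can override a dictionary of defaults. It is not the actual loading code in cosrlib/config.py: the function name `load_config` and the default URLs are assumptions made for this example; only the key names are taken from the project's config.

```python
import json
import os

# Hypothetical defaults for illustration. The key names match cosrlib/config.py;
# the values (a single local ES on port 9200) are assumptions.
DEFAULTS = {
    "ELASTICSEARCHTEXT": "http://127.0.0.1:9200",
    "ELASTICSEARCHDOCS": "http://127.0.0.1:9200",
    "ENV": "local",
}


def load_config(path, defaults=DEFAULTS):
    """Return a copy of `defaults`, overridden by the keys found in the JSON file at `path`.

    A missing file is not an error: the defaults are returned unchanged.
    """
    config = dict(defaults)
    if path and os.path.isfile(path):
        with open(path) as f:
            config.update(json.load(f))
    return config
```

With this sketch, a cosr-config.json containing `{"ELASTICSEARCHDOCS": "http://10.0.0.2:9200"}` would change only that key and leave every other default in place.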

IvRRimum commented 8 years ago

Okay, I created 2 clusters in the Elasticsearch Service section of AWS.

How do I deploy test data there?

sylvinus commented 8 years ago

It depends on which part of Common Search you want to work on.

For cosr-back, you should run make import_local_data to get started and then make reindex1. For cosr-front, you can use the dev index in the local ES cluster started by Docker, see here: https://github.com/commonsearch/cosr-front/blob/master/INSTALL.md

IvRRimum commented 8 years ago

I will do both! I will leave an update if I encounter an error or once I have successfully set everything up.

After I set everything up, I think I will write a guide for a complete deployment of Common Search and make a pull request for the bugs I discovered.

Also, thanks for the response, it means a lot!

sylvinus commented 8 years ago

That's great, thanks!

IvRRimum commented 8 years ago

Hello,

I created 2 ES clusters and ran make import_local_data, but when I run make reindex1, I get the following error:

(venv) root@ip-172-30-0-177 back]$ make reindex1
./scripts/elasticsearch_reset.py --delete
No handlers could be found for logger "elasticsearch"
Traceback (most recent call last):
  File "./scripts/elasticsearch_reset.py", line 19, in <module>
    indexer.empty()
  File "/cosr/back/cosrlib/indexer.py", line 42, in empty
    self.es_docs.create(empty=True)
  File "/cosr/back/cosrlib/es.py", line 103, in create
    self.empty()
  File "/cosr/back/cosrlib/es.py", line 57, in empty
    if self.indices().exists(index=self.index_name):
  File "/cosr/back/venv/lib/python2.6/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/cosr/back/venv/lib/python2.6/site-packages/elasticsearch/client/indices.py", line 226, in exists
    params=params)
  File "/cosr/back/venv/lib/python2.6/site-packages/elasticsearch/transport.py", line 329, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/cosr/back/venv/lib/python2.6/site-packages/elasticsearch/connection/http_urllib3.py", line 102, in perform_request
    raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError((<urllib3.connection.HTTPConnection object at 0x2742d10>, 'Connection to 52.3.91.82 timed out. (connect timeout=10)')) caused by: ConnectTimeoutError((<urllib3.connection.HTTPConnection object at 0x2742d10>, 'Connection to 52.3.91.82 timed out. (connect timeout=10)'))
make: *** [reindex1] Error 1

My config:

import os

_defaults = {

    # HTTP URL of both ElasticSearch servers
    "ELASTICSEARCHTEXT": "http://52.23.11.145",
    "ELASTICSEARCHDOCS": "http://52.3.91.82",

    # Host:port of the URLserver instance, or "local" for direct import on the same node
    "URLSERVER": "local",  # "192.168.99.100:9702"

    # Host:port of the Explainer instance
    "EXPLAINER": "0.0.0.0:9703",  # "127.0.0.1:9703"

    # Environment type: prod, staging, local, ci, ...
    "ENV": "local",

    # Should we use files in tests/testdata/ as datasources? ("0" or "1")
    "TESTDATA": "0",

    # Path to the parent directory of cosrlib
    "PATH_BACK": os.path.dirname(os.path.dirname(__file__)),

    # Path to the local-data directory
    "PATH_LOCALDATA": os.path.join(os.path.dirname(os.path.dirname(__file__)), "local-data")
}

Why can't it connect? Maybe it's a difference in ES versions?
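A connect timeout like the one in the traceback is usually a network-level problem (wrong port, firewall or security group rules) rather than a version mismatch, because the connection is never established. Note that the URLs in the config above carry no port, so HTTP's default port 80 is used, whereas a self-managed Elasticsearch listens on 9200 by default. The following is a minimal sketch for checking reachability outside of cosr-back; the helper names `host_and_port` and `can_connect` are made up for this example, and it is written for Python 3 rather than the Python 2.6 seen in the traceback.

```python
import socket
from urllib.parse import urlparse


def host_and_port(url):
    """Return the (host, port) a client would connect to for `url`.

    When the URL has no explicit port, the scheme default applies:
    80 for http and 443 for https.
    """
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def can_connect(url, timeout=10.0):
    """Return True if a TCP connection to the URL's host and port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection(host_and_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


# Example usage (not executed here), with the addresses from the config above:
#   can_connect("http://52.3.91.82")        # tries port 80
#   can_connect("http://52.3.91.82:9200")   # tries the ES default port
```

If neither port is reachable from the machine running make reindex1, the security group or firewall rules of the ES hosts are the first thing to check.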

IvRRimum commented 8 years ago

And when I open Kibana, it asks me to "Please specify a default index pattern".