allenai / wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Apache License 2.0

error when search on "re_pile" #9

Closed (WilliamsToTo closed this issue 3 days ago)

WilliamsToTo commented 5 months ago

Below is my code.

import json
from wimbd.es import es_init, get_indices, count_documents_containing_phrases, get_documents_containing_phrases
from elasticsearch import Elasticsearch

es = es_init(config="es_config_4.yml", timeout=600)
#indices = [name for name in es.indices.get(index="*").keys() if not name.startswith(".")]
#print(get_indices(return_mapping=True))
query = {
    "bool": {
        "should": [
            {
                "match_phrase": {
                    "text": {
                        "query": "conceptnet",
                        "slop": 0,
                    }
                }
            },
        ],
        "minimum_should_match": 1,
    }
}
highlight = {
    "fields": {
      "text": {
        "type": "plain",
        "fragment_size": 128,  # Approximation based on average word length
        "number_of_fragments": 5,
      }
    }
}
# for doc in get_documents_containing_phrases("re_oscar", phrases=["cancer causes smoking"], all_phrases=True, num_documents=2):
#     print(doc)

total_search_number = es.count(index="re_pile", query=query) # docs_v1.5_2023-11-02, re_pile
print(total_search_number)
result = es.search(index="re_pile", query=query, highlight=highlight, size=20)
with open('search_result.json', 'w') as f:
    json.dump(result.body, f, indent=4)
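The `query` dict above is an Elasticsearch `bool` query whose `should` list holds a single `match_phrase` clause. With `minimum_should_match: 1`, a document matches if any clause matches. A minimal sketch of a helper that builds the same structure for several phrases (the name `build_phrase_query` is hypothetical and not part of wimbd or elasticsearch-py):

```python
import json


def build_phrase_query(phrases, slop=0, field="text", minimum_should_match=1):
    """Build a bool/should query with one match_phrase clause per phrase.

    Hypothetical helper: it only produces the dict passed as `query=`
    to es.count / es.search; it does not talk to Elasticsearch.
    """
    should = [
        {"match_phrase": {field: {"query": phrase, "slop": slop}}}
        for phrase in phrases
    ]
    return {"bool": {"should": should, "minimum_should_match": minimum_should_match}}


if __name__ == "__main__":
    # Reproduces the single-phrase "conceptnet" query from the issue.
    print(json.dumps(build_phrase_query(["conceptnet"]), indent=2))
```

Passing `minimum_should_match=len(phrases)` would instead require every phrase to appear. Either way, the query shape does not affect which indices the API key is allowed to read.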

The error is:

Traceback (most recent call last):
  File "/home/taof/wimbd/search_test.py", line 36, in <module>
    total_search_number = es.count(index="re_pile", query=query) # docs_v1.5_2023-11-02, re_pile
  File "/home/taof/wimbd/env/lib/python3.9/site-packages/elasticsearch/_sync/client/utils.py", line 414, in wrapped
    return api(*args, **kwargs)
  File "/home/taof/wimbd/env/lib/python3.9/site-packages/elasticsearch/_sync/client/__init__.py", line 925, in count
    return self.perform_request(  # type: ignore[return-value]
  File "/home/taof/wimbd/env/lib/python3.9/site-packages/elasticsearch/_sync/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.AuthorizationException: AuthorizationException(403, 'security_exception', 'action [indices:data/read/search] is unauthorized for API key id [uDeV9IsBTffG4Z11VjJJ] of user [3344647685] on indices [re_pile], this action is granted by the index privileges [read,all]')

But this error doesn't happen when I run the same query on "re_oscar".
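The 403 says the API key lacks the `read` privilege on `re_pile`, which would explain why `re_oscar` works while `re_pile` does not. A sketch for checking this before querying, assuming an elasticsearch-py 8.x client and a cluster that lets an API key inspect its own privileges through the security has-privileges API (`readable_indices` is a hypothetical helper, not part of wimbd):

```python
def readable_indices(es, indices):
    """Return {index_name: bool}: whether the authenticated credentials hold `read`.

    Assumption: `es` is an elasticsearch-py 8.x client, and
    es.security.has_privileges reports the privileges of whoever is
    authenticated (here, the API key loaded by es_init).
    """
    resp = es.security.has_privileges(
        index=[{"names": list(indices), "privileges": ["read"]}]
    )
    # The response's "index" section maps each index name to {privilege: bool}.
    return {name: privs.get("read", False) for name, privs in resp["index"].items()}


# Usage (requires a live cluster and the es_init client from the snippet above):
# es = es_init(config="es_config_4.yml", timeout=600)
# print(readable_indices(es, ["re_pile", "re_oscar"]))
```

If the key lacks access, this would be expected to report `False` for `re_pile` without triggering a 403 from `es.count`.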

yanaiela commented 5 months ago

Hey,

Unfortunately we cannot provide access to the Pile.