allenai / wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Apache License 2.0

Unable to run several functions #16

aflah02 closed this issue 1 month ago

aflah02 commented 1 month ago

Hi, I am running the following script:

from wimbd.es import es_init, get_indices, count_documents_containing_phrases

es_dolma = es_init('es_config_dolma_1.yml')
es_others = es_init('es_config_7.yml')

# This returns all indices, along with their total document counts.
print(get_indices())

# This also returns elasticsearch mapping information.
print(get_indices(return_mapping=True))

# Count the number of documents containing the term "legal".
print(count_documents_containing_phrases("docs_v1.5_2023-11-02", "legal", es=es_dolma))  # single term

I am getting the following errors:

elasticsearch.AuthorizationException: AuthorizationException(403, 'security_exception', 'action [cluster:monitor/state] is unauthorized for API key id [{KEY}] of user [{USER_ID}], this action is granted by the cluster privileges [read_ccr,transport_client,cross_cluster_replication,manage_ccr,monitor,manage,all]')
elasticsearch.AuthorizationException: AuthorizationException(403, 'security_exception', 'action [indices:data/read/search] is unauthorized for API key id [{KEY}] of user [{USER_ID}] on indices [docs_v1.5_2023-11-02], this action is granted by the index privileges [read,all]')

I'm probably doing something wrong, but I can't figure out what.
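Both error messages share one shape: the denied action, the API key and user, optionally the indices involved, and the privileges that would grant the action. A minimal sketch for pulling those fields out when triaging similar errors, assuming the message wording quoted above (other Elasticsearch versions may phrase it differently). `parse_denied_action` and `DeniedAction` are hypothetical helpers, not part of wimbd or elasticsearch-py:

```python
import re
from typing import NamedTuple, Optional


class DeniedAction(NamedTuple):
    """Fields extracted from an Elasticsearch 403 security_exception message."""
    action: str                     # e.g. "cluster:monitor/state"
    scope: str                      # "cluster" or "index", from the "granted by" clause
    indices: list[str]              # empty for cluster-level actions
    granting_privileges: list[str]  # privileges that would allow the action


# Patterns written against the two messages quoted in this thread (an assumption:
# other Elasticsearch versions may word the message differently).
_ACTION_RE = re.compile(r"action \[([^\]]+)\] is unauthorized")
_INDICES_RE = re.compile(r"on indices \[([^\]]+)\]")
_GRANTED_RE = re.compile(r"granted by the (cluster|index) privileges \[([^\]]+)\]")


def parse_denied_action(message: str) -> Optional[DeniedAction]:
    """Return the parsed fields, or None if the message does not match the expected shape."""
    action = _ACTION_RE.search(message)
    granted = _GRANTED_RE.search(message)
    if action is None or granted is None:
        return None
    indices = _INDICES_RE.search(message)
    return DeniedAction(
        action=action.group(1),
        scope=granted.group(1),
        indices=[i.strip() for i in indices.group(1).split(",")] if indices else [],
        granting_privileges=[p.strip() for p in granted.group(2).split(",")],
    )
```

Read this way, the first error concerns a cluster-level action (`cluster:monitor/state`, granted by privileges such as `monitor`), while the second concerns searching one specific index (`indices:data/read/search` on `docs_v1.5_2023-11-02`, granted by `read`). In a script, the helper could be applied to `str(err)` inside an `except elasticsearch.AuthorizationException as err:` block.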

yanaiela commented 1 month ago

Hey,

As mentioned here, the get_indices function doesn't work with the given access key, but you can find the relevant index names on the same page.
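One way to follow this advice without editing every script is to try listing indices and fall back to the documented names when listing is refused. A sketch under assumptions: `list_index_names` is a hypothetical helper, the listing call is passed in as a function, and that function is assumed to return something iterable over index names (the thread does not show get_indices' return type). With elasticsearch-py, the `denied` argument would be `elasticsearch.AuthorizationException`, the exception class seen in the errors above; it defaults to `PermissionError` here only so the sketch runs without a cluster.

```python
from typing import Callable, Iterable


def list_index_names(
    get_indices: Callable[[], Iterable[str]],
    documented_indices: Iterable[str],
    denied: type[BaseException] = PermissionError,
) -> list[str]:
    """Try to list indices through the API; if the key is not allowed to,
    fall back to index names copied from the documentation."""
    try:
        # Iterating a dict yields its keys, so a {name: count} mapping also works.
        return sorted(get_indices())
    except denied:
        # Presumably the key lacks the cluster-level privilege needed for listing.
        return list(documented_indices)
```

The fallback list is only as current as the documentation it was copied from, so an index added later would not show up until the list is updated by hand.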

As for the count_documents_containing_phrases function, that's strange, since it should work. Can you send me an email? I'll send you a new key; perhaps the one I sent previously is no longer active.
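When deciding whether a key is inactive or simply under-privileged, the HTTP status is a useful first signal. In Elasticsearch's security model, 401 generally means the credentials could not be authenticated (for example, an expired or invalidated API key), while 403 means they were authenticated but lack a privilege. Both errors in this thread are 403s that name the API key and user, which would point toward missing privileges rather than an inactive key, although a fresh key with the right privileges resolves either case. A minimal, heuristic sketch (`AuthProblem` and `triage_status` are hypothetical names); the status is the first value shown in the exception, as in `AuthorizationException(403, ...)`:

```python
from enum import Enum


class AuthProblem(Enum):
    """Likely causes behind an Elasticsearch security error (a heuristic, not a diagnosis)."""
    AUTHENTICATION = "credentials rejected; the API key may be invalid, expired, or revoked"
    AUTHORIZATION = "credentials accepted, but the key lacks a privilege for this action or index"
    OTHER = "not an authentication or authorization failure"


def triage_status(status: int) -> AuthProblem:
    """Map an HTTP status code from Elasticsearch to a likely cause."""
    if status == 401:
        return AuthProblem.AUTHENTICATION
    if status == 403:
        return AuthProblem.AUTHORIZATION
    return AuthProblem.OTHER
```

For example, `triage_status(403).value` gives the "credentials accepted" explanation, which matches both errors reported above.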

aflah02 commented 1 month ago

Got it, thanks for sharing this! I'll also drop you an email.