allenai / wimbd

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Apache License 2.0

Unable to use the elastic search index: AuthError #10

Closed: vishaal27 closed this issue 2 weeks ago

vishaal27 commented 5 months ago

Hey,

I obtained the access keys for the Elasticsearch index and added them to the es_config.yml file. I then tried running this code:

from wimbd.es import es_init

# es_init() creates the Elasticsearch client from the credentials in es_config.yml
es = es_init()

# Match documents whose "text" field contains the exact phrase "hello"
query = {
    "bool": {
        "should": [
            {
                "match_phrase": {
                    "text": {
                        "query": "hello",
                        "slop": 0,
                    }
                }
            },
        ],
        "minimum_should_match": 1,
    }
}

# Count the matching documents in the LAION-2B-en index
total_search_number = es.count(index="re_laion2b-en-1", query=query)
print(total_search_number)

However, running it raises the following error:

elasticsearch.AuthorizationException: AuthorizationException(403, 'security_exception', 'action [indices:data/read/search] is unauthorized for API key id [uDeV9IsBTffG4Z11VjJJ] of user [3344647685] on indices [laion1b-nolang], this action is granted by the index privileges [read,all]')

I noticed a similar problem reported in an earlier issue (https://github.com/allenai/wimbd/issues/9). Is the re_laion2b-en-1 index affected by the same problem? The same code runs without issues on the re_oscar and openwebtext indices. Would it be possible to make the LAION indices publicly accessible? I would like to use your tool to analyse LAION-2B-en statistics.
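Since the same key can query re_oscar and openwebtext but not re_laion2b-en-1, it can help to check which indices a key can actually read. The following is a minimal sketch, assuming the client's errors expose the HTTP status as a `status_code` attribute (as the elasticsearch-py exceptions we are aware of do; verify this for your client version). `probe_indices` is a hypothetical helper, not part of wimbd, and the demo uses a simulated `count` function rather than a live cluster.

```python
from typing import Callable, Dict, Iterable


def probe_indices(count: Callable[[str], object], indices: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper (not part of wimbd): run `count` on each index.

    Returns {index: "ok" | "forbidden"}. An error whose `status_code` is 403 is
    recorded as "forbidden"; any other error is re-raised unchanged.
    """
    results: Dict[str, str] = {}
    for index in indices:
        try:
            count(index)
            results[index] = "ok"
        except Exception as exc:
            # Assumption: the client error exposes the HTTP status as `status_code`.
            if getattr(exc, "status_code", None) == 403:
                results[index] = "forbidden"
            else:
                raise
    return results


if __name__ == "__main__":
    # Simulated client call for illustration only; with a real client you might
    # pass lambda idx: es.count(index=idx, query=query) instead.
    class SimulatedAuthError(Exception):
        status_code = 403

    def simulated_count(index: str) -> dict:
        if index.startswith("re_laion"):
            raise SimulatedAuthError("security_exception")
        return {"count": 0}

    print(probe_indices(simulated_count, ["re_oscar", "openwebtext", "re_laion2b-en-1"]))
    # prints {'re_oscar': 'ok', 'openwebtext': 'ok', 're_laion2b-en-1': 'forbidden'}
```

Only 403 errors are swallowed, so connection failures or malformed queries still surface instead of being misreported as permission problems.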

yanaiela commented 5 months ago

Hey,

Unfortunately, due to legal issues with LAION, we cannot provide access to this dataset.
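For indices that remain accessible (e.g. re_oscar and openwebtext), the phrase query from the original snippet is a `bool` query with one `match_phrase` clause per phrase under `should` and `minimum_should_match: 1`. A minimal sketch of generating it for several phrases follows; `build_phrase_query` is a hypothetical helper, not part of wimbd, and the field name `text` is taken from the snippet above.

```python
from typing import Any, Dict, List


def build_phrase_query(phrases: List[str], slop: int = 0) -> Dict[str, Any]:
    """Hypothetical helper (not part of wimbd).

    Builds a bool query matching documents whose `text` field contains at
    least one of `phrases`, with the given phrase slop.
    """
    clauses = [
        {"match_phrase": {"text": {"query": phrase, "slop": slop}}}
        for phrase in phrases
    ]
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


if __name__ == "__main__":
    query = build_phrase_query(["hello", "hello world"])
    print(len(query["bool"]["should"]))  # prints 2
    # With a configured client, e.g.:
    # es.count(index="re_oscar", query=query)
```

With a single phrase and `slop=0` this produces exactly the query dictionary from the original report.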