beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Reproduce Signal-1M #147

Closed: Dundalia closed this issue 1 year ago

Dundalia commented 1 year ago

I see that reproducing the Signal-1M dataset requires manually scraping the tweets with the scrape_tweets.py script you provided.

After inserting my consumer_key and consumer_secret, I get the following error:

Forbidden: 403 Forbidden 453 - You currently have access to a subset of Twitter API v2 endpoints and limited v1.1 endpoints (e.g. media post, oauth) only. If you need access to this endpoint, you may need a different access level. You can learn more here: https://developer.twitter.com/en/portal/product

Can you give me some guidance? Or is paying for the Basic subscription the only way to get the required access level?

thakur-nandan commented 1 year ago

Hi @Freddavide, you can download the preprocessed SIGNAL-1M dataset directly here: https://drive.google.com/drive/folders/1CgDO-KmQQMpGEGeD3R20ZgTTM008xix9?usp=sharing.

Please make sure you have the necessary licenses.

Thanks, Nandan
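
For readers who take the download route, here is a minimal sketch of loading the files once they are on disk. It assumes the shared folder follows the usual BEIR layout (corpus.jsonl, queries.jsonl, and qrels/test.tsv with a header row), which this thread does not confirm. The function name load_beir_folder and the folder name signal1m are hypothetical and used only for illustration.

```python
import csv
import json
from pathlib import Path


def load_beir_folder(data_dir, split="test"):
    """Load corpus, queries and qrels from a BEIR-style folder.

    Assumed layout (not confirmed for the shared Signal-1M download):
      corpus.jsonl      one JSON object per line: _id, title, text
      queries.jsonl     one JSON object per line: _id, text
      qrels/<split>.tsv header row, then query-id, corpus-id, score
    """
    data_dir = Path(data_dir)

    # Map document id -> {"title", "text"}.
    corpus = {}
    with open(data_dir / "corpus.jsonl", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc = json.loads(line)
                corpus[doc["_id"]] = {
                    "title": doc.get("title", ""),
                    "text": doc.get("text", ""),
                }

    # Map query id -> query text.
    queries = {}
    with open(data_dir / "queries.jsonl", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                query = json.loads(line)
                queries[query["_id"]] = query["text"]

    # Map query id -> {document id -> relevance score}.
    qrels = {}
    qrels_path = data_dir / "qrels" / f"{split}.tsv"
    with open(qrels_path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or malformed rows
            query_id, corpus_id, score = row[0], row[1], row[2]
            qrels.setdefault(query_id, {})[corpus_id] = int(score)

    return corpus, queries, qrels
```

Calling `corpus, queries, qrels = load_beir_folder("signal1m")` would then return three dictionaries. If the beir package is installed, `GenericDataLoader(data_folder=...).load(split="test")` from `beir.datasets.data_loader` reads the same layout and is probably the better choice; the sketch above only avoids the dependency.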

Dundalia commented 1 year ago

Wow, thanks!