Length of query potentially higher than 1024

TeMU-BSC / iberifier


Length of query potentially higher than 1024 #13

Closed Oliph closed 2 years ago

Oliph commented 2 years ago

[ ] The first query line seems to be useless
[ ] The total number of characters can exceed 1024, because the check does not take into account the added spaces or the appended string ' -is:retweet'

https://github.com/TeMU-BSC/iberifier/blob/575189e808c6c3d3189227506e620d08e977101f/twitter/search_from_keys.py#L55-L64
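
A minimal sketch of a check that measures the final query string, including the joining spaces and the ' -is:retweet' suffix, rather than the raw keywords. The names `build_query`, `MAX_QUERY_LEN` and `SUFFIX` are hypothetical and are not taken from `search_from_keys.py`; 1024 is the limit discussed in this issue.

```python
MAX_QUERY_LEN = 1024     # limit discussed in this issue (assumed to apply to the whole query string)
SUFFIX = " -is:retweet"  # filter appended to every query


def build_query(key_list, max_len=MAX_QUERY_LEN):
    """Join the keywords with spaces, append the retweet filter,
    and check the length of the string that would actually be sent."""
    query = " ".join(key_list) + SUFFIX
    if len(query) > max_len:
        raise ValueError(
            f"query is {len(query)} characters long, limit is {max_len}"
        )
    return query
```

Measuring the finished string means the spaces and the suffix are counted automatically, so the check cannot drift out of sync with how the query is assembled.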

cuquiwi commented 2 years ago

Here `query` refers to the query string used to search for tweets, not to the request. Since we now search by 'bigrams' (a couple of words or a NER entity), the query will most likely never exceed 1024 characters. This check is therefore useless; it was written before we decided to use 'bigrams' as keywords.

e.g.:
key_list -> ['mascarilla', 'hipoxia']
query -> "mascarilla hipoxia -is:retweet"
len(query) -> 30

Oliph commented 2 years ago

OK, I thought we were aggregating the bigrams into one query per claim. If it is one query per bigram, it would probably be better to use the AND operator in that query; otherwise there is not much point in building one query per bigram if everything ends up as an OR.

Oliph commented 2 years ago

According to the documentation (I hope I found the right one, I am not sure about it), separating terms with a space is equivalent to the "AND" operator (as long as they are not double-quoted): https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

We can create queries that mix OR and AND. It then becomes possible to build one query per claim (while still checking the 1024-character limit). This will help manage the different rate limits.
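
A rough sketch of the one-query-per-claim idea, assuming each bigram becomes a parenthesised AND group and the groups are joined with OR. The function `build_claim_queries` and its parameters are hypothetical, not part of the repository. The outer parentheses around the OR list are there so that the ' -is:retweet' filter applies to every group without relying on operator precedence. When the groups do not fit in one query, the sketch starts a new one instead of failing.

```python
MAX_QUERY_LEN = 1024     # limit discussed in this issue
SUFFIX = " -is:retweet"  # filter appended to every query


def _render(groups, suffix):
    """Turn a list of '(w1 w2)' groups into one query string."""
    if len(groups) == 1:
        return groups[0] + suffix
    return "(" + " OR ".join(groups) + ")" + suffix


def build_claim_queries(bigrams, max_len=MAX_QUERY_LEN, suffix=SUFFIX):
    """Pack the bigrams of one claim into as few queries as possible.

    A space inside a group acts as AND; groups are combined with OR.
    A new query is started whenever adding a group would exceed max_len.
    """
    queries = []
    current = []  # groups of the query currently being filled
    for bigram in bigrams:
        group = "(" + " ".join(bigram) + ")"
        if len(_render([group], suffix)) > max_len:
            raise ValueError(f"a single group exceeds the limit: {group!r}")
        if current and len(_render(current + [group], suffix)) > max_len:
            queries.append(_render(current, suffix))
            current = []
        current.append(group)
    if current:
        queries.append(_render(current, suffix))
    return queries
```

For example, `build_claim_queries([("mascarilla", "hipoxia"), ("vacuna", "microchip")])` returns a single query, `((mascarilla hipoxia) OR (vacuna microchip)) -is:retweet`. The second bigram is made up for illustration. Fewer, larger queries mean fewer requests counted against the rate limits.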