chaoss / grimoirelab-elk

[groupsio] Getting Client Error 429 using p2o.py but corresponding perceval call works #839

Closed teragopher closed 4 years ago

teragopher commented 4 years ago

perceval version number: 0.13.0

Hi,

When I run this command to enrich the index:

p2o.py --enrich --index [redacted] --index-enrich [redacted] -e [redacted] -g --scroll-size 1000 --db-host [redacted] --db-sortinghat [redacted] --db-user [redacted] --db-password [redacted] groupsio  onap+onap-discuss -e [redacted] -p [redacted]

I get this error:

`2020-04-10 13:16:50,464 Impossible to download archives from https://groups.io/g/onap+onap-discuss. Error info:
2020-04-10 13:16:50,464 Error feeding raw from groupsio (https://groups.io/g/onap+onap-discuss): 429 Client Error: Too Many Requests for url: https://groups.io/api/v1/downloadarchives?group_id=19846&start_time=1970-01-01T00%3A00%3A00%2B00%3A00
Traceback (most recent call last):
  File "/repos/grimoirelab-elk/grimoire_elk/elk.py", line 228, in feed_backend
    ocean_backend.feed(**params)
  File "/repos/grimoirelab-elk/grimoire_elk/raw/elastic.py", line 234, in feed
    self.feed_items(items)
  File "/repos/grimoirelab-elk/grimoire_elk/raw/elastic.py", line 250, in feed_items
    for item in items:
  File "/repos/grimoirelab-perceval/perceval/backend.py", line 226, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/repos/grimoirelab-perceval/perceval/backends/core/groupsio.py", line 131, in fetch_items
    mailing_list.fetch(from_date)
  File "/repos/grimoirelab-perceval/perceval/backends/core/groupsio.py", line 229, in fetch
    success = self._download_archive(url, payload, filepath)
  File "/repos/grimoirelab-perceval/perceval/backends/core/groupsio.py", line 270, in _download_archive
    r.raise_for_status()
  File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://groups.io/api/v1/downloadarchives?group_id=19846&start_time=1970-01-01T00%3A00%3A00%2B00%3A00`

But when I run the corresponding perceval command from the same host, like this:

perceval groupsio 'onap+onap-discuss' -e [redacted]  -p [redacted]

It works. I don't get any 'Too Many Requests' error.

Any idea why this is happening?

Thanks, Mike.

valeriocos commented 4 years ago

Hi @teragopher!

The error should be similar to https://github.com/chaoss/grimoirelab-perceval/issues/650. Can you try installing ELK from source? Another possibility is that you have tried to download that archive too many times, and your requests are now being throttled (see the last bullet in Notes at https://groups.io/api#download-archives).
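If throttling is the cause, a client-side retry with a pause between attempts is one way to cope with a 429. The snippet below is only a minimal sketch of that idea, not how perceval or ELK handle it: `send_with_retry` and `wait_seconds` are hypothetical helpers, and it is an assumption that the server sends a `Retry-After` header at all (when it does not, the sketch falls back to exponential backoff).

```python
import time


def wait_seconds(headers, attempt, base_delay=1.0, max_delay=60.0):
    """Pick a wait time: Retry-After (in seconds) if usable, else exponential backoff."""
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return min(float(value), max_delay)
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    return min(base_delay * (2 ** attempt), max_delay)


def send_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() until the response is not a 429 or the retries run out.

    `send` is any zero-argument callable returning an object that exposes
    `status_code` and `headers` (a requests.Response fits this shape).
    """
    for attempt in range(max_retries + 1):
        response = send()
        # Return on success, on any non-429 error, or once retries are exhausted
        if response.status_code != 429 or attempt == max_retries:
            return response
        sleep(wait_seconds(response.headers, attempt))
```

With the requests library this could be called as `send_with_retry(lambda: requests.get(url, params=payload))`, where `url` and `payload` are whatever the caller already uses; the caller still decides what to do if the final response is a 429.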

I see that the ELK README has no installation section. It would be great if you could submit a PR to add one. It would be similar to the one in perceval, but with just the source code and pip subsections. WDYT?

valeriocos commented 4 years ago

@teragopher let's close this issue, feel free to reopen it if needed. Thanks!

teragopher commented 4 years ago

Hi @valeriocos,

Thanks for this. It seems to work now. As for the missing installation section in the README, it would be a pleasure. I will send in a PR shortly.

valeriocos commented 4 years ago

Great, thanks @teragopher for contributing!