IQSS / dataverse

Open source research data repository software
http://dataverse.org

Investigate adding Apache-level mechanism for rejecting aggressive robot crawling #9359

Open landreev opened 1 year ago

landreev commented 1 year ago

This is a narrow case of the overall "rate limiting" umbrella issue. This would not attempt to throttle the overall traffic to the site, or to regulate the rate of requests from "normal" users (that would be better done within the application). This mechanism would be the first line of defense, detecting obvious bot, scripted, or otherwise automated crawling before it even gets to the application. For example: repeated calls coming from the same IP, plowing through the collection page facets without pausing between calls.

This would be doing essentially what we periodically do with custom command-line scripts in production. But third-party tools should be readily available for addressing this common problem.
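For illustration only, here is a minimal Python sketch of the kind of check described above: scan an Apache access log and flag IPs that send many requests within a short sliding window. It is not the script used in production. The function name `find_aggressive_ips`, the thresholds (`max_requests`, `window_seconds`), and the assumption that the log uses the standard common/combined format are all hypothetical.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumes Apache common/combined log lines, which start with:
#   client_ip identity user [dd/Mon/yyyy:HH:MM:SS +zzzz] "request" ...
LOG_PREFIX = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def find_aggressive_ips(lines, max_requests=30, window_seconds=10):
    """Return the set of IPs that made more than max_requests requests
    within any sliding window of window_seconds (hypothetical thresholds)."""
    window = timedelta(seconds=window_seconds)
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()

    for line in lines:
        match = LOG_PREFIX.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, stamp = match.groups()
        ts = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

        timestamps = recent[ip]
        timestamps.append(ts)
        # Drop requests that have fallen out of the window.
        while timestamps and ts - timestamps[0] > window:
            timestamps.popleft()

        if len(timestamps) > max_requests:
            flagged.add(ip)

    return flagged
```

The function takes any iterable of log lines, so it could be fed an open log file; the flagged addresses would then still need to be turned into whatever blocking rule the front end supports, which is the part a third-party tool would be expected to handle.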

cmbz commented 7 months ago

2024/03/14

landreev commented 7 months ago

@cmbz I meant more along the lines of "waiting until we deploy 6.2 - that will include Steven's application-side rate limiting solution - and experiment with it to see if that addresses the problem at hand, thus making an Apache-level solution unnecessary".

cmbz commented 6 months ago

2024/03/27

cmbz commented 1 month ago

2024/08/15