datahubio / datahub-v2-pm

Project management (issues only)

HTTP Error 403: Forbidden for resources #118

Closed: Mikanebu closed this issue 6 years ago

Mikanebu commented 6 years ago

Issue reported here: datahq/datahub-qa#81

We get a 403 Forbidden response from Cloudflare when we try to open a URL with urllib:

from urllib.request import urlopen
urlopen('https://pkgstore.datahub.io/core/country-list/data_csv/data/d7c9d7cfb42cb69f4422dec222dbbaa8/data_csv.csv')

Traceback (most recent call last):
  ...
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Acceptance criteria

Tasks

Analysis

This happens because Cloudflare evaluates the HTTP headers sent by the visitor's browser for threats, and if it finds a threat, it delivers a block page instead of the resource.
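Since Browser Integrity Check works on request headers, it helps to see what urllib actually sends. The sketch below prints the default header that urllib's opener adds to every request. Which header Cloudflare objects to is not confirmed in this thread; that the default User-Agent is the trigger is an assumption.

```python
import urllib.request

# Build a default opener, the same kind of OpenerDirector that urlopen() uses.
opener = urllib.request.build_opener()

# OpenerDirector attaches a default User-agent header to every request,
# identifying the client as Python's urllib rather than a browser.
for name, value in opener.addheaders:
    print(f"{name}: {value}")
```

On Python 3.6 this prints `User-agent: Python-urllib/3.6`.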

Solution/workaround:

Turn off Browser Integrity Check

zelima commented 6 years ago

@rufuspollock @akariv This happens because Cloudflare does not like requests made with urllib and blocks them as coming from a bot.

Browser Integrity Check:

Evaluate HTTP headers from your visitors browser for threats. If a threat is found a block page will be delivered.

The solution/workaround would be to turn off Browser Integrity Check in Cloudflare (I have already done that, but I can turn it back on).
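On the client side, a possible alternative is to override urllib's default User-Agent so the request does not identify itself as `Python-urllib`. This is a minimal sketch, assuming the User-Agent is what triggers the block; the browser-like string and the helper names `make_request` and `fetch` are hypothetical, and the thread does not confirm this avoids the 403.

```python
import urllib.request

URL = ("https://pkgstore.datahub.io/core/country-list/data_csv/data/"
       "d7c9d7cfb42cb69f4422dec222dbbaa8/data_csv.csv")

# Hypothetical browser-like User-Agent; the exact value is an assumption.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; datahub-client-example)"}

def make_request(url: str) -> urllib.request.Request:
    """Build a request whose User-Agent replaces urllib's default one."""
    # Headers set on the Request take precedence over the opener's
    # default User-agent, which is only added when none is present.
    return urllib.request.Request(url, headers=HEADERS)

def fetch(url: str) -> bytes:
    """Download the resource (network call; Cloudflare may still block it)."""
    with urllib.request.urlopen(make_request(url)) as resp:
        return resp.read()
```

Usage: `data = fetch(URL)`. Note that `Request` normalizes header names with `str.capitalize()`, so the header is stored as `User-agent`.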

What do you think?

akariv commented 6 years ago

I think we should turn it off for resources and APIs, and leave it on for everything else.

Is that possible @zelima ?

zelima commented 6 years ago

@akariv Yes, it is possible using "page rules", but free page rules are limited: we have only 1 page rule remaining (out of 3). An additional 5 page rules cost $5 per month.
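For reference, a page rule that disables Browser Integrity Check for one URL pattern could be expressed as a JSON body for Cloudflare's page rules API. This is a sketch assuming the v4 API payload shape (`POST /zones/{zone_id}/pagerules` with a `browser_check` action); the helper name is hypothetical, authentication and the request itself are omitted, and only the resource host from this issue is used because the API host is not named in the thread. Each rule targets a single URL pattern, which is why covering both resources and APIs would use up more than the one remaining free rule.

```python
import json

def browser_check_off_rule(url_pattern: str) -> dict:
    """Build a hypothetical page rule body turning Browser Integrity Check off."""
    return {
        # The rule applies to requests whose URL matches this pattern.
        "targets": [{
            "target": "url",
            "constraint": {"operator": "matches", "value": url_pattern},
        }],
        # Disable Browser Integrity Check only for matching URLs.
        "actions": [{"id": "browser_check", "value": "off"}],
        "status": "active",
    }

rule = browser_check_off_rule("pkgstore.datahub.io/*")
print(json.dumps(rule, indent=2))
```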


rufuspollock commented 6 years ago

My 2c is just to turn off browser integrity check entirely for now.

zelima commented 6 years ago

FIXED. We turned off Browser Integrity Check entirely.