adobe / helix-cli

Command-line tools for developing with AEM
Apache License 2.0

[import] CDN protected sites cannot be proxied #2050

Open kptdobe opened 2 years ago

kptdobe commented 2 years ago

Run hlx import and try to import the page https://www.globe.com.ph/business/enterprise.html. Nothing happens: the proxy returns a 403.

Opening the proxy page directly in another browser tab, http://localhost:3001/business/enterprise.html?host=https%3A%2F%2Fwww.globe.com.ph, shows this:

[screenshot]
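As the repro shows, the proxy URL keeps the page path and moves the origin into a percent-encoded host query parameter. A minimal bash sketch of that mapping, assuming the proxy listens on localhost:3001 as in the repro and that the page URL has no query string of its own. urlencode and proxy_url are hypothetical helpers, not part of hlx:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of hlx) that rebuild the local proxy URL
# from a page URL, following the mapping visible in the repro above.

# Percent-encode every byte outside the RFC 3986 unreserved set.
urlencode() {
  local LC_ALL=C  # work byte-wise so non-ASCII input is encoded per byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Map https://host/path to http://localhost:PORT/path?host=<encoded origin>.
# Assumes the page URL carries no query string or fragment of its own.
proxy_url() {
  local page="$1" port="${2:-3001}"
  local scheme="${page%%://*}"   # e.g. "https"
  local rest="${page#*://}"      # e.g. "www.globe.com.ph/business/enterprise.html"
  local host="${rest%%/*}"       # e.g. "www.globe.com.ph"
  local path="/"
  [[ "$rest" == */* ]] && path="/${rest#*/}"
  printf 'http://localhost:%s%s?host=%s\n' "$port" "$path" "$(urlencode "$scheme://$host")"
}

proxy_url 'https://www.globe.com.ph/business/enterprise.html'
# prints http://localhost:3001/business/enterprise.html?host=https%3A%2F%2Fwww.globe.com.ph
```

Fetching the resulting URL with curl -s -o /dev/null -w '%{http_code}' is a quick way to check whether the proxy still gets the 403.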

The proxy request has exactly the same headers as the browser request. Cloudflare puts a lot of effort into keeping proxies, scripts, and bots (well, anything that isn't a human) from accessing the site. I do not think we can work around this.

tripodsan commented 2 years ago

Maybe it's possible to detect this, then open the captcha page, somehow steal the token, and send it along with the requests...
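Detecting the block is the first step of that idea. A rough bash heuristic, assuming the response headers were saved with curl -D and that a Cloudflare block shows up as a 403 carrying a Server: cloudflare or CF-RAY header. looks_like_cf_block is a hypothetical name, and the captcha and token steps are not attempted here:

```shell
#!/usr/bin/env bash
# Hypothetical heuristic, not an hlx feature: does a saved response look like
# a Cloudflare block? Expects the header dump written by `curl -D FILE` for a
# single response (no -L redirects, so the first line is the final status).
looks_like_cf_block() {
  local headers_file="$1" status
  # The status code is the second field of the status line, e.g. "HTTP/2 403".
  status=$(head -n 1 "$headers_file" | tr -d '\r' | awk '{print $2}')
  [[ "$status" == 403 ]] || return 1
  # Header names are case-insensitive; strip the CR of CRLF line endings.
  tr -d '\r' < "$headers_file" | grep -qiE '^(server:[[:space:]]*cloudflare|cf-ray:)'
}

# Usage sketch:
#   curl -s -D headers.txt -o /dev/null "$url"
#   looks_like_cf_block headers.txt && echo "blocked by Cloudflare"
```

A 403 alone is not enough, since the origin server can also answer 403; requiring a Cloudflare header narrows it to blocks issued at the CDN edge, under the assumption above.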

trieloff commented 2 years ago
curl 'https://www.globe.com.ph/business/enterprise.html' \
-X 'GET' \
-H 'Cookie: __cf_bm=52rI9_zlhPg5F8ggmY2k74e2WngRqBcnkZxFJK9Szfs-1660555243-0-AVRprjhOVAUorYu8H231NlLOP0DQ37QUC7ttgc8ELRf2bM6KG57rR8FXwUYlwAC36PQUhfII0OGD6o5b+4+1Xo1NJ4TBfg64v8CNV/HPspLbIoFuFHmL4bTan8uzwdLODevMYw6NZc0AkASSqujJQLM8NaA2EHBZA1AA0VAFoVBr; cas_globe_previous_url=https://www.globe.com.ph/business/enterprise.html; policy=true; AWSELB=A1B125F1125C8DEEC3E5547E6F45EDCD90C6005B09A7E4ECA99D4520B2712C3EE6A9F70C5DB9AD2BC4E481D67EA0B261FCB3F41AC317CF068D3D6D7964D471101F690D5CA5; AWSELBCORS=A1B125F1125C8DEEC3E5547E6F45EDCD90C6005B09A7E4ECA99D4520B2712C3EE6A9F70C5DB9AD2BC4E481D67EA0B261FCB3F41AC317CF068D3D6D7964D471101F690D5CA5' \
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
-H 'Host: www.globe.com.ph' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6 Safari/605.1.15' \
-H 'Accept-Language: en-GB,en;q=0.9' \
-H 'Connection: keep-alive' \
--compressed --output -

The __cf_bm cookie is essential. Adding it as a CLI option could work (you'd still need to go into the browser's dev tools to steal the cookie).
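Until something like that exists, the idea can be tried by hand. A minimal bash sketch, assuming the __cf_bm value has been copied from the browser's dev tools and that Cloudflare may also compare the User-Agent, so the browser's one is reused. fetch_with_cf_bm, CF_USER_AGENT and CURL are hypothetical names, not hlx options:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not an hlx option: fetch a page with a __cf_bm cookie
# copied by hand from the browser's dev tools.
# Assumption: Cloudflare may tie the cookie to the browser, so the request
# reuses that browser's User-Agent (override with CF_USER_AGENT).
fetch_with_cf_bm() {
  local url="$1"
  local cf_bm="${2:?usage: fetch_with_cf_bm URL CF_BM_VALUE}"
  local ua="${CF_USER_AGENT:-Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6 Safari/605.1.15}"
  # CURL lets callers swap the binary (or a stub in tests); defaults to curl.
  # --compressed makes curl negotiate an encoding it can decode by itself.
  "${CURL:-curl}" -sS --compressed \
    -H "Cookie: __cf_bm=${cf_bm}" \
    -H "User-Agent: ${ua}" \
    -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
    -H 'Accept-Language: en-GB,en;q=0.9' \
    "$url"
}

# Usage sketch (value pasted from dev tools):
#   fetch_with_cf_bm 'https://www.globe.com.ph/business/enterprise.html' '<__cf_bm value>'
```

The cookie is short-lived, so a CLI option built on this would still need a fresh value copied from the browser whenever the requests start failing again.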

ghost commented 8 months ago

Can someone please help me understand what this is all about?