benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0
681 stars, 42 forks

x:request({"proxy":[...]}) doesn't work #57

Closed. Reino17 closed this issue 3 years ago.

Reino17 commented 4 years ago

--proxy:

xidel-0.9.9-7433-8b7ba70-openssl-win32.exe -s --proxy=1.255.48.197:8080 --user-agent=curl https://ipinfo.io -e "$json"
{
  "ip": "1.255.48.197",
  "city": "Bangkok",
  "region": "Bangkok",
  "country": "TH",
  "loc": "13.7540,100.5014",
  "org": "AS23576 NBP",
  "postal": "10100",
  "timezone": "Asia/Bangkok",
  "readme": "https://ipinfo.io/missingauth"
}

xidel-0.9.9-7442-7d05304-openssl-win32.exe -s --proxy=1.255.48.197:8080 --user-agent=curl https://ipinfo.io -e "$json"
Error:
Internet Error: -4
when talking to: https://ipinfo.io/

xidel-0.9.9-7622-1650cd0-openssl-win32.exe -s --proxy=1.255.48.197:8080 --user-agent=curl https://ipinfo.io -e "$json"
Error:
Internet Error: -4 Connection failed. Some possible causes: Failed DNS lookup, failed to load OpenSSL, failed proxy, server does not exists or has no open port.
when talking to: https://ipinfo.io/

Something broke between revisions 7433 and 7442.

x:request({"proxy":[...]}):

xidel-0.9.8-openssl.exe -s --user-agent=curl -e "x:request({'proxy':'1.255.48.197:8080','url':'https://ipinfo.io'})/json"
{
  [...]
  "country": "NL",
  [...]
}

xidel-0.9.9-7622-1650cd0-openssl-win32.exe -s --user-agent=curl -e "x:request({'proxy':'1.255.48.197:8080','url':'https://ipinfo.io'})/json"
{
  [...]
  "country": "NL",
  [...]
}

It returns my own IP address, so it looks as if the "proxy" key is simply ignored.

https://github.com/benibela/xidel/issues/55#issuecomment-723493828:

> The proxy and user agent are set once when xidel is started, and afterwards nothing can change them

Why? With a scenario like...

xidel -se "
  let $src1:=x:request({'proxy':[...],'url':<url1>}),
      $src2:=x:request({'url':<url2>}),
      $src3:=x:request({'proxy':[...],'url':<url3>})
  return
  [...]
"

...that's a real limitation, if you ask me.
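For comparison, the per-request behaviour asked for above can be sketched outside xidel with Python's standard `urllib.request`: every request gets its own opener with its own `ProxyHandler`, so a proxy given for one request cannot leak into the next. This is only an illustration under that assumption, not how xidel or internettools implement `x:request`; `proxy_map` and `opener_for` are hypothetical helper names, and the proxy address is the one from the tests above.

```python
import urllib.request


def proxy_map(proxy):
    """Map URL schemes to a proxy URL; proxy is a "host:port" string or None."""
    if proxy is None:
        # An empty mapping means "no proxy", even if proxy environment variables are set.
        return {}
    url = "http://" + proxy
    # HTTPS URLs are tunnelled through the HTTP proxy with CONNECT.
    return {"http": url, "https": url}


def opener_for(proxy=None):
    """Build a fresh opener for a single request, so its proxy cannot leak into others."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxy_map(proxy)))


# The scenario from the issue: proxy, direct, proxy; every request is independent.
# Calling .open(url) on one of these openers would perform the actual network request.
openers = [
    opener_for("1.255.48.197:8080"),
    opener_for(),
    opener_for("1.255.48.197:8080"),
]
```

With this, `opener_for("1.255.48.197:8080").open("https://ipinfo.io")` would send just that one request through the proxy, while `opener_for().open(...)` would connect directly.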

benibela commented 3 years ago

Implemented in:
https://github.com/benibela/internettools/commit/8b3bbdee34b970fcab0e7fb35935bf800fcb285e
https://github.com/benibela/internettools/commit/445d3cff98f3eb2de862be57ac0e507e1f50c3c5
https://github.com/benibela/internettools/commit/8ae48409c2089a0da9e07af2adf6d5a7687f0358
https://github.com/benibela/xidel/commit/a2b6eeabbe6dcad5824f7e0b72f6fb154b4508a0

Reino17 commented 3 years ago

I can't seem to find a working proxy for testing, so I can't confirm the fix for now. I guess we'll have to wait and see whether anyone else who runs into problems with a proxy comes forward.

benibela commented 3 years ago

But you wrote the x:request proxy-finding script.

Reino17 commented 3 years ago

Haha, forgot about that.

xidel -s --allow-repetitions "https://hidemy.name/en/proxy-list/?type=h" -e '
  for $x in //tbody/tr[position() lt 11]/concat(td[1],":",td[2]) return
  x:request({
    "proxy":$x,
    "method":"HEAD",
    "error-handling":"xxx=accept",
    "url":"https://github.com"
  })/concat($x," - ",headers[1])
'

I'm getting either nothing at all (command/query runs forever), or...

Error:
Internet Error: -4 Connection failed. Some possible causes: Failed DNS lookup, failed to load OpenSSL, failed proxy, server does not exists, has no open port or uses an unknown https certificate.
when talking to: https://github.com/

...or...

Error:
Internet Error: -3
HTTPS connection failed after connecting to server. Some possible causes: handshake failure, mismatched HTTPS version/ciphers, invalid certificate
OpenSSL-Error: error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
OpenSSL information: CA file: cacert.pem , CA dir:  , TLSv1,
when talking to: https://github.com/

benibela commented 3 years ago

So it did not work.

Synapse kept using the old proxy, even when it was called with a new one. That is fixed now: https://github.com/benibela/internettools/commit/074a6a873f5172a3de87964f160cb49c4bced831
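The failure described here, where a cached connection keeps the proxy it was first created with, is a common caching bug. Below is a minimal Python sketch of that pattern and one way to avoid it, assuming connections are cached and reused between requests. The class names are hypothetical, and this is not the code of Synapse or internettools (the linked commit is the actual fix).

```python
class Connection:
    """Hypothetical stand-in for a connection that was set up through one proxy."""

    def __init__(self, proxy):
        self.proxy = proxy


class StaleCache:
    """Buggy pattern: one cached connection, created with the first proxy seen."""

    def __init__(self):
        self.conn = None

    def get(self, proxy):
        if self.conn is None:
            self.conn = Connection(proxy)
        # A later call with a different proxy silently reuses the old connection.
        return self.conn


class KeyedCache:
    """Fixed pattern: a connection is only reused for the same proxy."""

    def __init__(self):
        self.conns = {}

    def get(self, proxy):
        if proxy not in self.conns:
            self.conns[proxy] = Connection(proxy)
        return self.conns[proxy]


stale, keyed = StaleCache(), KeyedCache()
print(stale.get("189.39.127.118:8080").proxy, stale.get("144.91.81.255:3128").proxy)
# prints: 189.39.127.118:8080 189.39.127.118:8080
print(keyed.get("189.39.127.118:8080").proxy, keyed.get("144.91.81.255:3128").proxy)
# prints: 189.39.127.118:8080 144.91.81.255:3128
```

Keying the cache on the proxy still allows reuse for repeated requests through the same proxy, while a new proxy always gets a fresh connection. Dropping the cached connection whenever the proxy changes would work as well.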

> command/query runs forever

It can take a few minutes. Without -s it is easier to see what it is doing.

> Internet Error: -4 Connection failed. Some possible causes: Failed DNS lookup, failed to load OpenSSL, failed proxy, server does not exists, has no open port or uses an unknown https certificate.

"error-handling":"xxx=accept,xx=accept,x=accept" prevents that.

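The comment implies that `xxx=accept` alone does not catch internet errors such as -4, while the added `xx` and `x` rules do. One reading consistent with that, sketched in Python purely as an assumption: each `x` is a one-digit wildcard, a rule only matches codes with the same number of digits, the sign of internet errors is ignored, and the first matching rule wins. The real matching lives in xidel/internettools and may well differ; `parse_rules`, `matches` and `action_for` are hypothetical names, and `abort` as the fallback action is also assumed.

```python
def parse_rules(spec):
    """Split e.g. "xxx=accept,xx=accept,x=accept" into (pattern, action) pairs."""
    rules = []
    for part in spec.split(","):
        pattern, action = part.split("=", 1)
        rules.append((pattern.strip(), action.strip()))
    return rules


def matches(pattern, code):
    """ASSUMPTION: 'x' matches any one digit and the digit counts must agree."""
    digits = str(abs(code))  # ASSUMPTION: internet errors such as -4 are matched without the sign
    return len(pattern) == len(digits) and all(
        p == "x" or p == d for p, d in zip(pattern, digits)
    )


def action_for(spec, code, default="abort"):
    """Return the action of the first matching rule (ASSUMED order), else the default."""
    for pattern, action in parse_rules(spec):
        if matches(pattern, code):
            return action
    return default


print(action_for("xxx=accept", -4))                     # abort
print(action_for("xxx=accept,xx=accept,x=accept", -4))  # accept
```

Under these assumptions a three-digit HTTP status such as 404 is covered by `xxx`, while the one-digit internet error -4 only falls under the `x` rule.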
Reino17 commented 3 years ago

Could it be they're blocking xidel? The first command/query I tested about half an hour ago...

$ xidel -s --allow-repetitions "https://hidemy.name/en/proxy-list/?type=h" -e '
  for $x in //tbody/tr[position() lt 11]/concat(td[1],":",td[2]) return
  x:request({
    "proxy":$x,
    "method":"HEAD",
    "error-handling":"xxx=accept,xx=accept,x=accept",
    "url":"https://github.com"
  })/concat($x," - ",headers[1])
'
189.39.127.118:8080 - HTTP/1.1 200 OK
102.22.193.41:55443 - HTTP/1.1 200 OK
103.155.198.113:8181 - HTTP/1.1 200 OK
131.221.98.98:55443 - HTTP/1.1 200 OK
177.71.77.202:20183 - HTTP/1.1 200 OK
144.91.81.255:3128 - HTTP/1.1 200 OK
188.166.162.1:3128 - HTTP/1.1 200 OK
178.63.17.151:3128 - HTTP/1.1 200 OK
116.202.228.162:3128 - HTTP/1.1 200 OK
207.180.193.24:3128 - HTTP/1.1 200 OK

...was successful, as you can see. But after that the same query didn't return anything anymore.

I thought restricting the input to some Dutch proxies would help, but...

$ xidel -s --allow-repetitions "https://hidemy.name/en/proxy-list/?country=NL&type=h#list" -e '
  for $x in //tbody/tr[position() lt 6]/concat(td[1],":",td[2]) return
  x:request({
    "proxy":$x,
    "method":"HEAD",
    "error-handling":"xxx=accept,xx=accept,x=accept",
    "url":"https://github.com"
  })/concat($x," - ",headers[1])
'
176.126.207.17:80 -
176.126.206.40:80 -
45.131.5.133:80 -
93.114.65.150:80 -
185.171.230.195:80 -

$ xidel -s "https://hidemy.name/en/proxy-list/?country=NL&type=h#list" -e '
  for $x in //tbody/tr[position() lt 6]/concat(td[1],":",td[2]) return
  system(
    x"xidel --proxy ""{$x}"" --error-handling=""xxx=accept,xx=accept,x=accept"" --method=HEAD https://github.com -e '\''concat({$x},"" - "",$headers[1])'\''"
  )
'
<returns nothing>

For the record, this is xidel-0.9.9-6611-2dbafd7-openssl-win32.exe.

benibela commented 3 years ago

> Could it be they're blocking xidel? The first command/query I tested about half an hour ago...

The site just looks very unreliable.