benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

`Internet Error: -4` by using HTTPS over proxy #55

Status: Closed (closed by mr-july 3 years ago)

mr-july commented 3 years ago

I am trying, without success, to use the program behind a proxy.

For example the command

xidel --proxy $https_proxy https://stackoverflow.com -e '//h2/normalize-whitespace()'

produces the following output:

**** Retrieving (GET): https://stackoverflow.com ****
Error:
Internet Error: -4 
when talking to: https://stackoverflow.com/

My OS is Arch Linux with kernel 5.8.10-arch1-1.

I tried to compile the program myself, but encountered problems with missing units: strutils, base64, etc. How can I fix this?

Is it possible that the program expects a non-existent version of OpenSSL or some other library?

mr-july commented 3 years ago

The problem is the SSL support; HTTP over the proxy works like a charm. Is it possible that the synapse library is a little outdated? How can it be fixed?

benibela commented 3 years ago

> I tried to compile the program myself, but encountered problems with missing units: strutils, base64, etc. How can I fix this?

Those units are part of FreePascal's own libraries. Do you have FreePascal installed, all of it?

> Is it possible that the synapse library is a little outdated?

Synapse has not been updated for 8 years...

The default setting is to load libssl.so and libcrypto.so.

I have patched it for newer OpenSSL versions, so it tries to load different file names. See: https://sourceforge.net/p/synalist/feature-requests/23/

What are your .so files called?

mr-july commented 3 years ago

> Synapse has not been updated for 8 years...
> The default setting is to load libssl.so and libcrypto.so
> I have patched it for newer OpenSSL versions, so it tries to load different file names. See: https://sourceforge.net/p/synalist/feature-requests/23/
> How are your .sos called?

I have libssl.so and libcrypto.so under /usr/lib/; both are symlinks to version 1.1:

ls -al /usr/lib/libcrypto.so /usr/lib/libssl.so
lrwxrwxrwx 1 root root 16 Sep 22 16:59 /usr/lib/libcrypto.so -> libcrypto.so.1.1
lrwxrwxrwx 1 root root 13 Sep 22 16:59 /usr/lib/libssl.so -> libssl.so.1.1

But once again, HTTPS works just fine without proxy, so libraries are probably successfully loaded.

But, when I'm trying to use the program with HTTPS URLs behind a proxy (with --proxy parameter), I get the Internet Error: -4.

So I think the proxy handling code in conjunction with SSL is probably broken. Moreover, the problem is not specific to Arch Linux. I have also experimented on my RasPi with xidel version 0.9.9 (20200930.7580.9cac3c17e8ed)...

I got a list of working proxies with the following command:

for x in `xidel https://hidemy.name/en/proxy-list/\?type\=h -e '//tbody//tr/string-join((td[1], td[2]), ":")'`; do curl --connect-timeout 10 --proxy $x --head -s https://github.com/ > /dev/null && echo "WORKING: $x" ; done

But none of them works with xidel:

for x in `xidel https://hidemy.name/en/proxy-list/\?type\=h -e '//tbody//tr/string-join((td[1], td[2]), ":")'`; do curl --connect-timeout 10 --proxy $x --head -s https://github.com/ > /dev/null && (xidel --proxy $x -e '//title' https://github.com && echo "$x is working" || echo "$x BROKEN"); done

Reino17 commented 3 years ago

For what it's worth, the commandline-option doesn't work for me either:

$ xidel -s --proxy "167.71.5.83:3128" --method=HEAD "https://github.com" -e '$headers[1]'
Error:
Internet Error: -4
when talking to: https://github.com/

x:request() on the other hand does work:

$ xidel -se 'x:request({"proxy":"167.71.5.83:3128","method":"HEAD","url":"https://github.com"})/headers[1]'
HTTP/1.1 200 OK

@benibela is it normal that a query such as this one crashes?

$ xidel -s "https://hidemy.name/en/proxy-list/?type=h" -e '
  for $x in //tbody/tr/concat(td[1],":",td[2]) return
  x:request({
    "proxy":$x,
    "method":"HEAD",
    "error-handling":"xxx=accept",
    "url":"https://github.com"
  })/concat($x," - ",headers[1])
'
An unhandled exception occurred at $004405D6:
EAccessViolation: Access violation
  $004405D6
  $0049E814
  $004C7B05
  $004A1095
  $0049B21E
  $0049D5BE
  $0049B529
  $00499C1B
  $0048DEC4
  $0048DD80
  $00436E47
  $00438ACF
  $00436276
  $00435D9F
  $0043F9AD

mr-july commented 3 years ago

> x:request() on the other hand does work:
>
> $ xidel -se 'x:request({"proxy":"167.71.5.83:3128","method":"HEAD","url":"https://github.com"})/headers[1]'
> HTTP/1.1 200 OK

If I'm behind the firewall, it doesn't work. I think the "proxy" part is just ignored.

benibela commented 3 years ago

> So I think, the proxy handling code in conjunction with SSL is probably broken.

Looks like it. Here is a fix: https://sourceforge.net/p/synalist/bugs/48/
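For context: a client normally reaches an HTTPS site through an HTTP proxy by first sending a plain-text CONNECT request, and it only starts the TLS handshake after the proxy has answered with a 2xx status. The sketch below just prints such a request. `connect_request` is a hypothetical helper for illustration; it says nothing about how Synapse or the linked patch implement this step.

```shell
# Hypothetical helper (not part of xidel or Synapse): print the CONNECT
# request a client sends to an HTTP proxy before it starts TLS.
# Usage: connect_request HOST [PORT]   (PORT defaults to 443)
connect_request() {
  host=$1
  port=${2:-443}
  # Request line, Host header, then an empty line that ends the header block.
  printf 'CONNECT %s:%s HTTP/1.1\r\nHost: %s:%s\r\n\r\n' \
    "$host" "$port" "$host" "$port"
}
```

Calling `connect_request github.com` prints the request line `CONNECT github.com:443 HTTP/1.1`, followed by a Host header and an empty line.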

> if I'm behind the firewall, then it doesn't work, I think, the "proxy" part will be just ignored.

Looks like it. It merges the arguments with the command-line arguments, which is very complicated, probably too complicated. Some options can be set for each request and some can only be set once, globally.

> @benibela is it normal that a query such as this one crashes?

Crashes are never normal. Here, it happens because any URL can only be retrieved once unless --allow-repetitions is set. It expects a request, but there is no request to be made again.

Reino17 commented 3 years ago

> --allow-repetitions

$ xidel -s --allow-repetitions "https://hidemy.name/en/proxy-list/?type=h" -e '
  for $x in //tbody/tr[position() lt 4]/concat(td[1],":",td[2]) return
  x:request({
    "proxy":$x,
    "method":"HEAD",
    "error-handling":"xxx=accept",
    "url":"https://github.com"
  })/concat($x," - ",headers[1])
'
139.59.1.14:8080 - HTTP/1.1 200 OK
59.120.117.244:80 - HTTP/1.1 200 OK
128.199.202.122:8080 - HTTP/1.1 200 OK

Confirmed. It can't be used as x:request({[...],"allow-repetitions":true()}), or can it?

benibela commented 3 years ago

> x:request({[...],"allow-repetitions":true()}), or can it?

It cannot.

mr-july commented 3 years ago

> Looks like it. Here is a fix: https://sourceforge.net/p/synalist/bugs/48/

I confirm. The patch is functional. Thank you all!

@benibela could you please integrate this patch into the next build? After that, the ticket may be closed.

Reino17 commented 3 years ago

$ xidel -s --proxy "167.71.5.83:3128" --method=HEAD "https://github.com" -e '$headers[1]'
Error:
Internet Error: -4
when talking to: https://github.com/

I don't know about mr-july, but for me --proxy still doesn't work.

benibela commented 3 years ago

Do you have the newest version? I have changed the error message.

And you need new proxies; these free proxies only work for a few hours and then disappear.

Reino17 commented 3 years ago

I used the same binary I compiled yesterday for the file:move()-issue, which is from d19610d, and therefore has your fix c7dfc53.

The proxy still works, as you can see on https://hidemy.name/en/proxy-list/?type=h.

xidel-0.9.9-6194-d19610d-win32.exe -s --proxy "167.71.5.83:3128" --method=HEAD "https://github.com" -e "$headers[1]"
Error:
Internet/HTTP Error: 400
when talking to: https://github.com/

xidel-0.9.9-6194-d19610d-openssl-win32.exe -s --proxy "167.71.5.83:3128" --method=HEAD "https://github.com" -e "$headers[1]"
Error:
Internet Error: -4 Connection failed. Some possible causes: Failed DNS lookup, failed to load OpenSSL, failed proxy, server does not exists or has no open port.
when talking to: https://github.com/

benibela commented 3 years ago

> The proxy still works,

It does not work for me. Most of the proxies on the list do not, with any program.

This one works 1.255.48.197:8080

Reino17 commented 3 years ago

> I think, the "proxy" part will be just ignored.

I think mr-july is right. I thought the proxy worked because x:request({"proxy":"167.71.5.83:3128",[...]}) returned HTTP/1.1 200 OK, but I think that's just because {"url":"https://github.com"} was accessed directly.

> This one works 1.255.48.197:8080

Not for me.

mr-july commented 3 years ago

I've just tested the version xidel-0.9.9.20201025.7622.1650cd000ad2.linux64.tar.gz and it works with my corporate proxy. So for me the main proxy problem is solved. I tested it as follows:

xidel --proxy $proxy https://stackoverflow.com -e "//title"
**** Retrieving (GET): https://stackoverflow.com ****
**** Processing: https://stackoverflow.com/ ****
Stack Overflow - Where Developers Learn, Share, & Build Careers

On the other hand the problem with x:request is still there:

xidel -se 'x:request({"proxy":"'$proxy'","method":"HEAD","url":"https://stackoverflow.com"})/headers[1]'
Error:
Internet Error: -4 Connection failed. Some possible causes: Failed DNS lookup, failed to load OpenSSL, failed proxy, server does not exists or has no open port.
when talking to: https://stackoverflow.com/

Reino17 commented 3 years ago

@benibela Could you please re-open the issue, because obviously it's not fixed yet at all.

@mr-july Can you confirm that xidel -s --proxy $proxy --user-agent curl https://ipinfo.io -e '$json' returns the IP-address of your proxy?

mr-july commented 3 years ago

> Can you confirm that xidel -s --proxy $proxy --user-agent curl https://ipinfo.io -e '$json' returns the IP-address of your proxy?

Yes.
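The check above can be scripted by comparing the address ipinfo.io reports with and without --proxy. This is only a sketch under assumptions: `proxy_in_use` is a hypothetical helper, the `$json/ip` path expression is assumed to select the "ip" field of the response, and a direct connection must be possible for the comparison to work.

```shell
# Hypothetical helper: succeeds (exit 0) if ipinfo.io reports a different
# address when xidel is started with --proxy than without it.
# Returns 2 if either request fails, 1 if both addresses are equal.
proxy_in_use() {
  proxy=$1
  # Address seen without a proxy (assumes direct access is allowed).
  direct=$(xidel -s --user-agent curl https://ipinfo.io -e '$json/ip') || return 2
  # Address seen when the same request is sent with --proxy.
  proxied=$(xidel -s --proxy "$proxy" --user-agent curl https://ipinfo.io -e '$json/ip') || return 2
  [ -n "$proxied" ] && [ "$direct" != "$proxied" ]
}
```

For example, `proxy_in_use "$proxy" && echo "requests go through the proxy"`. Behind a firewall that blocks direct access, the first request fails and the function returns 2, so it cannot tell the two cases apart there.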

benibela commented 3 years ago

I have tried it on Windows, and indeed it does not work there now.

But there it is the exact opposite: the old version without the fix works.

> On the other hand the problem with x:request is still there:

I have not changed anything with x:request.

Reino17 commented 3 years ago

> The old version without the fix works

xidel-0.9.8-openssl.exe -s --proxy "1.255.48.197:8080" --user-agent curl https://ipinfo.io -e "$json,$headers[3]"
{
  "ip": "1.255.48.197",
  "city": "Phra Pradaeng",
  "region": "Samut Prakan",
  "country": "TH",
  "loc": "13.6585,100.5336",
  "org": "AS23576 NBP",
  "postal": "10130",
  "timezone": "Asia/Bangkok",
  "readme": "https://ipinfo.io/missingauth"
}
Content-Type: application/json; charset=utf-8

Confirmed. Hmm...
Within x:request() on the other hand...

xidel-0.9.8-openssl.exe -s --user-agent curl -e "x:request({'proxy':'1.255.48.197:8080','url':'https://ipinfo.io'})/json"

...proxy doesn't work, because it returns my own IP-address. The same goes for xidel-0.9.9-7622-1650cd0.

[slightly off-topic]
user-agent within x:request() doesn't appear to work.

xidel-0.9.9-7622-1650cd0-openssl-win32.exe -se "x:request({'proxy':'1.255.48.197:8080','user-agent':'curl','method':'HEAD','url':'https://ipinfo.io'})/headers[3]"
Content-Type: text/html; charset=utf-8

The user-agent is needed, or https://ipinfo.io just returns the html-source.
[/slightly off-topic]

Reino17 commented 3 years ago

Gentle *ping* for x:request({"user-agent":[...]}), and for --proxy / x:request({"proxy":[...]}).

benibela commented 3 years ago

> Gentle ping for x:request({"user-agent":[...]}), and for --proxy / x:request({"proxy":[...]}).

Perhaps you should open a new issue for that.

But the issue is not with the request function.

The proxy and user agent are set once when xidel is started, and afterwards nothing can change them.
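Given that, one way to try several proxies is to start a separate xidel process for each one, as the shell loop earlier in this thread does. A minimal sketch, assuming xidel exits with a non-zero status when the request fails; `check_proxies` is a hypothetical wrapper, and it uses only options already shown in this thread.

```shell
# Hypothetical wrapper: run one xidel process per proxy, because --proxy
# is fixed when xidel starts and cannot be changed per request.
# Usage: check_proxies URL HOST:PORT [HOST:PORT ...]
check_proxies() {
  url=$1
  shift
  for proxy in "$@"; do
    # A HEAD request is enough to see whether the proxy answers at all.
    if xidel -s --proxy "$proxy" --method=HEAD "$url" -e '$headers[1]' >/dev/null 2>&1; then
      echo "$proxy is working"
    else
      echo "$proxy BROKEN"
    fi
  done
}
```

For example, `check_proxies https://github.com 167.71.5.83:3128 1.255.48.197:8080` prints one status line per proxy.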

ralyodio commented 3 years ago

This still doesn't work. I'm using xidel 0.9.8.

$ xidel -s --proxy "http://dmdgluqz-rotate:ih4pcmx2wxpq@p.webshare.io:80/" https://www.cnbc.com/quotes/GME -e '//div[@class="QuoteStrip-dataContainer"]'

ralyodio commented 3 years ago

This works:

$ xidel -s --header "proxy: dmdgluqz-rotate:ih4pcmx2wxpq@p.webshare.io" https://www.cnbc.com/quotes/GME -e '//div[@class="QuoteStrip-dataContainer"]'

benibela commented 3 years ago

> this still doesn't work. i'm using xidel 0.9.8

It does not work in 0.9.8 (on Linux).

I have fixed it for 0.9.9.

> $ xidel -s --header "proxy: dmdgluqz-rotate:ih4pcmx2wxpq@p.webshare.io" https://www.cnbc.com/quotes/GME -e '//div[@class="QuoteStrip-dataContainer"]'

That does not use the proxy.

ralyodio commented 3 years ago

still doesn't work:

Not working:

$ xidel --proxy http://dmdgluqz-rotate:ih4pcmx2wxpq@p.webshare.io:80/ https://ipinfo.io/ip -e '.'

Working:

$ curl --proxy "http://dmdgluqz-rotate:ih4pcmx2wxpq@p.webshare.io:80/" https://ipinfo.io/ip

$ xidel --version
Xidel 0.9.9

ralyodio commented 2 years ago

I still have the same problem, also using webshare. The proxy works fine in curl but not in xidel.

benibela commented 2 years ago

But it works for me.

What are your xidel/OpenSSL/OS versions?