benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

[Feature request] Need Accept-Encoding (gzip) header support #49

Closed MegaCron closed 3 years ago

MegaCron commented 4 years ago

Hi.

xidel does not receive data with the Accept-Encoding (gzip) header. -H 'Accept-Encoding: gzip, deflate'

Can you add support for the Accept-Encoding (gzip) header ?

benibela commented 4 years ago

Someone has sent me a patch for that 0003-Adding-gzip-decoding.patch.txt

But I have not tried it. It also decompresses the data only after downloading all of it; it should be possible to decompress individual blocks during the download.
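The two strategies mentioned here can be sketched in Python. This is only an illustration using Python's standard `gzip` and `zlib` modules, not the code of the patch or of internettools; the function names `decompress_whole` and `decompress_blocks` and the 64-byte block size are made up for the example.

```python
import gzip
import zlib

def decompress_whole(body: bytes) -> bytes:
    """Decompress only after the complete response body is in memory."""
    return gzip.decompress(body)

def decompress_blocks(blocks) -> bytes:
    """Decompress each block as it arrives, keeping state between blocks."""
    # 16 + MAX_WBITS tells zlib to expect the gzip container format.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = bytearray()
    for block in blocks:
        out += inflater.decompress(block)
    out += inflater.flush()
    return bytes(out)

# Simulated download: a gzip body split into small network blocks.
payload = b"<title>Unit conversion</title>" * 100
body = gzip.compress(payload)
blocks = [body[i:i + 64] for i in range(0, len(body), 64)]
```

Both functions should yield the same bytes; the block-wise variant simply never needs the whole compressed body at once.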

MegaCron commented 4 years ago

Someone has sent me a patch for that

I do not know how to use this. I do not understand the source code. Will this patch be in the new xidel builds ?

benibela commented 4 years ago

I do not know how to use this. I do not understand the source code.

it needs to be merged into internetaccess.pas (from the internettools repository)

Will this patch be in the new xidel builds ?

Eventually. Right now I am prioritizing implementing XQuery 3.1

Reino17 commented 4 years ago

xidel does not receive data with the Accept-Encoding (gzip) header. -H 'Accept-Encoding: gzip, deflate'

@MegaCron I'm curious, could you give an example url that doesn't return data without such a header?

MegaCron commented 4 years ago

As benibela said:

it needs to be merged into internetaccess.pas (from the internettools repository)

I'm a stupid newbie :( I do not have the skills to use this :(

Eventually. Right now I am prioritizing implementing XQuery 3.1

ОК :\

As Reino17 said:

@MegaCron I'm curious, could you give an example url that doesn't return data without such a header?

There was a misunderstanding :(

Without the Accept-Encoding (gzip) header, data is returned:

xidel -se '//title' 'http://www.unit-conversion.info'
Unit conversion

With the Accept-Encoding (gzip) header, data is NOT returned:

xidel -H 'Accept-Encoding: gzip, deflate' -se '//title' 'http://www.unit-conversion.info'

Reino17 commented 4 years ago

With the Accept-Encoding (gzip) header, data is NOT returned.

I can confirm that with 0003-Adding-gzip-decoding.patch.txt data IS returned for this url (and header).
The Xidel binary I compiled is for Windows. If it's a Unix binary you need, then I can't help you with that.

MegaCron commented 4 years ago

As Reino17 said:

If it's a Unix binary you need, then I can't help you with that.

Yes, the unix binary is used - linux x32 or armv7/8 (android)

jiangwu007 commented 3 years ago

With the Accept-Encoding (gzip) header, data is NOT returned.

I can confirm that with 0003-Adding-gzip-decoding.patch.txt data IS returned for this url (and header). The Xidel binary I compiled is for Windows. If it's a Unix binary you need, then I can't help you with that.

May I have one, please

Reino17 commented 3 years ago

You can try xidel-0.9.9-6380-dc34769 here.

Reino17 commented 3 years ago

The StackExchange API is another url that doesn't work (with or without the patch):

$ xidel -s "https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow" -e '$json'
Error:
err:FOJS0001: Failed to parse JSON: Invalid character at line 0, pos 0: '▼' at  (tkEOF) in ▼?

Only curl works at the moment:

$ curl -s --compressed "https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow" | xidel - -se '$json'
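The parse error above is consistent with the JSON parser being handed the still-compressed body: a gzip stream begins with the bytes 1F 8B, and 1F is plausibly what the console shows as '▼'. The following Python sketch reproduces that failure mode with the standard library only; `parse_json_body` is a hypothetical helper for illustration, not part of xidel or curl.

```python
import gzip
import json

def parse_json_body(body: bytes, content_encoding: str = ""):
    """Parse a JSON response body, undoing gzip first if the server used it."""
    if content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))

# A compressed body, as a server might send it with "content-encoding: gzip".
raw = gzip.compress(b'{"items": []}')
```

Calling `parse_json_body(raw)` without the encoding raises a `ValueError` subclass, while `parse_json_body(raw, "gzip")` returns the parsed object.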

gnfalex commented 3 years ago

Greetings. api.stackexchange.com returns header names in lower case ('content-encoding' instead of 'Content-Encoding'), but I used the case-sensitive "pos". So the patch needs to replace "pos('Content-Encoding',FLastHTTPHeaders[i])" with "pos('content-encoding', AnsiLowerCase(FLastHTTPHeaders[i]))" or something similar.

PS. I apologize for the long absence of an answer. Unfortunately, I lost the opportunity to work with this...

Best regards.
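HTTP header field names are case-insensitive, so any lookup has to normalize case before comparing. A minimal Python sketch of the same idea follows; `find_header` and `is_gzip` are hypothetical names, and unlike the substring search with `pos` in the patch, this version compares the whole field name.

```python
def find_header(header_lines, name):
    """Return the value of the first header whose name matches, ignoring case."""
    wanted = name.lower()
    for line in header_lines:
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == wanted:
            return value.strip()
    return None

def is_gzip(header_lines):
    """True if the response declares a gzip content encoding."""
    value = find_header(header_lines, "Content-Encoding")
    return value is not None and value.lower() == "gzip"
```

With this, `is_gzip(["content-encoding: gzip"])` and `is_gzip(["Content-Encoding: gzip"])` behave the same way.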

Reino17 commented 3 years ago

So patch need to replace "pos('Content-Encoding',FLastHTTPHeaders[i])" to "pos('content-encoding',AnsiLowerCase(FLastHTTPHeaders[i]))" or something like.

That did the trick. Thank you!

PS. I apologize for the long absence of an answer. Unfortunately, I lost the opportunity to work with this...

An answer to what? This is your first post in this thread.
Or... are you the one who initially sent this patch to Benito? [edit] Silly me. Your name is in the original patch. [/edit]

jiangwu007 commented 3 years ago

Thank you very much. I just saw your message.


benibela commented 3 years ago

implemented in

https://github.com/benibela/internettools/commit/eb7f21f0d1652bd39b1450e27da4772de27a8dc2

https://github.com/benibela/xidel/commit/9e556dc78a991dd7a4e8eb018d4dd6b4de561e2a

https://github.com/benibela/xidel/commit/71c3361342a600593997f413ff3d744b4f2b02cf

Reino17 commented 3 years ago

I can confirm that...

$ xidel -s --compressed "https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow" -e '$json'

...works. Thanks.

But whereas curl, without --compressed, puts out the warning...

Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.

...xidel, without --compressed, happily returns the JSON. Is this intended?
Kinda defeats the purpose of --compressed if xidel already adds the header automatically. Or do I misunderstand?

benibela commented 3 years ago

...xidel, without --compressed, happily returns the JSON. Is this intended? Kinda defeats the purpose of --compressed if xidel already adds the header automatically. Or do I misunderstand?

Xidel does not add the header.

The server does not care whether the header is there or not, and always sends the same response.

Baltazar500 commented 3 years ago

I confirm. The header "Accept-Encoding: gzip" is added (I checked it via Wireshark on http sites) :) But requests to api.stackexchange.com fail (build 20210620.7908.1f19357dcdf2 - windows, android and debian versions).

Under Windows, the output is:

C:\Users\Admin>xidel -s --compressed "https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow" -e "$json"
Error: Internet Error: -3 Failed to create connection. when talking to: https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow

Under Linux, the output is:

xidel-0.9.9.20210620.7908.1f19357dcdf2 --compressed -s "https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow" -e '$json'
Error: Internet Error: -2 OpenSSL version is too old for certificate checking. Required is OpenSSL 1.0.2+ when talking to: https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow

Under androidarm the same thing. There OpenSSL 1.0.2n 7 Dec 2017.

On my linux OpenSSL 1.0.1t 3 May 2016 (old system)

p.s. The archive xidel-0.9.9.20210620.7908.1f19357dcdf2.linux32.tar.gz is broken :(

benibela commented 3 years ago

Internet Error: -3 Failed to create connection.

You probably need to enable TLS1.2 in the Internet Explorer Settings or the registry under SCHANNEL

Under androidarm the same thing. There OpenSSL 1.0.2n 7 Dec 2017.

That is supposed to work. Perhaps it is still too old. Does it have the necessary functions?

 # strings /system/lib/libcrypto.so  | grep X509_VERIFY_PARAM_set_hostflags                                                                    
X509_VERIFY_PARAM_set_hostflags
 # strings /system/lib/libcrypto.so  | grep X509_VERIFY_PARAM_set1_host                                                                        
X509_VERIFY_PARAM_set1_host
 # strings /system/lib/libssl.so | grep SSL_get0_param                                                                                         
SSL_get0_param

On my linux OpenSSL 1.0.1t 3 May 2016 (old system)

That is expected. I did not want to support it, because it does not validate the certificates properly on its own, and it is rather cumbersome to implement my own validation: https://wiki.openssl.org/index.php/Hostname_validation
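For readers unfamiliar with the term, hostname validation means checking that the certificate the server presents was actually issued for the host name being contacted, in addition to checking the certificate chain. The sketch below shows the two settings side by side using Python's standard `ssl` module; it illustrates the concept only and says nothing about how Xidel or OpenSSL 1.0.2 implement it. The function names are made up.

```python
import ssl

def verifying_context() -> ssl.SSLContext:
    """Default client context: verifies the chain and the host name."""
    return ssl.create_default_context()

def insecure_context() -> ssl.SSLContext:
    """Context with both checks switched off (no protection against MITM)."""
    ctx = ssl.create_default_context()
    # check_hostname has to be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

A library that verifies only the chain but not the host name would accept any valid certificate, including one issued for a different site.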

p.s. The archive xidel-0.9.9.20210620.7908.1f19357dcdf2.linux32.tar.gz is broken :(

Do not know how that could happen

Baltazar500 commented 3 years ago

Under linux :

root@antiX1:~# strings /system/lib/libcrypto.so | grep X509_VERIFY_PARAM_set_hostflags
strings: '/system/lib/libcrypto.so': No such file
root@antiX1:~# strings /system/lib/libcrypto.so | grep X509_VERIFY_PARAM_set1_host
strings: '/system/lib/libcrypto.so': No such file
root@antiX1:~# strings /system/lib/libssl.so | grep SSL_get0_param
strings: '/system/lib/libssl.so': No such file

Under androidarm(64) (android 4.2.2 + armv7 & android 5.1 + aarch64) :

-bash-4.4# strings /system/lib/libcrypto.so | grep X509_VERIFY_PARAM_set_hostflags
-bash-4.4# strings /system/lib/libcrypto.so | grep X509_VERIFY_PARAM_set1_host
-bash-4.4# strings /system/lib/libssl.so | grep SSL_get0_param

:(

Under Windows, after this tweak (TLS 1.2 is enabled in IE)

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client]
"DisabledByDefault"=dword:00000000

and a reboot, I get the same error:

C:\Users\Admin>xidel --compressed "https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow" -se "$json"
Error: Internet Error: -3 Failed to create connection. when talking to: https://api.stackexchange.com/2.2/users/1501222?site=stackoverflow

On other https sites, I randomly get either a title or an error:

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
RuTracker.org

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
RuTracker.org

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
RuTracker.org

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
RuTracker.org

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
Error: Internet Error: -3 Failed to create connection. when talking to: https://rutracker.org/

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
Error: Internet Error: -3 Failed to create connection. when talking to: https://rutracker.org/

C:\Users\Admin>xidel.exe -s "https://rutracker.org" -e "//title"
RuTracker.org

Is it possible to make an insecure connection like in curl with the "-k" (--insecure) switch?

Reino17 commented 3 years ago

Under androidarm(64) (android 4.2.2 + armv7 & android 5.1 + aarch64) :

OpenSSL 1.0.2n should contain the necessary functions. At least the Windows libraries do.

On other https sites, I randomly gettin either a title or an error

Have you tried the Windows OpenSSL binary? It could be more reliable than the Windows API.

Is it possible to make an insecure connection like in curl with the "-k" (--insecure) switch?

--no-check-certificate works for the OpenSSL binary. Not sure if it works for the normal binary.

Baltazar500 commented 3 years ago

Have you tried the Windows OpenSSL binary? It could be more reliable than the Windows API.

No. Win32 version only.

OpenSSL 1.0.2n should contain the necessary functions. At least the Windows libraries do.

OpenSSL 1.0.2n is used under androidarm (64) (android 4.2.2 + armv7 & android 5.1 + aarch64) and taken from entware. Probably the libraries have links to /opt/lib there.

For Windows, I had only used the win32 version, not the OpenSSL one. After replacing xidel with win32.openssl and updating openssl to 1.1.1k https://curl.se/windows/ (my previous version was OpenSSL 1.0.2a 19 Mar 2015), I got the error "Failed to load CA files" :-D After copying the cacert.pem to the xidel directory, I got the data :)))

--no-check-certificate works for the OpenSSL binary. Not sure if it works for the normal binary.

This worked for androidarm, androidarm64, debian and win32.openssl versions. It didn't work for the usual win32 version :/ For android/linux I created a wrapper :

xidel-0.9.9.20210620.7908.1f19357dcdf2 --no-check-certificate "$@"

@Reino17, thanks for the help ;)

Reino17 commented 3 years ago

updating openssl to 1.1.1k https://curl.se/windows/

All xidel needs from that archive is 'libcrypto-1_1.dll' and 'libssl-1_1.dll' (@benibela, I think it's time to add this message on your website for the Windows OpenSSL binary). Alternatively you can grab them from my website.

After copying the cacert.pem to the xidel directory, I got the data

That, or use --ca-certificate="C:\some\other\map\cacert.pem"

I created a wrapper

You might be interested in the environment variable XIDEL_OPTIONS as mentioned in the readme. For example:

export XIDEL_OPTIONS='--silent --no-check-certificate'

Internet Error: -3 Failed to create connection.

What Windows OS are you using?
@benibela, just interested, does this error message always indicate an error with the Windows API, or could this possibly be a xidel error?

Baltazar500 commented 3 years ago

All xidel needs from that archive is 'libcrypto-1_1.dll' and 'libssl-1_1.dll'

OK. I use openssl sometimes. I need openssl.exe too ;)

That, or use --ca-certificate="C:\some\other\map\cacert.pem"

The directory where xidel is located is written in the PATH variable ;)

You might be interested in the environment variable XIDEL_OPTIONS as mentioned in the readme. For example:

Thank you, did not know about this. I will use it in the future. Now xidel is used in many scripts and it will take a very long time to rewrite them to specify the "XIDEL_OPTIONS" variable, so the wrapper is the simplest solution :)

What Windows OS are you using?

Windows 7 x64 SP1

Reino17 commented 3 years ago

The directory where xidel is located is written in the PATH variable ;)

But xidel doesn't look for 'cacert.pem' in %PATH%. If 'cacert.pem' is not in the same dir as xidel, then you'd always have to use --ca-certificate.

benibela commented 3 years ago

All xidel needs from that archive is 'libcrypto-1_1.dll' and 'libssl-1_1.dll' (@benibela, I think it's time to add this message on your website for the Windows OpenSSL binary). Alternatively you can grab them from my website.

A message that it needs openssl? That should be obvious

Whenever I make the next stable release, I could just include the DLLs

@benibela, just interested, does this error message always indicate an error with the Windows API, or could this possibly be a xidel error?

It always means it could not establish a connection with the server

And Xidel just gives the URL and stuff to Windows. There is hardly anything that could be wrong

But xidel doesn't look for 'cacert.pem' in %PATH%. If 'cacert.pem' is not in the same dir as xidel, then you'd always have to use --ca-certificate.

Or it uses %SSL_CERT_FILE%

Reino17 commented 3 years ago

A message that it needs openssl? That should be obvious

I guess you're right. Forget about it then. I was just trying to say that a Windows user doesn't need to install the entire program. The two DLLs are enough.

It always means it could not establish a connection with the server

That would mean the OpenSSL binary isn't only useful for Windows XP, but for Windows 7 as well. Interesting.

Baltazar500 commented 3 years ago

When compression is turned on, I cannot receive data from YouTube:

xidel -H 'Accept-Encoding: gzip, deflate' -se "//title" "https://www.youtube.com"
xidel --compressed -se "//title" "https://www.youtube.com"

When compression is turned off, the data is received successfully:

xidel -se "//title" "https://www.youtube.com"
YouTube

Also, when extracting json from YouTube with compression enabled, I get an error

json is deprecated. Use json-doc or parse-json functions.

xidel --version
Xidel 0.9.9 (20210620.7908.1f19357dcdf2)

benibela commented 3 years ago

Xidel needs ten bytes to detect the compression

Somehow it only receives one byte. The next byte arrives later, and then another one, .. very odd


> 
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 3209
> received bytes: 1
> received bytes: 10157
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2263
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 595
> received bytes: 140
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 3240
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 7568
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 831
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 5
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 4463
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2545
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2350
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2728
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2652
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 3244
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2523
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2536
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2588
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2763
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2610
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2398
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2704
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 5743
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 3020
> received bytes: 1
> received bytes: 1
> received bytes: 1
> received bytes: 2257
> received bytes: 10
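The fixed part of a gzip header is ten bytes long, so a decoder that wants to inspect it has to cope with chunks that are shorter than that, such as the one-byte reads in the log above. One way to handle this is to buffer until ten bytes are available and only then decide. The Python sketch below is an illustration under that assumption; `SniffingDecoder` is a hypothetical class and not the fix that went into internettools.

```python
import gzip
import zlib

GZIP_HEADER_LEN = 10  # magic (2) + method + flags + mtime (4) + xfl + os

class SniffingDecoder:
    """Buffer tiny chunks until the gzip header can be inspected."""

    def __init__(self):
        self._pending = bytearray()
        self._inflater = None
        self._passthrough = False

    def feed(self, chunk: bytes) -> bytes:
        if self._passthrough:
            return chunk
        if self._inflater is not None:
            return self._inflater.decompress(chunk)
        self._pending += chunk
        if len(self._pending) < GZIP_HEADER_LEN:
            return b""  # not enough data yet, keep buffering
        data = bytes(self._pending)
        self._pending.clear()
        if data[:2] == b"\x1f\x8b" and data[2] == 8:  # gzip magic + deflate
            self._inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
            return self._inflater.decompress(data)
        self._passthrough = True
        return data

    def finish(self) -> bytes:
        if self._inflater is not None:
            return self._inflater.flush()
        return bytes(self._pending)  # stream ended before ten bytes arrived

# Feed a gzip body one byte at a time, as in the log.
payload = b"<title>YouTube</title>" * 50
body = gzip.compress(payload)
decoder = SniffingDecoder()
decoded = b"".join(decoder.feed(body[i:i + 1]) for i in range(len(body)))
decoded += decoder.finish()
```

Uncompressed responses pass through unchanged once ten bytes have been seen, and very short responses are returned by `finish()`.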

Also, when extracting json from YouTube with compression enabled, I get an error

json is deprecated. Use json-doc or parse-json functions.

That is a warning, not an error.

benibela commented 3 years ago

@Baltazar500 fixed: https://github.com/benibela/internettools/commit/a9a1b5cf1810aec5f3e7ee02120276d5780e94b5

Baltazar500 commented 2 years ago

A title or an error is randomly displayed in the output :(

bash-4.4# xidel --compressed -se "//title" "https://www.youtube.com"
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6ffffffe arg 0x810
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6fffffff arg 0x2
An unhandled exception occurred at $0000005583D45A3C:
Edecompressionerror: data error
  $0000005583D45A3C  WRITECOMPRESSEDBLOCK,  line 121 of ../../../components/pascal/internet/internetaccess_inflater_paszlib.pas
  $0000005583BBF160  WRITE,  line 162 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $0000005583D74200  WRITESTRTOSTREAM,  line 1817 of ../../../components/pascal/import/synapse/synautil.pas
  $0000005583D6A9C0  RECVSTREAMSIZE,  line 2547 of ../../../components/pascal/import/synapse/blcksock.pas
  $0000005583D66FD8  READIDENTITY,  line 740 of ../../../components/pascal/import/synapse/httpsend.pas
  $0000005583D671FC  READCHUNKED,  line 764 of ../../../components/pascal/import/synapse/httpsend.pas
  $0000005583D66D10  HTTPMETHOD,  line 705 of ../../../components/pascal/import/synapse/httpsend.pas
  $0000005583BBF874  INITANDHTTPMETHOD,  line 241 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $0000005583BBF70C  DOTRANSFERUNCHECKED,  line 280 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $0000005583B784E8  DOTRANSFERCHECKED,  line 1073 of ../../../components/pascal/internet/internetaccess.pas
  $0000005583B78120  REQUEST,  line 1017 of ../../../components/pascal/internet/internetaccess.pas
  $0000005583B77FF0  REQUEST,  line 996 of ../../../components/pascal/internet/internetaccess.pas
  $0000005583B77EDC  REQUEST,  line 988 of ../../../components/pascal/internet/internetaccess.pas
  $0000005583B7B948  GET,  line 1537 of ../../../components/pascal/internet/internetaccess.pas
  $0000005583B7C140  HTTPREQUEST,  line 1653 of ../../../components/pascal/internet/internetaccess.pas
  $0000005583B594D4  RETRIEVE,  line 56 of xidel.pas
  $0000005583B9BD14  RETRIEVE,  line 1117 of xidelbase.pas

bash-4.4# xidel --compressed -se "//title" "https://www.youtube.com"
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6ffffffe arg 0x810
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6fffffff arg 0x2
An unhandled exception occurred at $00000055578EBA3C:
Edecompressionerror: data error
  $00000055578EBA3C  WRITECOMPRESSEDBLOCK,  line 121 of ../../../components/pascal/internet/internetaccess_inflater_paszlib.pas
  $0000005557765160  WRITE,  line 162 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $000000555791A200  WRITESTRTOSTREAM,  line 1817 of ../../../components/pascal/import/synapse/synautil.pas
  $00000055579109C0  RECVSTREAMSIZE,  line 2547 of ../../../components/pascal/import/synapse/blcksock.pas
  $000000555790CFD8  READIDENTITY,  line 740 of ../../../components/pascal/import/synapse/httpsend.pas
  $000000555790D1FC  READCHUNKED,  line 764 of ../../../components/pascal/import/synapse/httpsend.pas
  $000000555790CD10  HTTPMETHOD,  line 705 of ../../../components/pascal/import/synapse/httpsend.pas
  $0000005557765874  INITANDHTTPMETHOD,  line 241 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $000000555776570C  DOTRANSFERUNCHECKED,  line 280 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $000000555771E4E8  DOTRANSFERCHECKED,  line 1073 of ../../../components/pascal/internet/internetaccess.pas
  $000000555771E120  REQUEST,  line 1017 of ../../../components/pascal/internet/internetaccess.pas
  $000000555771DFF0  REQUEST,  line 996 of ../../../components/pascal/internet/internetaccess.pas
  $000000555771DEDC  REQUEST,  line 988 of ../../../components/pascal/internet/internetaccess.pas
  $0000005557721948  GET,  line 1537 of ../../../components/pascal/internet/internetaccess.pas
  $0000005557722140  HTTPREQUEST,  line 1653 of ../../../components/pascal/internet/internetaccess.pas
  $00000055576FF4D4  RETRIEVE,  line 56 of xidel.pas
  $0000005557741D14  RETRIEVE,  line 1117 of xidelbase.pas

bash-4.4# xidel --compressed -se "//title" "https://www.youtube.com"
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6ffffffe arg 0x810
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6fffffff arg 0x2
YouTube
bash-4.4# xidel -H 'Accept-Encoding: gzip, deflate' -se "//title" "https://www.youtube.com"
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6ffffffe arg 0x810
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6fffffff arg 0x2
An unhandled exception occurred at $000000556DE13A3C:
Edecompressionerror: data error
  $000000556DE13A3C  WRITECOMPRESSEDBLOCK,  line 121 of ../../../components/pascal/internet/internetaccess_inflater_paszlib.pas
  $000000556DC8D160  WRITE,  line 162 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $000000556DE42200  WRITESTRTOSTREAM,  line 1817 of ../../../components/pascal/import/synapse/synautil.pas
  $000000556DE389C0  RECVSTREAMSIZE,  line 2547 of ../../../components/pascal/import/synapse/blcksock.pas
  $000000556DE34FD8  READIDENTITY,  line 740 of ../../../components/pascal/import/synapse/httpsend.pas
  $000000556DE351FC  READCHUNKED,  line 764 of ../../../components/pascal/import/synapse/httpsend.pas
  $000000556DE34D10  HTTPMETHOD,  line 705 of ../../../components/pascal/import/synapse/httpsend.pas
  $000000556DC8D874  INITANDHTTPMETHOD,  line 241 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $000000556DC8D70C  DOTRANSFERUNCHECKED,  line 280 of ../../../components/pascal/internet/synapseinternetaccess.pas
  $000000556DC464E8  DOTRANSFERCHECKED,  line 1073 of ../../../components/pascal/internet/internetaccess.pas
  $000000556DC46120  REQUEST,  line 1017 of ../../../components/pascal/internet/internetaccess.pas
  $000000556DC45FF0  REQUEST,  line 996 of ../../../components/pascal/internet/internetaccess.pas
  $000000556DC45EDC  REQUEST,  line 988 of ../../../components/pascal/internet/internetaccess.pas
  $000000556DC49948  GET,  line 1537 of ../../../components/pascal/internet/internetaccess.pas
  $000000556DC4A140  HTTPREQUEST,  line 1653 of ../../../components/pascal/internet/internetaccess.pas
  $000000556DC274D4  RETRIEVE,  line 56 of xidel.pas
  $000000556DC69D14  RETRIEVE,  line 1117 of xidelbase.pas

bash-4.4# xidel -H 'Accept-Encoding: gzip, deflate' -se "//title" "https://www.youtube.com"
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6ffffffe arg 0x810
WARNING: linker: /data/tools/bin/xidel-0.9.9.20211123.8232.023d1f1f656e.androidarm64: unused DT entry: type 0x6fffffff arg 0x2
YouTube
bash-4.4# 

Xidel 0.9.9 (20211123.8232.023d1f1f656e)

benibela commented 2 years ago

@Baltazar500 Then it cannot decompress the data it has received

That is impossible to debug without knowing what the received data was

In the new build I have added an environment option to log the data:

XIDEL_DEBUG_DECOMPRESSION=true xidel --compressed -se "//title" "https://www.youtube.com"

Baltazar500 commented 2 years ago

I updated to xidel-0.9.9.20211214.8268.ffad46040257.androidarm64 and made a "compressed" request. Log in attachment. XIDEL_DEBUG_DECOMPRESSION.log

benibela commented 2 years ago

Looks like the decompressor fails when it encounters incomplete data. I guess I will have to revert to the approach of the original patch (download all data into memory and then decompress it, rather than decompressing during the download).
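How a decoder reacts to incomplete data depends on whether it keeps state between calls. The Python sketch below contrasts a one-shot call, which rejects a truncated stream, with a stateful decompressor that accepts the first half and finishes when the rest arrives. It uses Python's `zlib`, not paszlib, so it only illustrates the general behaviour; the helper `try_one_shot` is made up for the example.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # expect the gzip container format

def try_one_shot(data: bytes):
    """Decompress in a single call; return None if the stream is incomplete."""
    try:
        return zlib.decompress(data, GZIP_WBITS)
    except zlib.error:
        return None

payload = b"<!DOCTYPE html><title>YouTube</title>" * 200
body = gzip.compress(payload)
first_half = body[: len(body) // 2]
second_half = body[len(body) // 2 :]

# Stateful decoding: feeding incomplete data is not an error by itself.
inflater = zlib.decompressobj(GZIP_WBITS)
partial = inflater.decompress(first_half)
eof_midway = inflater.eof  # still False, more input is expected
rest = inflater.decompress(second_half) + inflater.flush()
```

The one-shot call is the equivalent of downloading everything first; the stateful object is the equivalent of decompressing during the download.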

But I messed up the debugging option. It is not printing the received data, but some unrelated data.

You could try it again with the build from today

Baltazar500 commented 2 years ago

You could try it again with the build from today

Done. Log in attachment. Sorry for the delay

xidel-0.9.9.20211225.8285.b90e197a0a24.androidarm64.DEBUG_DECOMPRESSION.log

benibela commented 2 years ago

That is extremely odd. Everything works perfectly on my system.

Here is a zip with more things to try.

A Xidel version that dumps some internal state of the gzip decoder

I made a tool zstreamtest to only run the gzip decoding on a log file (from 1F on until the exception). It prints the internal state to stderr and the decoded site to stdout. It always gets the correct <!DOCTYPE html> .. with it on your log file

tmp.zip

Baltazar500 commented 2 years ago

That is extremely odd. Everything works perfectly on my system.

OK. I've done tests. I only got the error 2 out of 10 times. Less than before. Failed request logs in the attachment

XIDEL_DEBUG_DECOMPRESSION-err-2021.12.31_10.41.35.log XIDEL_DEBUG_DECOMPRESSION-out-2021.12.31_10.41.35.log

I made a tool zstreamtest to only run the gzip decoding on a log file (from 1F on until the exception). It prints the internal state to stderr and the decoded site to stdout. It always gets the correct <!DOCTYPE html> .. with it on your log file

This utility is executed from under xidel? Or do I need to execute some other command?


ls -l ./
-rwx------    1 root     root      10262400 Dec 31 10:29 xidel
-rwx------    1 root     root       1161368 Dec 31 10:29 zstreamtest
-bash-4.4# ./xidel --version
WARNING: linker: ./xidel: unused DT entry: type 0x6ffffffe arg 0x810
WARNING: linker: ./xidel: unused DT entry: type 0x6fffffff arg 0x2
Xidel 0.9.9

http://www.videlibri.de/xidel.html
by Benito van der Zander <benito@benibela.de>

export XIDEL_DEBUG_DECOMPRESSION=true; ./xidel --user-agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0' --no-check-certificate --compressed -se "//title" "https://www.youtube.com" 1> $dload/xidel-log/XIDEL_DEBUG_DECOMPRESSION-out-`stamp`.log 2> $dload/xidel-log/XIDEL_DEBUG_DECOMPRESSION-err-`stamp`.log

benibela commented 2 years ago

I have fixed it. There was no point in looking at the internal state of the decompressor; I was just calling it on stale data. That is almost embarrassing.

OK. I've done tests. I only got the error 2 out of 10 times.

It is surprising that it ever worked at all without crashing

This utility is executed from under xidel? Or do I need to execute some other command?

You would remove the first line from the log, so that it starts with 1F. Then you call ./zstreamtest XIDEL_DEBUG_DECOMPRESSION-out-2021.12.31_10.41.35.log

Baltazar500 commented 2 years ago

Done. Here is the log. zstreamtest.log