benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

xidel openssl error messages on androidarm64 (termux) #30

Closed iegubkin closed 3 years ago

iegubkin commented 5 years ago

xidel produces the correct result but also outputs a bunch of error messages. This started to happen after termux's openssl update to 1.1.1b-3. Before this openssl upgrade, there were no error messages.

termux 0.68

Xidel 0.9.9 (20190104.6739.b64562007cb7) xidel-0.9.9.20190104.6739.b64562007cb7.androidarm64

openssl 1.1.1b-3 aarch64

$ xidel --data="https://feeds.twit.tv/twig.xml" --extract='head(//rss/channel/item/enclosure/@url)'
Retrieving (GET): https://feeds.twit.tv/twig.xml
Processing: https://feeds.twit.tv/twig.xml
https://www.podtrac.com/pts/redirect.mp3/cdn.twit.tv/audio/twig/twig0505/twig0505.mp3
An unhandled exception occurred at $0000007BE68C0200:
EAccessViolation: Access violation
  $0000007BE68C0200
  $0000007BE7B0BC24
  $0000007BE7AFD9E4
  $0000007BE7AF76E4
  $0000007BE7AF74BC
  $0000007BE7AF3130
  $0000007BE78F51BC
  $00000057FA6F8294
  $00000057FA53CAE8
  $00000057FA6F80B0  FREELIBRARY, line 114 of ../../../components/pascal/import/synapse/synafpc.pas
  $00000057FA6FFA78  DESTROYSSLINTERFACE, line 2097 of ../../../components/pascal/import/synapse/ssl_openssl_lib.pas
  $00000057FA6FFFB4  SSL_OPENSSLLIB$$_finalize$, line 2223 of ../../../components/pascal/import/synapse/ssl_openssl_lib.pas
  $00000057FA539E98
  $00000057FA53A26C
  $00000057FA53A28C
  $00000057FA529ED4  main, line 98 of xidel.pas
  $0000007BE79C2E00

Piping the same feed through curl works without errors:

$ curl -s -L https://feeds.twit.tv/twig.xml | xidel - --extract='head(//rss/channel/item/enclosure/@url)'
Processing: stdin:///
https://www.podtrac.com/pts/redirect.mp3/cdn.twit.tv/audio/twig/twig0505/twig0505.mp3

Thanks for this powerful tool! I use it more and more.

benibela commented 5 years ago

The crash happens while libcrypto is being unloaded.

Can you make a backtrace with gdb?

zpimp commented 4 years ago

works here

curl -k -L https://feeds.twit.tv/twig.xml -o tw.xml
./xidel tw.xml -e 'head(//rss/channel/item/enclosure/@url)'
**** Retrieving: tw.xml ****
**** Processing: tw.xml ****
https://media.blubrry.com/34874/cdn.twit.tv/audio/twig/twig0534/twig0534.mp3

Edit: if I try your first command, I get this:

./xidel --data="https://feeds.twit.tv/twig.xml" --extract='head(//rss/channel/item/enclosure/@url)'
**** Retrieving (GET): https://feeds.twit.tv/twig.xml ****
Error:
Internet Error: -4
when talking to: https://feeds.twit.tv/twig.xml

iegubkin commented 3 years ago

I finally worked out how to get xidel (Xidel 0.9.9, 20210309.7795.c4e17e2d216c) to establish an SSL/TLS connection on termux (aarch64, version 0.112).

The standard command still fails with OpenSSL errors:

xidel --data="https://feeds.twit.tv/twig.xml" --extract='head(//rss/channel/item/enclosure/@url)'
**** Retrieving (GET): https://feeds.twit.tv/twig.xml ****
Error:
Internet Error: -3 
HTTPS connection failed after connecting to server. Some possible causes: handshake failure, mismatched HTTPS version/ciphers, invalid certificate
OpenSSL-Error: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
OpenSSL information: CA file:  , CA dir: /system/etc/security/cacerts , TLSv1.2, BoringSSL
when talking to: https://feeds.twit.tv/twig.xml

But it works with a few modifications that account for the peculiarities of the termux environment:

LD_LIBRARY_PATH=$PREFIX/lib xidel --data="https://feeds.twit.tv/twig.xml" --ca-directory=$PREFIX/etc/tls/certs --ca-certificate=$PREFIX/etc/tls/cert.pem --extract='head(//rss/channel/item/enclosure/@url)'
**** Retrieving (GET): https://feeds.twit.tv/twig.xml ****
**** Processing: https://feeds.twit.tv/twig.xml ****
https://pdst.fm/e/chtbl.com/track/E91833/cdn.twit.tv/megaphone/twig_609/TWI1054991812.mp3

Success! I think "--ca-directory=$PREFIX/etc/tls/certs" is only necessary if additional trusted certificates have been added. Otherwise the following shorter command will work:

LD_LIBRARY_PATH=$PREFIX/lib xidel --data="https://feeds.twit.tv/twig.xml" --ca-certificate=$PREFIX/etc/tls/cert.pem --extract='head(//rss/channel/item/enclosure/@url)'

Reino17 commented 3 years ago

You may find xidel's XIDEL_OPTIONS environment variable useful.
From http://videlibri.sourceforge.net/xidel_readme.txt:

The environment variable XIDEL_OPTIONS can be used to set Xidel's default options, for example
XIDEL_OPTIONS="--silent --color=never"
to disable some output and coloring.

That way you could simplify your command to:

LD_LIBRARY_PATH=$PREFIX/lib
XIDEL_OPTIONS="--ca-certificate=$PREFIX/etc/tls/cert.pem"
xidel -s "https://feeds.twit.tv/twig.xml" -e 'head(//enclosure/@url)'

benibela commented 3 years ago

Great.

What is PREFIX? I could add /etc/tls/cert.pem to the default ca search paths.

 OpenSSL information: CA file:  , CA dir: /system/etc/security/cacerts , TLSv1.2, BoringSSL

/system/etc/security/cacerts is in the default search paths, is that wrong?

That way you could simplify your command to: XIDEL_OPTIONS="--ca-certificate=$PREFIX/etc/tls/cert.pem"

That probably needs to be export XIDEL_OPTIONS="--ca-certificate=$PREFIX/etc/tls/cert.pem", otherwise it only creates a shell-internal variable that is not passed on to xidel.
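
The difference between a plain assignment and an exported variable can be checked with a short sketch. The variable name DEMO_OPTIONS is hypothetical and stands in for XIDEL_OPTIONS; a child `sh` plays the role of xidel.

```shell
#!/bin/sh
# DEMO_OPTIONS is a hypothetical name used only for this illustration.
unset DEMO_OPTIONS

# Plain assignment: visible in this shell, but not passed to child processes.
DEMO_OPTIONS="--silent"
plain=$(sh -c 'echo "${DEMO_OPTIONS:-<unset>}"')

# Exported: child processes inherit the variable through their environment.
export DEMO_OPTIONS
exported=$(sh -c 'echo "${DEMO_OPTIONS:-<unset>}"')

echo "without export: $plain"
echo "with export:    $exported"
```

The first line should report `<unset>` and the second `--silent`, which is why a bare XIDEL_OPTIONS=... line on its own has no effect on a later xidel call unless the variable was already exported.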

iegubkin commented 3 years ago

The environment variable "$PREFIX" is a termux-ism:

Termux is neither a virtual machine nor any other kind of emulated or simulated environment. All provided packages are cross-compiled with Android NDK and only have compatibility patches to get them working on Android. The operating system does not provide full access to its file systems, so Termux cannot install package files into standard directories such as /bin, /etc, /usr or /var. Instead, all files are installed into the private application directory located at

/data/data/com.termux/files/usr

We call that directory "prefix" and usually refer to it as "$PREFIX", which is also an exported environment variable in the Termux shell.

$ printenv | grep PREFIX
PREFIX=/data/data/com.termux/files/usr

Packages compiled for termux have this path hard-coded:

$ curl --verbose https://www.github.com |& grep "CAfile\|CApath"
*  CAfile: /data/data/com.termux/files/usr/etc/tls/cert.pem
*  CApath: /data/data/com.termux/files/usr/etc/tls/certs
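
A PREFIX-aware lookup of the CA bundle could be sketched in shell as follows. This is only an illustration, not how xidel itself does it: the function name find_ca_file is hypothetical, the Termux path is the one from the curl output above, and /etc/ssl/certs/ca-certificates.crt is an assumed fallback for non-termux systems.

```shell
#!/bin/sh
# Hypothetical helper: print the first CA bundle file that exists.
find_ca_file() {
    # "${PREFIX:+...}" expands to an empty word when PREFIX is unset or empty,
    # so the Termux candidate is simply skipped outside Termux.
    for candidate in \
        "${PREFIX:+$PREFIX/etc/tls/cert.pem}" \
        /etc/ssl/certs/ca-certificates.crt
    do
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

It could then be combined with the option used earlier in this thread, for example: ca=$(find_ca_file) && xidel --ca-certificate="$ca" ...
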

benibela commented 3 years ago

I have added checks for PREFIX:

https://sourceforge.net/p/videlibri/code/ci/f1240a9ba35797d3fb655f1a60005ba111e89515/tree/components/pascal/internet/internetaccess.pas?diff=bfba9c84d5b3427999943c3c0e58e951c3ba46cf

https://sourceforge.net/p/videlibri/code/ci/58a15ff977a3b2ffe57660a005c9e88bbb888ea3/tree/components/pascal/import/synapse/ssl_openssl_lib.pas?diff=f1240a9ba35797d3fb655f1a60005ba111e89515

iegubkin commented 3 years ago

A new finding:

Exporting the environment variable LD_LIBRARY_PATH for the whole session can break other programs in termux (Android 7+), such as mpv:

$ export XIDEL_OPTIONS="--ca-certificate=$PREFIX/etc/tls/cert.pem"

$ xidel -s "https://feeds.twit.tv/twig.xml" -e 'head(//enclosure/@url)'
Error:
Internet Error: -3 
HTTPS connection failed after connecting to server. Some possible causes: handshake failure, mismatched HTTPS version/ciphers, invalid certificate
OpenSSL-Error: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
OpenSSL information: CA file: /data/data/com.termux/files/usr/etc/tls/cert.pem , CA dir: /system/etc/security/cacerts , TLSv1.2, BoringSSL
when talking to: https://feeds.twit.tv/twig.xml

$ export LD_LIBRARY_PATH=$PREFIX/lib

$ xidel -s "https://feeds.twit.tv/twig.xml" -e 'head(//enclosure/@url)'
https://pdst.fm/e/chtbl.com/track/E91833/cdn.twit.tv/megaphone/twig_609/TWI1054991812.mp3

$ mpv --version
CANNOT LINK EXECUTABLE "mpv": cannot locate symbol "XzUnpacker_Construct" referenced by "/system/lib64/libunwind.so"...
Aborted

$ unset LD_LIBRARY_PATH

$ mpv --version
mpv 0.33.1 Copyright © 2000-2020 mpv/MPlayer/mplayer2 projects
 built on Tue May  4 19:14:46 UTC 2021
FFmpeg library versions:
   libavutil       56.70.100
   libavcodec      58.134.100
   libavformat     58.76.100
   libswscale      5.9.100
   libavfilter     7.110.100
   libswresample   3.9.100
FFmpeg version: 4.4

$ LD_LIBRARY_PATH=$PREFIX/lib xidel -s "https://feeds.twit.tv/twig.xml" -e 'head(//enclosure/@url)'
https://pdst.fm/e/chtbl.com/track/E91833/cdn.twit.tv/megaphone/twig_609/TWI1054991812.mp3

So it seems in termux LD_LIBRARY_PATH=$PREFIX/lib xidel is the preferred and safer way to go?
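
One way to keep LD_LIBRARY_PATH scoped to a single command is a small wrapper function, sketched below. The name with_termux_libs is hypothetical, and the fallback path is the Termux prefix quoted earlier in this thread; `env` sets the variable only for the command it launches, so the calling shell stays untouched.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with Termux's library directory on
# LD_LIBRARY_PATH, without exporting it into the rest of the session.
with_termux_libs() {
    env LD_LIBRARY_PATH="${PREFIX:-/data/data/com.termux/files/usr}/lib" "$@"
}
```

Usage would look like: with_termux_libs xidel -s "https://feeds.twit.tv/twig.xml" -e 'head(//enclosure/@url)'. Programs started afterwards, such as mpv, would not see the variable.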