benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

android binary #2

Closed zpimp closed 7 years ago

zpimp commented 8 years ago

hello there, can you please provide an android binary? nothing special, just a plain arm self-contained binary. i may have sent you a message before, im not sure. i have busybox, aria2 and perl, all i need is xidel :) thank you for your work, it really does wonders. cheers

benibela commented 8 years ago

Try this one: https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9.4/xidel-0.9.4.androidarm.tar.gz/download

zpimp commented 8 years ago

i flashed, and i get the pie error "only position independent executables (pie) are supported"

encountered this before but im not a coder

if i remember xidel is written in pascal/lazarus found this relating crosscompile for android with pie http://forum.lazarus.freepascal.org/index.php?topic=30546.0

edit: it seems the compiler needs -k-pie to pass parameter to ld (linker) i would try it but i dont have the env set up, and wouldnt even know where to start and would bother you with loads of noob questions :) anyway thank you for your time trying to help

2nd edit: tried on android 4.4 it works, didnt test xpath, just --help but it seems there is a delay even if i just run "time xidel --version" 0m2.98 real 0m2.82s user 0m0.15s system
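
The "only position independent executables (PIE) are supported" message means the Android 5+ loader rejected an executable that was not linked as PIE. One way to check a downloaded binary before flashing it is to read the `e_type` field of its ELF header: PIE executables are marked `ET_DYN` (3), ordinary fixed-address executables `ET_EXEC` (2). A minimal sketch, assuming a little-endian ELF file (as ARM Android and x86 Linux builds are) and a POSIX `od`; the function names `elf_type` and `is_pie` are made up for this example:

```shell
# elf_type FILE: print the e_type field of an ELF header as a decimal number.
# Assumption: FILE is a little-endian ELF file.
elf_type() {
    # e_type is the 2-byte field at byte offset 16; read it as two single bytes
    set -- $(od -An -tu1 -j16 -N2 "$1")
    echo $(( $1 + 256 * $2 ))
}

# is_pie FILE: succeed if e_type is ET_DYN (3), which is how PIE executables
# (and shared libraries) are marked; ET_EXEC (2) means a non-PIE executable.
is_pie() {
    [ "$(elf_type "$1")" -eq 3 ]
}
```

It could be used as `is_pie ./xidel && echo PIE || echo "not PIE"`. Where `readelf` is available, the `Type:` line of `readelf -h` shows the same field.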

benibela commented 8 years ago

Better add comments instead of edits, so github sends notifications

Do you have valgrind installed? Xidel compiled with -gv, then valgrind --tool=callgrind xidel ... is a great way to detect slow parts. (kcachegrind to view the output)

zpimp commented 8 years ago

i dont have valgrind, only gcc cause that comes with linux :) not much experience with compiling from source

didnt compile anything myself, just tested the binaries you posted. what i was trying to say is that xidel runs fine on x86 but not on arm. dont have time right now to set up compilers, will be away this weekend

will probably try next week, but if you post more binaries i can test right on the phone, dont need computer for that

so valgrind let me test the binary for arm on x86 and tells me whats wrong ? do i need emulator, android sdk or something ?

thank you

benibela commented 8 years ago

> didnt compile anything myself, just tested the binaries you posted

so they work after all?

> will probably try next week, but if you post more binaries i can test right on the phone, dont need computer for that

I will make new ones this weekend

But it will probably be called Xidel 0.9.5

> so valgrind let me test the binary for arm on x86 and tells me whats wrong ? do i need emulator, android sdk or something ?

Valgrind runs a program and marks the parts that are slow

It actually is an emulator itself.

zpimp commented 8 years ago

it works on android 4.4, doesnt work on 5.1. but on 4.4 it takes 3 seconds to start (the time i posted). looking forward to new versions

benibela commented 8 years ago

New builds will be here: https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%20development/

zpimp commented 8 years ago

flashed again on 5.1, same stuff: "only position independent executables (PIE) are supported". on 4.4 it doesnt even run anymore, terminal closes after about 6 seconds. checked md5, it is correct :(

benibela commented 8 years ago

Then the -Cg flag was not enough and -k-pie was needed as well
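
For readers who want to try the two flags themselves, a cross-compile call might look like the sketch below. It is not the author's actual build script: it assumes a Free Pascal installation with the ARM Android cross compiler already set up, and the wrapper name `build_android_pie` is invented here. Note that later in this thread fpc 3.0 turns out to be unable to produce working Android 5+ programs at all, so the flags alone may not be sufficient.

```shell
# build_android_pie SOURCE: cross-compile a Pascal program for ARM Android,
# asking for position-independent code and a PIE link.
# Assumption: fpc is installed together with its arm-android cross compiler.
build_android_pie() {
    # -Tandroid / -Parm select target OS and CPU, -Cg generates PIC code,
    # -k-pie passes "-pie" through to the linker
    fpc -Tandroid -Parm -Cg -k-pie "$1"
}
```

It would be called as `build_android_pie helloworld.pas`.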

zpimp commented 8 years ago

http://imgur.com/7pK5vGN xidel has text relocations, and runs out of memory on android 5.1. edit: i asked at http://forum.lazarus.freepascal.org/index.php?topic=33238 and they recommended LAMW

zpimp commented 8 years ago

on android 4.4 terminal closes,

probably not related: http://forum.lazarus.freepascal.org/index.php/topic,29891.15.html these guys suggest laztoapk for creating an apk. what caught my attention was that they also encounter apps that work on 4.4 but not on 5.1. in the end they conclude it is about lazarus customdrawn, probably gui stuff, but from what i see theres no gui in xidel

benibela commented 8 years ago

> http://imgur.com/7pK5vGN

Are the numbers after out of memory always the same?

I have uploaded a new version. No changes, but it has debug symbols, so the error might be more informative.

You can start it with gdb xidel or strace xidel to get more information

> on android 4.4 terminal closes,

How could it close the terminal, even if it crashes?

On the emulator I get ./xidel: 1: Syntax error: word unexpected (expecting ")"), Android is a mess

> lazarus customdrawn, probably gui stuff, but from what i see theres no gui in xidel

I am not using any part of Lazarus anymore

zpimp commented 8 years ago

http://imgur.com/r4SMRkW numbers are not the same. on 5.1 the error numbers only show the first time, then it only says "killed". the numbers appear again in the error only if i restart

gdb xidel doesnt work. the strace is not from the first run after restart, i can get that too if it makes any difference: http://pastebin.com/NsLhM7Ca

i dont know, on 4.4 it just crashes the terminal. tried dmesg, logcat, no stuff about terminal

what are you using instead of lazarus?

android is indeed a mess, couldnt agree more, i just want and x86 phone with linux and classic bios/bootloader, like on desktop

will try this new version

zpimp commented 8 years ago

the new version throws the error with numbers every time, but the numbers are completely different every time: http://imgur.com/INEKuYs on the previous version only the last 4 bytes were different, the first 4 bytes were the same

benibela commented 8 years ago

> gdb xidel doesent work

Do you have gdb installed? It is useful.

> strace is not first run, after restart, i can get that too if it makes any difference

It says it actually runs out of memory

After demanding 1.3 gigabyte

While reading a /usr/share/zoneinfo/localtime or /usr/lib/zoneinfo/localtime file. Do you have those? What is their content?

> what are you using instead of lazarus?

FreePascal alone

> android is indeed a mess, couldnt agree more, i just want and x86 phone with linux and classic bios/bootloader, like on desktop

Like an Ubuntu phone

benibela commented 8 years ago

> While reading a /usr/share/zoneinfo/localtime or /usr/lib/zoneinfo/localtime file. Do you have those? What is their content?

Actually the log says, you do not have them and it crashed somewhere else after it failed loading them

benibela commented 8 years ago

> but the numbers are completely different every time

It adds a random number every time, so it is harder to hack

strace should show that random offset. Call strace xidel and post the strace output and xidel's backtrace numbers for the same run

zpimp commented 8 years ago

dont have gdb,is this it? http://dan.drown.org/android/howto/gdb.html

i dont have either of those files i have a custom rom/os on my phones, based on cyanogenmod but everything else works

i assume you meant to post xidel output first time after restart, if thats the case, this is it, 740kb, too big for pastebin http://www.2shared.com/file/opoStB6Z/xi2.html

didnt realised, lazarus is just an ide

dont have experience with ubuntu phone, is it more like linux ? read about an ubuntu, more like android

benibela commented 8 years ago

> dont have gdb,is this it? http://dan.drown.org/android/howto/gdb.html

Yes

> i assume you meant to post xidel output first time after restart, if thats the case, this is it, 740kb, too big for pastebin http://www.2shared.com/file/opoStB6Z/xi2.html

Yes, but no restart was needed.

Now we see in line 192, after 1.3 GB, it wants 3.9 GB which triggers the out of memory

Still some things are truncated (like xid in line 2448), perhaps it helps to get more data with

  strace -i -e read=3 xidel

> dont have experience with ubuntu phone, is it more like linux ? read about an ubuntu, more like android

I do not know it

But I would think it is like Ubuntu, when it is called Ubuntu Phone

benibela commented 8 years ago

Just the logcat output might also be interesting

zpimp commented 8 years ago

strace -i -e read=3 xidel http://pastebin.com/avbrcWue

logcat http://pastebin.com/kHUa7dBz

zpimp commented 8 years ago

any news? thinking about a way to upload stuff from android to process with xidel from linux x86 would probably be better to process on android even if slower, than to upload maybe a xpath as a service :) with an api

benibela commented 8 years ago

> strace -i -e read=3 xidel http://pastebin.com/avbrcWue

That is now missing the "unhandled exception... " output

> any news?

I was thinking about uploading a simple program that just prints a single line, compiled with different flags, to see if one of them works.

Unfortunately I do not really have any working Android devices, except an old phone I bought used for online banking on which I cannot run anything (especially not the online banking app).

> maybe a xpath as a service :) with an api

I have an API

http://www.videlibri.de/cgi-bin/xidelcgi?raw=true&xpath=1%20to%20100
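
The URL above carries the expression `1 to 100` percent-encoded in the `xpath` parameter. A small helper can build such URLs for other expressions. This is only a sketch: the endpoint and the `raw` and `xpath` parameters are taken from the link above, nothing else about the service is assumed, the function names `urlencode` and `xidel_api_url` are hypothetical, and the encoder only handles ASCII input.

```shell
# urlencode STRING: percent-encode everything except RFC 3986 unreserved
# characters. Assumption: STRING contains ASCII characters only.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}     # first character of what is left
        s=${s#?}            # drop that character
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# xidel_api_url EXPRESSION: print a request URL in the shape of the link above
xidel_api_url() {
    printf 'http://www.videlibri.de/cgi-bin/xidelcgi?raw=true&xpath=%s\n' "$(urlencode "$1")"
}
```

With `curl` installed, `curl "$(xidel_api_url '1 to 100')"` would then send the request, provided the service is still online.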

zpimp commented 8 years ago

anything you post i will test

what android phone do you have? and what else are you using, just curious im not really using the galaxy s3, if you want i could try send it to you, dont know about the cost

i wouldnt reccommend using android for banking app, i heard some token keys got stolen im using the external token even on android im using the banks site

i didnt have the time to really try your api, i usually do "xidel filename* -e ... " on a bunch of files so it would be kinda hard for lots of files

i was thinking more like a dedicated xidel pc, online 24/7 im not sure about security :) to upload a tar with the command in a separate file

also thinking about a technique for perl regex html, not very powerful, and a lot of work but a little faster than xidel since it has some abstraction overhead

zpimp commented 8 years ago

about c/c++ wich would probably be easier to port to android

apart from a complete rewrite, wich i assume is pretty close to imposible, and not happening considering its a big complex project

and auto conversion to c-code, http://ivan.vecerina.com/code/delphi2cpp/ for freepascal asuming it does exist and conversion would be far from working

it might be stupid, but how about using a xidel library called from a c/c++ front-end cli would that work?

i admit i dont know very much about programming, dll/so libraries, native code function calls :) just a thought

i realise now the problem is freepascal its difficult to get working on its own not counting external libraries

benibela commented 8 years ago

> anything you post i will test

Same compiling flags as before, but just a hello world:

helloworld.zip

The only thing clear right now is that it is crashing during the initialization. Every Pascal unit initializes itself, before the actual program runs, and that crashes. Xidel has units from me and units from FPC

> what android phone do you have? and what else are you using, just curious

An HTC Wildfire rooted with Cyanogenmod.

Otherwise I use a good old laptop. You have to carry one or two kg, but it can do all the things

> im not really using the galaxy s3, if you want i could try send it to you, dont know about the cost

That might help. I got the Wildfire for like 10€ on ebay, international shipping might be more expensive than that.

GDB would help, too.

> i wouldnt reccommend using android for banking app, i heard some token keys got stolen

Banking site + banking app

Then it is double secure, as both devices need to be hacked.

> about c/c++ wich would probably be easier to port to android

Not necessarily. Freepascal has the advantage that it is mostly self-contained

It just is not tested enough

> and auto conversion to c-code, http://ivan.vecerina.com/code/delphi2cpp/

It would be hard to even port it from FreePascal to Delphi

> it might be stupid, but how about using a xidel library called from a c/c++ front-end cli would that work?

I am actually using a part of Xidel as a library to be called from a Java apk and that works.

> i realise now the problem is freepascal its difficult to get working on its own not counting external libraries

benibela commented 8 years ago

The output of

readelf -h /system/bin/linker

might be interesting, if you have readelf installed

zpimp commented 8 years ago

flashed helloworld on 5.1 still says it has text relocations but execution time is now 0.03sec :) sorry for the delay been kinda busy need anything else ? strace?

please link readelf and gdb or whatever you need output for, on android do i need computer for these 2?

benibela commented 8 years ago

> still says it has text relocations

These relocations probably do not matter

The important thing is, does it print Hello World? Or some exception error?

> please link readelf and gdb or whatever you need output for, on android

It should be in the Android-NDK: https://developer.android.com/ndk/downloads/index.html

This might have newer versions: https://termux.com/

With full gdb you call it on the smartphone:

    $ gdb xidel
    (gdb) run
    (gdb) bt

Or you can get the smaller gdbserver, and then run the full gdb on the computer and connect to gdbserver:

    $ gdbserver 0.0.0.0:1234 xidel          # <- on the smartphone
    $ arm-linux-androideabi-gdb xidel       # <- on the computer in the same network
    (gdb) target remote <smartphone ip>:1234
    (gdb) continue
    (gdb) bt

zpimp commented 8 years ago

yes it does print helloworld so ill get gdb from termux where do i get readelf?

benibela commented 8 years ago

> yes it does print helloworld

So the compiler works fine.

Then the problem has to be in one of the libraries.

I made a hello world for each library used in Xidel: helloWorld.zip

Which one works, which one crashes? (each prints some different random nonsense, so you can keep them apart)

If you are curious, this is the order in which the libraries are loaded by Xidel:

(gdb) x/50a @INITFINAL 
0x8ff1b0 <INITFINAL+32>:    0x424420 <LNFODWRF_$$_init> 0x424440 <LNFODWRF_$$_finalize>
0x8ff1c0 <INITFINAL+48>:    0x0 0x425b50 <OBJPAS_$$_finalize>
0x8ff1d0 <INITFINAL+64>:    0x548610 <UNIX_$$_init> 0x548620 <UNIX_$$_finalize>

0x8ff1e0 <INITFINAL+80>:    0x4bfc30 <SYSUTILS$_$TENCODING_$__$$_create>    0x4bfc60 <SYSUTILS$_$TENCODING_$__$$_destroy>
0x8ff1f0 <INITFINAL+96>:    0x4c8be0 <SYSUTILS_$$_init> 0x4c8c20 <SYSUTILS_$$_finalize>
0x8ff200 <INITFINAL+112>:   0x5455f0 <TYPINFO_$$_init_implicit> 0x545600 <TYPINFO_$$_finalize_implicit>
0x8ff210 <INITFINAL+128>:   0x4a68e0 <CLASSES_$$_init>  0x4a68f0 <CLASSES_$$_finalize>
0x8ff220 <INITFINAL+144>:   0x45c9d0 <BBUTILS_$$_init_implicit> 0x45c9e0 <BBUTILS_$$_finalize_implicit>
0x8ff230 <INITFINAL+160>:   0x55b980 <BIGDECIMALMATH_$$_init>   0x0
0x8ff240 <INITFINAL+176>:   0x42e380 <INTERNETACCESS_$$_init_implicit>  0x42e390 <INTERNETACCESS_$$_finalize>
0x8ff250 <INITFINAL+192>:   0x5f61d0 <DL_$$_init>   0x0
0x8ff260 <INITFINAL+208>:   0x5f61f0 <FLREUNICODE_$$_init_implicit> 0x5f6200 <FLREUNICODE_$$_finalize_implicit>
0x8ff270 <INITFINAL+224>:   0x5f6000 <FLRE_$$_init> 0x5f6020 <FLRE_$$_finalize>
0x8ff280 <INITFINAL+240>:   0x56cf60 <XQUERY__REGEX_$$_init>    0x56cfd0 <XQUERY__REGEX_$$_finalize>
0x8ff290 <INITFINAL+256>:   0x541360 <SIMPLEHTMLTREEPARSER_$$_init> 0x541b00 <SIMPLEHTMLTREEPARSER_$$_finalize>
0x8ff2a0 <INITFINAL+272>:   0x52fdd0 <XQUERY_$$_init>   0x5304a0 <XQUERY_$$_finalize>
0x8ff2b0 <INITFINAL+288>:   0x601240 <EXTENDEDHTMLPARSER_$$_init>   0x6012f0 <EXTENDEDHTMLPARSER_$$_finalize_implicit>
0x8ff2c0 <INITFINAL+304>:   0x606cc0 <JSONSCANNER_$$_init_implicit> 0x606cd0 <JSONSCANNER_$$_finalize_implicit>
0x8ff2d0 <INITFINAL+320>:   0x603f60 <XQUERY_JSON_$$_init>  0x604420 <XQUERY_JSON_$$_finalize>

0x8ff2e0 <INITFINAL+336>:   0x62b380 <NETDB_$$_init>    0x62b390 <NETDB_$$_finalize>
0x8ff2f0 <INITFINAL+352>:   0x624db0 <SYNSOCK_$$_init>  0x624e10 <SYNSOCK_$$_finalize>
0x8ff300 <INITFINAL+368>:   0x61dcc0 <SYNAUTIL_$$_init> 0x61dd50 <SYNAUTIL_$$_finalize_implicit>
0x8ff310 <INITFINAL+384>:   0x616030 <BLCKSOCK_$$_init> 0x6160d0 <BLCKSOCK_$$_finalize>

We know it gets to UNIX_$$_init, as strace mentions the time zones, and it crashed before it got to NETDB_$$_init, because otherwise strace would have shown access to the hosts file.

> so ill get gdb from termux

Yeah, that should directly show the bug

> where do i get readelf?

If it printed hello world that is not needed.

Readelf would have told us, if the linker is arm5 or arm7. I was making arm5 builds because they should work everywhere, but with these things you can never be certain

zpimp commented 8 years ago

sorry for the delay, been busy will test all when i get the chance

benibela commented 7 years ago

I just was told fpc 3.0 cannot generate working programs for Android 5+ (http://wiki.freepascal.org/Android#Known_issues)

I have uploaded a new [Xidel build with fpc 3.1.1](https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%20development/) (r34554)

zpimp commented 7 years ago

hello i just checked this: https://heanet.dl.sourceforge.net/project/videlibri/Xidel/Xidel%20development/xidel-0.9.5.20160923.5109.84340b7790b9.androidarm.tar.gz

and it works, on android 5.1.1, dont have anything else to test on

a bit slow compared to linux 386 version

linux x86 ( time xidel -h ) real 0m0.054s user 0m0.050s sys 0m0.003s

android 5.1.1 ( time xidel -h ) real 0m1.38s user 0m0.99s sys 0m0.11s

linux x86 ( time xidel f -e "//*[@id='products-holder']/div/form/div[2]/h2/a/@href" ) real 0m0.153s user 0m0.143s sys 0m0.007s

android 5.1.1 ( time xidel f -e "//*[@id='products-holder']/div/form/div[2]/h2/a/@href" ) real 0m1.62s user 0m1.30s sys 0m0.07s

what could be the cause for this? arm being worse than x86 (i know it is but not this bad), or android being worse than linux (i believe this is it)?

execution time when given work, compared to simply showing help: on android it is only a 30% increase, on the linux version 300%, and linux is still 10 times faster than android

on android there is termux wich makes it easier to install linux terminal utilities like in debian no root needed, just open terminal and type "apt install program" maybe we should get xidel in termux repos ?

thank you for your work
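
The four wall-clock ("real") timings quoted in this comment can be put side by side with a little arithmetic. The sketch below only recomputes ratios and differences from those posted numbers; the helper names `ratio` and `diff_s` are made up for illustration.

```shell
# ratio A B: print A divided by B, rounded to two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# diff_s A B: print A minus B (seconds), rounded to three decimals
diff_s() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a - b }'
}

ratio 1.62 1.38      # Android: query run vs. plain help
ratio 0.153 0.054    # Linux x86: query run vs. plain help
ratio 1.62 0.153     # Android vs. Linux on the query run
diff_s 1.62 1.38     # extra time the query itself costs on Android
diff_s 0.153 0.054   # extra time the query itself costs on Linux
```

On these figures the query adds about 0.24 s on Android and about 0.10 s on Linux, while roughly 1.4 s of the Android run is already spent just printing the help text. That points at a fixed startup cost rather than slow XPath evaluation.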

benibela commented 7 years ago

I just made a version for 0.9.6: https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9.6/xidel-0.9.6.androidarm.tar.gz/download

> what could be the cause for this?

Perhaps timing strace helps to find something slow:

  strace -tt -T xidel ... 

> wich makes it easier to install linux terminal utilities like in debian no root needed, just open terminal and type "apt install program" maybe we should get xidel in termux repos ?

Maybe you can ask them directly? My last three attempts to get something in some repository have failed. Maintainers are busy, or want to compile it and have no Pascal compiler...
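
With `-T`, strace appends the time spent in each system call in angle brackets at the end of the line, so a saved log can be sorted by that value to find the slow calls. A minimal sketch, assuming the log was written with something like `strace -tt -T xidel ... 2> trace.txt`; the function name `slowest_syscalls` and the file name are placeholders:

```shell
# slowest_syscalls LOG [N]: print the N (default 10) slowest system calls
# from an "strace -T" log, each line prefixed with its duration in seconds.
slowest_syscalls() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
             # strip the angle brackets and put the duration in front
             print substr($0, RSTART + 1, RLENGTH - 2), $0
         }' "$1" | sort -rn | head -n "${2:-10}"
}
```

It would be called as `slowest_syscalls trace.txt 20`. Lines without a trailing duration, such as the exit message, are skipped.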

zpimp commented 7 years ago

just flashed and tested on --version i get an error http://imgur.com/5WmBg2Z

zpimp commented 7 years ago

i managed to select the text from terminal emulator

also it doesent work in termux, but i think this is their fault

i tried to contact termux on irc, no answer

also i think theres still a god thing you provide binaries

    u0_a59@kccat6xx:/sdcard/download/code $ xidel --version
    Xidel 0.9.6 (20161214.5282.d3cccfbc2e2c)

    http://www.videlibri.de/xidel.html
    by Benito van der Zander benito@benibela.de

    An unhandled exception occurred at $B6E5D028:
    EAccessViolation: Access violation
      $B6E5D028 TXQUERYENGINE__DESTROY, line 6807 of /home/benito/hg/components/pascal/data/xquery.pas
      $B6DA7C28
      $B6D94F00 main, line 56 of xidel.pas

    217|u0_a59@kccat6xx:/sdcard/download/code $

zpimp commented 7 years ago

out4.txt

time strace -tt -T xidel > out4.txt 2>&1 1.6 mb

benibela commented 7 years ago

> on --version i get an error http://imgur.com/5WmBg2Z

That is my modified memory management. Weird, it is not supposed to do anything on --version.

What do you get on xidel -e "garbage-collect()" or xidel -e "garbage-collect()" -e "garbage-collect()" -e "garbage-collect()"

Do you have gdb to get a longer backtrace?

> also i think theres still a god thing you provide binaries

I always thought xidel is just like TempleOs

> time strace -tt -T xidel > out4.txt 2>&1 1.6 mb

ui, it is loading /etc/hosts

How many lines does that have? On linux I have 10 lines. Why do you have so many?

zpimp commented 7 years ago

/etc/hosts protection for ads/malware, also use it on linux, not the same file though

its moab hosts from xda, this seems to be 170,000 lines, ~4.7mb cant you tell it to ignore hosts ? or at least an option parameter

in what way is xidel like templeos, isnt that the religious guy?

where to get gdb from? tried from here -> https://dan.drown.org/android/howto/gdb.html got termux, but dont know how to run it, should i flash it like xidel?

also installed gdb from termux, but "gdb xidel" doesnt work, shows gdb help

any idea?

benibela commented 7 years ago

> its moab hosts from xda, this seems to be 170,000 lines, ~4.7mb cant you tell it to ignore hosts ? or at least an option parameter

That is fpc's net initialization code. I cannot change it.

I could write a bug report: http://bugs.freepascal.org/view.php?id=31129

Although it might also be possible to create a version that does not use fpc for internet access, but Apache HttpComponents from Android.

> in what way is xidel like templeos, isnt that the religious guy?

Because I have spent so much time adding unnecessary, overcomplicated stuff to it...

> also installed gdb from termux, but "gdb xidel" doesent work, shows gdb help

Did you try what I described above: https://github.com/benibela/xidel/issues/2#issuecomment-236427738

zpimp commented 7 years ago

i dont really use the download part on xidel, probably used it 2-3 times, compared to probably tens of uses for extraction, for thousands of pages, dont know how other people use it

i use aria2c for mass downloading, it supports gzipped downloading with auto decompression for list of file, then i just xidel p* -e "..."

i mean its nice to have it, but i think the most important part of xidel is the xpath/css extraction for wich is the best cli tool i found

i also tend to overcomplicate stuff :) but i run away for everything that has java in it, and android native dont know about Apache HttpComponents from android, but i wouldnt put my money on it :)

so can you make a version with just the internet part commented out?

benibela commented 7 years ago

> but i think the most important part of xidel is the xpath/css extraction

There are standard XPath functions that depend on downloading, e.g. fn:doc

> so can you make a version with just the internet part commented out?

Perhaps later

I am more concerned about the crash in 0.9.6. I could reproduce it in the emulator (although gdb kept crashing...) and it seems to be another freepascal bug (http://bugs.freepascal.org/view.php?id=31135)... But it is specific enough that I could upload a new 0.9.6 version without this crash.

zpimp commented 7 years ago

its pretty awkward if you ask me for xpath to depend on downloading, since xpath is made for working with xml, which is not necessarily html and can be generated locally by other apps

but every language/compiler must have its share of bad decisions :)

zpimp commented 7 years ago

i have tested xidel without the hosts file and it seems to be fine. also i was able to use xidel with termux, by copying the file to /data/data/com.termux/files/usr/bin and setting chmod 777 xidel, so this way you can use it without the need to flash or root the device. happy new year
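
The copy-and-chmod step described in this comment can be wrapped in a small function. This is a sketch, not an official install procedure: the helper name `install_bin` is invented, the default directory is the Termux path mentioned above (Termux also exports it as `$PREFIX`), and mode 755 is used instead of 777 because execute permission is all that is needed to run the binary.

```shell
# install_bin FILE [DIR]: copy FILE into DIR and make it executable.
# DIR defaults to Termux's bin directory: $PREFIX/bin if PREFIX is set,
# otherwise the path quoted in the comment above.
install_bin() {
    dest=${2:-${PREFIX:-/data/data/com.termux/files/usr}/bin}
    cp "$1" "$dest/" || return 1
    # 755: owner may write, everyone may read and execute
    chmod 755 "$dest/${1##*/}"
}
```

Inside Termux, `install_bin xidel` run from the directory holding the unpacked binary would then be enough.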

zpimp commented 7 years ago

also tested on termux on android 6.0 not rooted on "time xidel --help" it said ~0.1 sec but it took more than 2 seconds i think it was android 64 bit, would that have anything to do with it?

zpimp commented 7 years ago

any chance to build an arm-linux binary? im trying to chroot an android phone :)

benibela commented 7 years ago

Yes, I uploaded one

zpimp commented 7 years ago

thank you

zpimp commented 7 years ago

why is the android binary size so big (3x)?