benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

Is this documented somewhere? #104

Open DaVyper opened 1 year ago

DaVyper commented 1 year ago

Also: Is there a way to retry requests, when they timeout?

--error-handling=xx=retry

for an error code xx. Or xx for all 2 digit codes

Originally posted by @benibela in https://github.com/benibela/xidel/issues/9#issuecomment-257084882

How can I get it to retry on a -3 timeout, and why isn't xxx retrying on 504 errors (gateway timeout)?

I've tried quoted:

--error-handling=""=retry --error-handling="-"=retry --error-handling="x"=retry --error-handling="-x"=retry --error-handling="xx"=retry --error-handling="-xx"=retry --error-handling="xxx"=retry --error-handling="-xxx"=retry --error-handling="xxxx"=retry --error-handling="-xxxx"=retry --error-handling=retry

and unquoted:

--error-handling==retry --error-handling=-=retry --error-handling=x=retry --error-handling=-x=retry --error-handling=xx=retry --error-handling=-xx=retry --error-handling=xxx=retry --error-handling=-xxx=retry --error-handling=xxxx=retry --error-handling=-xxxx=retry

Nothing seems to get it to retry. Every time it's run over the list of thousands of sites passed via -f, it fails again with a timeout at some random point, yet the same sites succeed on other runs. It's a timeout, so simply retrying fixes it nearly every time unless the server is down.

Error:
Internet/HTTP Error: 504
when talking to: https:/<snip>/blah.json
Internet Error: -3 Connection timed out
when talking to: https://<snip>/blah.json

benibela commented 1 year ago

It is documented in --help

But looks like it has been broken for 2 years

DaVyper commented 1 year ago

Thanks for looking into it. When/where can I get a compiled Windows binary? (I'm not a programmer in this language.) I assume it'll eventually be on SF?

Reino17 commented 1 year ago

Eventually. But for now you could grab the latest nightly. I can see they're made just shortly after the latest commit, so they're probably up-to-date.

Benito, it would've been nice to see the creation-date on the nightly website, instead of having to download one to check.

DaVyper commented 1 year ago

The fix must not be there yet: even when explicitly using '--error-handling="-3"=retry' and '--error-handling="504"=retry', it still eventually times out and exits the process as it did before.

Reino17 commented 1 year ago

xidel --help
[...]
  --error-handling=<string>             How to handle http errors, e.g.
                                        1xx=retry,200=accept,3xx=redirect,4xx=abort,5xx=skip

So xidel -s --error-handling=504=retry "<url>" -e "..." should work. Or even --error-handling="504=retry".

benibela commented 1 year ago

Perhaps there is another issue

Try it with the example domain:

 $  xidel http://example.org/xyz -e //title
 Error:
 Internet/HTTP Error: 404 Not Found
 when talking to: http://example.org/xyz
 $  xidel --error-handling 404=accept http://example.org/xyz -e //title
 Example Domain
 $  xidel --error-handling xxx=accept http://example.org/xyz -e //title
 Example Domain
 $  xidel --error-handling xx=accept http://example.org/xyz -e //title
 Error:
 Internet/HTTP Error: 404 Not Found
 when talking to: http://example.org/xyz
 $  xidel --error-handling 404=retry http://example.org/xyz -e //title

(last one never finishes, since it waits for the 404 error to go away, which is not happening)
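
The runs above suggest that each x stands for exactly one character of the error code, so xxx covers 404 while xx does not. Here is a minimal shell sketch of that matching rule. It is inferred only from the output in this thread and is not taken from xidel's source; the function name code_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of xidel: models the wildcard matching
# that the example runs in this thread appear to show.
# Usage: code_matches CODE PATTERN
# Returns 0 if CODE matches PATTERN, where each "x" in PATTERN stands
# for exactly one character of CODE.
code_matches() {
  # Turn every "x" into the shell glob "?" (exactly one character).
  _glob=$(printf '%s' "$2" | tr 'x' '?')
  case $1 in
    $_glob) return 0 ;;  # same length, all non-wildcard characters equal
    *)      return 1 ;;
  esac
}

# Mirrors the runs above:
#   code_matches 404 404   succeeds (exact code)
#   code_matches 404 xxx   succeeds (three wildcards, three characters)
#   code_matches 404 xx    fails    (two wildcards cannot cover three characters)
```

Under this model a two-character code such as -3 would be covered by xx, but whether xidel treats the leading minus sign that way is an assumption.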

using '--error-handling="-3"=retry' and '--error-handling="504"=retry'

You can only pass --error-handling once.

Benito, it would've been nice to see the creation-date on the nightly website, instead of having to download one to check.

It is not my website. You can ask here about it : https://github.com/oprypin/nightly.link/issues

DaVyper commented 1 year ago

you can only use it once

So I'm screwed if it hits more than one kind of error? For example, I can get past the -3, but if it then gets a 504 it will fail, since I can't tell it to retry on both.

benibela commented 1 year ago

Then you put both rules in one option:

--error-handling xx=retry,504=retry
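
On a build where retry still does not take effect, one possible workaround is to retry from the calling shell instead. The following is only a sketch: the retry function and the RETRY_DELAY variable are hypothetical names, and it assumes that xidel exits with a non-zero status when a request fails.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT a xidel feature.
# Usage: retry MAX CMD [ARGS...]
# Runs CMD until it exits with status 0 or MAX attempts have been made.
# RETRY_DELAY (seconds between attempts) defaults to 1.
retry() {
  _max=$1
  shift
  _attempt=1
  while :; do
    "$@" && return 0                    # success: stop retrying
    if [ "$_attempt" -ge "$_max" ]; then
      return 1                          # attempts used up: report failure
    fi
    _attempt=$((_attempt + 1))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Example call (assumes xidel signals a failed request via its exit status):
#   retry 5 xidel -s "<url>" -e "..."
```

Retrying the whole command repeats every request in it, so with a long list passed via -f it may be cheaper to wrap one URL at a time.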