benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

User Guide #10

Closed dhammikab closed 6 years ago

dhammikab commented 7 years ago

Apart from the readme file, which gives some insight into what can be done with the tool, there does not seem to be a user guide.

e.g. What does the extract function do? What are the parameters to this function meant to be?

benibela commented 7 years ago

What kind of user guide?

Most of it is standard XQuery, and there are entire books about that already. Functions like extract, though, remain from a time when I did not care about the standard.

e.g. What does the extract function do?

Is that not clear? “This applies the regex "regex" to "string" and returns only the matching part.”

It returns pretty much what the grep program would return.

What are the parameters to this function meant to be?

Pretty much the same as for the matches and replace functions.
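To make the description above concrete, here is a minimal Python sketch of the idea "apply a regex to a string and return only the matching part", similar to what `grep -o` prints. It is an analogy, not Xidel's implementation: `extract_match` is a hypothetical name, Python's regex dialect differs from the XPath/XQuery one, and returning an empty string when nothing matches is an assumption made for this sketch.

```python
import re


def extract_match(string, regex, flags=0):
    """Return the first part of `string` matched by `regex`.

    Hypothetical helper for illustration only; returning "" when there is
    no match is an assumption, not documented Xidel behaviour.
    """
    match = re.search(regex, string, flags)
    return match.group(0) if match else ""


# Only the matching part comes back, not the whole input string.
print(extract_match("Phone: 555-0199 (office)", r"\d{3}-\d{4}"))

# No match: the sketch returns an empty string.
print(repr(extract_match("no digits here", r"\d+")))
```

The argument order (string first, then the pattern, then optional flags) follows the convention of the matches and replace functions mentioned above.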

dhammikab commented 7 years ago

Your library looks great, but I feel that to unlock its true potential it needs some minimum documentation. I would love to contribute some time to this project and publish something so others like me can unlock its potential. Let me know if you are up for this and we can start collaborating on a "startup tutorial".

Dhammika

benibela commented 7 years ago

What would you write about? Some specific example use cases might be interesting. A tutorial for HTML would be much different than one for JSON processing.

dhammikab commented 7 years ago

My thought was to take on HTML first, as that would be used more. The audience for JSON is somewhat technical, and we can take that up at a later time.

I would like to take a practical scenario, such as extracting some info from an online phone directory (e.g. yellow pages) or some publicly available data source, e.g. https://www.sbcncanada.org/directory?page=1, and show how the tool can be used to extract the data and perhaps process it to generate a CSV file for import.

Maybe as we start with something and work on it, it will end up as a practical how-to: perhaps how one would log in to a secure site, handle pagination, formatting, etc.

Thoughts?
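As a rough illustration of the pagination and CSV steps of such a scenario, here is a small Python sketch. It does not use Xidel and makes no network requests. The assumptions are that the directory pages are numbered consecutively through the `page` query parameter seen in the URL above, and that the extraction step yields records with `name` and `phone` fields; the helper names and the sample records are made up.

```python
import csv
import io

# URL taken from the example above; consecutive page numbering is an assumption.
BASE_URL = "https://www.sbcncanada.org/directory"


def page_urls(first, last):
    """Build the list of page URLs to visit (hypothetical helper)."""
    return [f"{BASE_URL}?page={n}" for n in range(first, last + 1)]


def to_csv(records, fieldnames):
    """Serialise already-extracted records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up records standing in for whatever the extraction step returns.
records = [
    {"name": "Example Bakery", "phone": "555-0100"},
    {"name": "Sample, Inc.", "phone": "555-0101"},
]

print(page_urls(1, 3))
print(to_csv(records, ["name", "phone"]))
```

Using the `csv` module rather than joining strings by hand means that values containing commas, such as the second sample name, are quoted correctly for import.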

benibela commented 7 years ago

Well, I cannot write much text at the moment. I am slow at writing natural-language text, and there are many technical things to improve. It is a long road from an XQuery 3.0 implementation to XQuery 3.1.

It would not be hard to extract the data from that page. A few short queries. Is it a popular page that people care about?

benibela commented 6 years ago

It seems we are not getting anywhere.