benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

Latest build is not outputting the expression #105

Closed mailsanchu closed 1 year ago

mailsanchu commented 1 year ago

xidel "https://github.com/Netflix/hollow/releases/latest" -e "/html/body/div[4]/div/main/div[2]/div/div[2]/div[1]/div/div[2]"

This command is not printing any result.

Xidel 0.9.9 (20230627.git9905cb0bb10a9c96906cf84b86da7898eeb4c649) Compiled with FPC3.2.2 x86_64-Linux release C+

https://www.videlibri.de/xidel.html by Benito van der Zander

Reino17 commented 1 year ago

The node you're querying doesn't exist in the HTML source of that URL, so obviously it'll return nothing.

mailsanchu commented 1 year ago

How do I get the latest version from this page? https://github.com/Netflix/hollow/releases

mailsanchu commented 1 year ago

As of now, I expect to get this value: v7.5.9-rc.5

Reino17 commented 1 year ago

xidel -s "https://github.com/Netflix/hollow/releases" \
      -e '//div[@data-hpc]/section[1]/div/div[2]/div/div/div/div/div/span/a/resolve-uri(@href)'

xidel -s "https://github.com/Netflix/hollow/releases" \
      -e '(//a[@class="Link--primary"])[1]/resolve-uri(@href)'

xidel -s "https://github.com/Netflix/hollow/releases/latest" \
      -e '$url'

xidel -s "https://github.com/Netflix/hollow/releases/latest" \
      -e '//meta[@property="og:url"]/resolve-uri(@content)'

xidel -s "https://github.com/Netflix/hollow/releases/latest" \
      -e '//ol/li[2]/a/resolve-uri(@href)'

Better yet... use the API:

xidel -s "https://api.github.com/repos/Netflix/hollow/releases" \
      -e '$json(1)/html_url'

xidel -s "https://api.github.com/repos/Netflix/hollow/releases/latest" \
      -e '$json/html_url'
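
The commands above return the release URL, while the value asked for is the bare version string. A minimal sketch of two ways to get it follows. It assumes the API response also carries a `tag_name` field next to `html_url`, and that the release URL ends in `/releases/tag/<tag>`. The function names `latest_tag_via_api` and `tag_from_url` are hypothetical helpers, not part of xidel.

```shell
# Assumption: the release JSON has a "tag_name" field alongside "html_url".
# $1 = repository in OWNER/REPO form, e.g. Netflix/hollow
latest_tag_via_api() {
  xidel -s "https://api.github.com/repos/$1/releases/latest" -e '$json/tag_name'
}

# Hypothetical helper: keep only the text after the last "/" of a release URL.
# Pure shell parameter expansion, so it needs neither xidel nor network access.
tag_from_url() {
  printf '%s\n' "${1##*/}"
}

# Usage (needs network access and xidel):
#   latest_tag_via_api Netflix/hollow
#   tag_from_url "$(xidel -s "https://github.com/Netflix/hollow/releases/latest" -e '$url')"
```

The second helper only works while the tag is the last path segment of the URL, so the API variant is probably the more robust of the two.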

Relevant SO posts:

P.s. Since you're on Linux, please use single-quotes --> -e 'function("string")'.
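
The quoting advice matters most for expressions such as `$url` and `$json`, which a POSIX shell would treat as its own variables inside double quotes. A small sketch of the difference, using a made-up shell variable value purely for illustration:

```shell
# Illustration only: give the shell its own variable named "url".
url='shell-value'

double_quoted="$url"   # the shell substitutes its own variable before xidel runs
single_quoted='$url'   # passed through literally, so xidel would receive $url

printf 'double quotes: [%s]\n' "$double_quoted"
printf 'single quotes: [%s]\n' "$single_quoted"
# prints:
#   double quotes: [shell-value]
#   single quotes: [$url]
```

If the shell variable is unset, the double-quoted form expands to an empty string, so xidel would receive an empty expression.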

mailsanchu commented 1 year ago

The above works. Will it not work if I copy the full XPath from Chrome?

mailsanchu commented 1 year ago

/html/body/div[1]/div[6]/div/main/turbo-frame/div/div/div/div/div[1]/div[1]/div[1]/div[1]/h1 is the one I copied from chrome

benibela commented 1 year ago

Chrome runs JavaScript that changes the page. You would need to disable JavaScript there to get a usable XPath.
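
One way to check whether a path copied from the browser exists in the source that xidel receives is to wrap it in the standard XPath function `exists()`. The sketch below assumes a hypothetical helper named `xpath_exists`. The URL and path in the usage comment are the ones from the original report.

```shell
# Hypothetical helper: report whether an XPath matches anything in the page
# as xidel receives it (without JavaScript).
# $1 = page URL, $2 = XPath expression copied from the browser
xpath_exists() {
  # Double quotes on purpose here: the shell must substitute $2 into the expression.
  xidel -s "$1" -e "exists($2)"
}

# Usage (needs network access and xidel):
#   xpath_exists "https://github.com/Netflix/hollow/releases/latest" \
#     '/html/body/div[4]/div/main/div[2]/div/div[2]/div[1]/div/div[2]'
```

The call prints `true` or `false`, which makes it clearer whether an empty result comes from a missing node or from an empty one.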

mailsanchu commented 1 year ago

I need to run 0.9.8 and see if it works. It used to work before.

Reino17 commented 1 year ago

Without JavaScript, the GitHub page that xidel sees is structured differently. Use the following command to see how (with proper indentation):

xidel -s "https://github.com/Netflix/hollow/releases" -e . --output-node-format=xml --output-node-indent

For xidel that would be:

xidel -s "https://github.com/Netflix/hollow/releases" -e '
  path(//div[@data-hpc]/section[1]/h2) ! replace(.,"Q\{\}","")
'
/html[1]/body[1]/div[1]/div[4]/div[1]/main[1]/turbo-frame[1]/div[1]/div[1]/div[3]/section[1]/h2[1]

Or a little bit simplified:

xidel -s "https://github.com/Netflix/hollow/releases" -e '
  /html/body/div[1]/div[4]/div/main/turbo-frame/div/div/div[3]/section[1]/h2
'
v7.5.9-rc.5

mailsanchu commented 1 year ago

It must be a GitHub change, and my version upgrade was just a coincidence. I am happy to close this ticket.