benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

Add header + footer in batch script? #103

Closed Shohreh closed 1 year ago

Shohreh commented 1 year ago

Hello,

Can Xidel add a header and footer itself, in a smarter way than this batch script?

@echo off

SET OUTPUT="joined.gpx"

echo ^<?xml version="1.0" encoding="UTF-8"?^> > %OUTPUT%
echo ^<gpx^>^<trk^>^<trkseg^> >> %OUTPUT%
for %%f in ("*.gpx") do (
xidel.exe "%%f" -se "//trk/trkseg/trkpt" --printed-node-format xml >> %OUTPUT%
)
echo ^</trkseg^>^</trk^>^</gpx^> >> %OUTPUT%

Thank you.

Reino17 commented 1 year ago

Hello Shohreh,

Yes, it can. In addition, I would also suggest you use the integrated EXPath File Module to directly list, load and parse the local gpx-files within xidel.

I didn't know gpx-files, but I assume you mean files like this one.

With "direct element constructors":

xidel -se "(<gpx><trk><trkseg>{for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}</trkseg></trk></gpx>)" --output-format=xml --output-node-indent --ignore-namespaces --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
xidel -se ^"^
  (^<gpx^>^<trk^>^<trkseg^>{^
    for $file in file:list(.,false(),'*.gpx') return^
    doc($file)//trk/trkseg/trkpt^
  }^</trkseg^>^</trk^>^</gpx^>)^
" --output-format=xml --output-node-indent --ignore-namespaces^
  --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"

(If you get "Error Unknown option: output-node-indent", then please update!)

With "computed constructors":

xidel -se "element gpx {element trk {element trkseg {for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}}}" --output-format=xml --output-node-indent --ignore-namespaces --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
xidel -se ^"^
  element gpx {^
    element trk {^
      element trkseg {^
        for $file in file:list(.,false(),'*.gpx') return^
        doc($file)//trk/trkseg/trkpt^
      }^
    }^
  }^
" --output-format=xml --output-node-indent --ignore-namespaces^
  --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"

Without --output-declaration your XML-declaration will probably look like one of these two...

<?xml version="1.0" encoding="ibm437"?>
<?xml version="1.0" encoding="ibm850"?>

...so with the custom XML-declaration you can force it to "UTF-8". This, however, only matters on screen! If you save the output to a file, then --output-declaration isn't needed, because the output will always be UTF-8.

xidel -se "[...]" --output-format=xml --output-node-indent --ignore-namespaces > joined.gpx

Alternatively you can also use file:write():

xidel -se "file:write('joined.gpx',element gpx {element trk {element trkseg {for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}}},{'indent':true(),'omit-xml-declaration':false()})" --ignore-namespaces
xidel -se ^"^
  file:write(^
    'joined.gpx',^
    element gpx {^
      element trk {^
        element trkseg {^
          for $file in file:list(.,false(),'*.gpx') return^
          doc($file)//trk/trkseg/trkpt^
        }^
      }^
    },^
    {'indent':true(),'omit-xml-declaration':false()}^
  )^
" --ignore-namespaces

@benibela How about serialize([...],{QName('x:ignore-namespaces'):true()}) / file:write([...],[...],{'x:ignore-namespaces':true()})?

Shohreh commented 1 year ago

Thanks very much!

However, even after upgrading to the latest (0.9.8), the first two commands return "Error Unknown option: output-node-indent (when reading argument: output-node-indent)" :

c:\Apps\xidel\xidel.exe -se "(<gpx><trk><trkseg>{for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}</trkseg></trk></gpx>)" --output-format=xml --output-node-indent --ignore-namespaces --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"

And the third one returns "err:XPST0003: This language feature is not available in the selected language. XQuery is required to use constructors" :

c:\Apps\xidel\xidel -se "file:write('joined.gpx',element gpx {element trk {element trkseg {for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}}},{'indent':true(),'omit-xml-declaration':false()})" --ignore-namespaces

err:XPST0003: This language feature is not available in the selected language. XQuery is required to use constructors in: file:write('joined.gpx',element  [<- error occurs before here] gpx {element trk {element trkseg {for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}}},{'indent':true(),'omit-xml-declaration':false()})

Reino17 commented 1 year ago

I already mentioned this in my post. Please check the url again.

Shohreh commented 1 year ago

Sorry, I only checked the main site (0.9.8), not the development section on Sourceforge (0.9.9).

Is there a way to prevent xidel from displaying the output in the terminal? With big files, it takes a long time.

SOLVED: Also, I find no new file in the directory (the output just scrolls past on the screen), and using redirection doesn't work. (Edit: use --download=output.gpx)

Reino17 commented 1 year ago

I highly doubt using --download really solved your problem, because --download can only be used when either of...

xidel --help
[...]
--input=<string>                        Data/URL/File/Stdin(-) to process (--input= prefix can be
                                        omitted)
[...]
  --follow=<string>  or -f              Expression selecting data from the page which will be
                                        followed.

...is specified, which is not the case here.

Shohreh commented 1 year ago

Indeed, it created a file that only contained… "<empty/>"

Is there no way to save the output into a new file, since "> output.gpx" triggers an error?

EFOpenError: Unable to open file "c:\XML\test.gpx": The process cannot access the file because it is being used by another process.

Reino17 commented 1 year ago

This is probably outside the scope of xidel because, as the message says, another process already has the file open. Close that process (or restart your PC if that doesn't help), then try again.

Shohreh commented 1 year ago

I delete the output file before running the app.

The error only occurs when redirecting the output to a file, not when run as-is (i.e. output to the screen).

If the output can't be saved into a file, what's the point?

Reino17 commented 1 year ago

Care to share this input file so I can have a look? Also please tell me the exact command / query you've tried.

Shohreh commented 1 year ago

On Windows, in a batch file, I tried the three commands mentioned above. They all fail when redirecting the output to a file instead of the screen.

@echo off
c:\Apps\xidel.exe --color=never -se "(<gpx><trk><trkseg>{for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}</trkseg></trk></gpx>)" --output-format=xml --output-node-indent --ignore-namespaces --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
REM c:\Apps\xidel.exe --color=never -se "element gpx {element trk {element trkseg {for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}}}" --output-format=xml --output-node-indent --ignore-namespaces --output-declaration="<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
REM c:\Apps\xidel.exe --color=never -se "file:write('joined.gpx',element gpx {element trk {element trkseg {for $file in file:list(.,false(),'*.gpx') return doc($file)//trk/trkseg/trkpt}}},{'indent':true(),'omit-xml-declaration':false()})" --ignore-namespaces

Just create a couple of GPX files and try to join their points (trkpt) to get a single file:

<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <trk>
    <trkseg>
      <trkpt lat="44.14225" lon="1.97067">
        <ele>154</ele>
      </trkpt>
      <trkpt lat="44.14225" lon="1.97067">
        <ele>154</ele>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

benibela commented 1 year ago

If you do > output.gpx, then file:list(.,false(),'*.gpx') will also find output.gpx, and doc($file) will try to open it, which it cannot do because that file is already open for output.

You could do > output.xxx

"@benibela How about serialize([...],{QName('x:ignore-namespaces'):true()}) / file:write([...],[...],{'x:ignore-namespaces':true()})?"

But ignore-namespaces is an input option, not an output option
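
To illustrate the cause described above outside of xidel, here is a minimal Python sketch of the same join. It is not how xidel works internally. It assumes un-namespaced GPX files like the sample in this thread, and the function name `join_gpx` and the `.tmp` suffix are hypothetical choices made for this example. It shows the two precautions that avoid the "file is being used by another process" situation: skip the output file when listing `*.gpx`, and write to a name that the pattern does not match before renaming.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def join_gpx(directory, output_name="joined.gpx"):
    """Merge the <trkpt> elements of every *.gpx file in `directory`.

    Assumption: the files carry no XML namespace, like the sample in
    this thread. `join_gpx` is a hypothetical helper, not part of xidel.
    """
    directory = Path(directory)
    output_path = directory / output_name

    # Build the "header" and "footer": <gpx><trk><trkseg> ... </trkseg></trk></gpx>
    root = ET.Element("gpx")
    trkseg = ET.SubElement(ET.SubElement(root, "trk"), "trkseg")

    for path in sorted(directory.glob("*.gpx")):
        # Precaution 1: never read the file we are about to write,
        # otherwise an earlier result would be merged into itself.
        if path.name == output_name:
            continue
        for trkpt in ET.parse(path).getroot().iterfind("trk/trkseg/trkpt"):
            trkseg.append(trkpt)

    # Precaution 2: write under a name that '*.gpx' does not match,
    # then rename it once the content is complete.
    tmp_path = directory / (output_name + ".tmp")
    ET.ElementTree(root).write(str(tmp_path), encoding="UTF-8", xml_declaration=True)
    os.replace(tmp_path, output_path)
    return output_path
```

Calling `join_gpx(".")` twice in the same directory should give the same track points both times, because the previous `joined.gpx` is skipped rather than read back in.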

Shohreh commented 1 year ago

Good idea :-)

xidel.join.cmd > join.gpx.tmp
mv join.gpx.tmp join.gpx

"But ignore-namespaces is an input option, not an output option" : I just copy/pasted the instructions above.

Thanks!

Reino17 commented 1 year ago

So, a workflow error in the end. I guess it's all settled now.

@benibela Hah! The documentation really needs a big overhaul!
At the moment xidel --help shows it as an "XPath/XQuery compatibility" option, but a better place would probably be the "Output/Input options". And speaking of which, I'd say split this into "Output options" and "Input options" so there will be no more discussion about whether something is an input or an output option.

benibela commented 1 year ago

"And speaking of which, I'd say split this into "Output options" and "Input options" so there will be no more discussion about whether something is an input or an output option."

But there are also options that are both, like --xml.

Reino17 commented 1 year ago

Then I'd say specify them twice:

Input options:
--xml   ...
Output options:
--xml   ...

Or just create 3 categories:

Input options:
[...]
Output options:
[...]
Input & Output options:
--xml   ...

benibela commented 1 year ago

@Reino17

Then I'd say specify them twice:

Input options:
--xml   ...
Output options:
--xml   ...

I am not sure if the command line parser can handle that

Reino17 commented 1 year ago

What do you mean? --xml is already both an input and an output option, which afaik the command-line parser handles just fine. So this is just a matter of documentation.

benibela commented 1 year ago

@Reino17

The documentation is automatically generated by the command-line parser