benibela / xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
http://www.videlibri.de/xidel.html
GNU General Public License v3.0

fn:collection is not working #66

Closed easz closed 3 years ago

easz commented 3 years ago

I am trying to process all XML files in a folder with fn:collection:

for $f in collection('file:///path/to/my/folder')

but I got err:FODC0002: No collection entry for file:///path/to/my/folder

Am I doing something wrong, or is this a bug? (On macOS.)

For now I would use a separate shell script to generate the file list and process each file with fn:doc.

benibela commented 3 years ago

You can use file:list to get a list of files.

Or call it with xidel //path/to/my/folder/*.xml -e ...

fn:collection is not implemented, except to return that error message. Nothing in the standard says how it should handle file URLs.

mikekuznetsov11 commented 3 years ago

I am trying to feed multiple files into xidel (Windows 10 cmd) and cannot get it to work. I have tried many variants, including:

xidel file://c:/temp/*.xml
xidel --data=c:\temp\*.xml 
xidel --xquery "collection('c:\temp\*.xml')"

Could you give a code example for Windows?

Reino17 commented 3 years ago

This won't work with xidel on Windows. Filename globbing only works for cmd's internal commands (see https://ss64.com/nt/syntax-wildcards.html). So for instance:

FOR %A IN ("C:\temp\*.xml") DO @ECHO %A

or

DIR /B "C:\temp\*.xml"

Without resorting to hacks, the output will always be on multiple lines. You're better off doing this with xidel "in-query". Something like:

xidel -se "for $x in file:list('C:\temp',false(),'*.xml') return doc($x)"

P.S. This issue is closed. And because your question has nothing to do with collection(), it would have been better to open a new issue or, better yet, to ask your question on the mailing list.

mikekuznetsov11 commented 3 years ago

Thanks a lot for the workaround. Though I think it has certain limitations. E.g. if I need to perform aggregate functions (sum, count), they will be performed grouped by input file, instead of aggregating throughout the whole data set represented by the files. Am I right?

Reino17 commented 3 years ago

Without having seen the content of your input files and the expected output, there's no way to tell.
But again, this issue is closed. Please start a new one (if Benito doesn't mind questions here), or otherwise post on the mailing list or even StackOverflow.
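
As a rough sketch of the aggregation question above: if all documents are first collected into a single sequence, functions such as count and sum operate on that whole sequence rather than per file. The element names item and price below are purely hypothetical, since the actual input files were never shown. The sketch also assumes that file:list returns names relative to the directory, as the EXPath File Module specifies, so the directory is prepended before calling doc.

```xquery
(: Hypothetical input: each C:\temp\*.xml contains <item><price>...</price></item> elements. :)
let $dir := 'C:\temp\'
(: Load every matching file into ONE sequence of document nodes. :)
let $docs :=
  for $x in file:list($dir, false(), '*.xml')
  return doc($dir || $x)   (: assumption: file:list yields paths relative to $dir :)
return (
  count($docs//item),      (: number of items across all files :)
  sum($docs//price)        (: total over all files, not per file :)
)
```

Because this is a full FLWOR expression, it would probably need to be passed with --xquery rather than as a plain XPath expression. On the cmd command line it has to be written on a single line inside double quotes.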