deutsche-nationalbibliothek / pica-rs

Tools to work with bibliographic records encoded in PICA+.
https://deutsche-nationalbibliothek.github.io/pica-rs/
European Union Public License 1.2

Allow multiline filter expression with comments #457

Open nichtich opened 2 years ago

nichtich commented 2 years ago

I'd like to filter records by multiple conditions, and document the filter expression with comments. For this it would be useful to allow line breaks as whitespace in filter expressions and to allow comments (e.g. from # to the end of line). An example:

002@.0 !~ '^a' &&   # not a mailbox record
!024O?              # not deleted

Given this feature, filters can be collected in files:

pica filter "$(< filter.txt )" sample.dat

nwagner84 commented 2 years ago

Thanks for your request!

It's already possible to read a (complex) filter expression from a file (use the -f or --file option), but a "dummy" filter expression is still necessary:

$ pica filter -f filter.txt "003@?" sample.dat

The file content can contain spaces, tabs, and newlines, but no comments.

I'll implement the comment part soon!

nichtich commented 2 years ago

> but a "dummy" filter expression is still necessary

That's confusing. Is it possible to set a default filter matching any field in this case?

nwagner84 commented 2 years ago

Actually, the filter expression given on the command line is overwritten by the content of the file, but it's not possible to omit the "dummy" CLI argument.

This is indeed confusing and I'll try to fix this issue soon.

nichtich commented 1 year ago

The current workaround is to use $(sed 's/#.*//;s/ \+/ /g;' filter.txt | tr -d '\n') as the filter argument.
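For readers who want to try the workaround, here is a minimal sketch that wraps it in a shell function and applies it to a filter file modeled on the example from the opening post. The function name `flatten_filter` and the temporary file are hypothetical and only serve the illustration; the `\+` quantifier assumes GNU sed.

```shell
# Hypothetical helper wrapping the workaround from this comment.
# Assumes GNU sed: the `\+` quantifier is a GNU extension.
flatten_filter() {
    sed 's/#.*//;s/ \+/ /g;' "$1" | tr -d '\n'
}

# Sample filter file, modeled on the example in the opening post.
filter_file=$(mktemp)
cat > "$filter_file" <<'EOF'
002@.0 !~ '^a' &&   # not a mailbox record
!024O?              # not deleted
EOF

# Comments are removed and both lines are joined into one expression.
flat=$(flatten_filter "$filter_file")
printf '%s\n' "$flat"
rm -f "$filter_file"

# Intended use with pica (file names as in this thread):
# pica filter "$(flatten_filter filter.txt)" sample.dat
```

The sed expression cuts every line at the first `#`, so it is only safe when `#` never appears in the filter expression itself.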

nwagner84 commented 1 year ago

Just a short note about the workaround: I think it doesn't work with expressions that contain values with a # or that use the cardinality operator:

...
#012A/*{ #0 > 0 && z in ["abc", "def", "x#y"] } >= 10 && # my comment
...
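
The problem can be reproduced with the sample line above. The sketch below first applies the sed expression from the workaround, then a quote-aware alternative. The function `strip_comments` is hypothetical and not part of pica. It rests on two assumptions that the thread does not confirm: a `#` starts a comment only when it is outside quotes and followed by a space, a tab, or the end of the line, and a backslash escapes the next character inside a quoted value.

```shell
# Sample line taken from the comment above.
line='#012A/*{ #0 > 0 && z in ["abc", "def", "x#y"] } >= 10 && # my comment'

# The workaround cuts at the first "#". Here that is the cardinality
# operator at the start of the line, so nothing is left.
naive=$(printf '%s\n' "$line" | sed 's/#.*//')

# Hypothetical quote-aware comment stripper (a sketch, not the pica parser).
strip_comments() {
    awk '{
        out = ""; quote = ""; n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (quote != "") {
                # Inside a quoted value: copy verbatim until the closing quote.
                # Assumption: a backslash escapes the next character.
                if (c == "\\" && i < n) {
                    out = out c substr($0, i + 1, 1); i++; continue
                }
                if (c == quote) quote = ""
            } else if (c == "\"" || c == "\047") {
                quote = c          # opening double or single quote
            } else if (c == "#") {
                # Assumption: a comment needs whitespace or end of line after "#".
                nxt = substr($0, i + 1, 1)
                if (nxt == "" || nxt == " " || nxt == "\t") break
            }
            out = out c
        }
        sub(/[ \t]+$/, "", out)    # drop trailing whitespace
        print out
    }'
}

clean=$(printf '%s\n' "$line" | strip_comments)
printf 'naive: [%s]\nclean: [%s]\n' "$naive" "$clean"
```

With this heuristic a comment written without a space after the `#` (for example `#note`) is not recognized, which is one reason a proper solution belongs in the filter parser itself.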

nichtich commented 1 year ago

The complexity of this issue seems low, but I cannot tell. If it is, I'd welcome its inclusion in #505 or one of the next releases.

nwagner84 commented 1 year ago

I started the implementation yesterday. It will be included in the upcoming release, 0.17.0.

nwagner84 commented 1 year ago

Sorry, this is more challenging than expected. I started working on this feature, but I must move it to v0.18.0.