Closed baudbaudy closed 3 months ago
You're being bitten by a feature in the simple, extended and advanced search, unfortunately. Everything entered is treated verbatim, except for these 3 special cases:
| is or
* is zero or more characters
? is any one character
More specifically, what you enter is converted to a regex, and these 3 are substituted in the following way:
| -> | (left alone)
* -> .*
? -> .
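To make the substitution concrete, here is a minimal Python sketch of that kind of conversion. It is not the corpus-frontend's actual code: the function name wildcard_to_regex is hypothetical, and the use of re.escape for "everything else is verbatim" is an assumption about how the remaining characters are handled.

```python
import re

def wildcard_to_regex(term):
    """Convert a simple-search term to a regex string (illustrative only)."""
    parts = []
    for ch in term:
        if ch == "|":
            parts.append("|")            # left alone: still means "or"
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # assumed: all other characters are literal
    return "".join(parts)

print(wildcard_to_regex("LE|À"))
print(wildcard_to_regex("a*b?"))
```

Note that the pipe in LE|À passes through unchanged, so it still acts as "or" in the resulting regex.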
Unfortunately, you can't bypass this at the moment, so to find the | literally, you'll have to use the expert view and enter the regex yourself.
For your example that would look like this:
[lemma="LE\|À"]
(note the escaping backslash \ before the pipe |).
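The difference between the escaped and unescaped pipe can be checked with a small Python sketch. Python's re module stands in for the regex engine BlackLab uses, and the sketch assumes the pattern must match the whole annotation value (hence re.fullmatch); both are assumptions made for illustration.

```python
import re

value = "LE|À"  # the lemma value as it appears on the token

# Unescaped: the pipe is alternation, so the pattern means "LE" or "À".
# Neither alternative covers the whole value, so there is no match.
unescaped = re.fullmatch(r"LE|À", value)

# Escaped: the pipe is a literal character, so the whole value matches.
escaped = re.fullmatch(r"LE\|À", value)

print(unescaped is None, escaped is not None)
```

This is why the unescaped search returns nothing for a token whose only lemma value is the combined string.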
Sidebar: BlackLab supports multiple values, so what you could also do is index both the full value and the individual values for the lemma and pos. The token will then match for any of the values.
You could do this as follows:
annotatedFields:
  contents:
    annotations:
    - name: lemma
      displayName: Lemma
      valuePath: "@lemma"
      multipleValues: true
      allowDuplicateValues: false
      process:
      - action: split
        separator: "\\|"
        keep: both
    - name: pos
      displayName: Part of Speech
      valuePath: "@pos"
      multipleValues: true
      allowDuplicateValues: false
      process:
      - action: split
        separator: "\\|"
        keep: both
The split process option is explained here:
https://inl.github.io/BlackLab/guide/how-to-configure-indexing.html#processing-values
There is a caveat though:
There are 3 values for lemma (['LE|À', 'LE', 'À']), but only the first value on any token can be shown in the UI. That is also what is used when sorting or grouping the results (for example, grouping on lemma would put your example word in the LE|À group only, not in the group for LE or À).
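As a rough illustration of the caveat, the following Python sketch mimics what splitting with keep: both produces and which value ends up being displayed and grouped on. It is not BlackLab code: split_keep_both is a hypothetical helper, it splits on a literal separator rather than a regex, and taking the first list element as the displayed value is an assumption based on the description above.

```python
def split_keep_both(raw, separator="|"):
    """Return the full value followed by its parts, without duplicates."""
    values = [raw]                     # the unsplit value comes first
    for part in raw.split(separator):
        if part not in values:         # mirrors allowDuplicateValues: false
            values.append(part)
    return values

lemmas = split_keep_both("LE|À")
display_value = lemmas[0]              # assumed: the UI shows and groups on this one

print(lemmas)
print(display_value)
```

A search for any of the three values would match the token, but in this sketch only the first value determines the group it lands in.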
Very good, thank you for your response and advice.
Hi there! First of all, thank you for your work on BlackLab. I have a problem and I haven't found a solution in the documentation or in the issues on GitHub. I work on French letters, and in our TEI files some words can have several lemmas (example:
<w lemma="LE|À" pos="art. def.|prép.">au</w>
). In the corpus-frontend, searching by word works very well; however, searching for the lemma "LE|À" does not return any results, and searching only for the lemma "LE" or "À" does not find the word "au". Do you have any solutions to suggest to resolve this problem? Thank you for your time and assistance.