JanOdijk opened this issue 1 year ago
And I also encountered "NAAM3." (with a trailing period, because it occurs at the end of an utterance). Is this allowed?
- VOORNAAM: (this should be added to the category "person")
This is already a valid code: <prefix>NAAM. You may have encountered these without replacements in older versions of SASTA, which had a bug that anonymised only Word input, not CHAT input.
See the tests for example utterances and their expected replacements: https://github.com/UUDigitalHumanitieslab/sasta/blob/adb553325b41ea379fee8133f74b7e21797eda42/backend/analysis/convert/tests/conftest.py#L76-L134
- Lower case variants: (are they allowed?)
No. This could lead to incorrect replacements, e.g. "Mijn voornaam is Piet" ("My first name is Piet")
-> "Mijn Jan is Piet"
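A minimal Python sketch of why lowercase variants are rejected, assuming codes are matched as whole words with the standard re module. The REPLACEMENTS mapping and the replace_codes function are hypothetical, and VOORNAAM is simplified to a single code rather than <prefix>NAAM. This is not SASTA's actual implementation.

```python
import re

# Hypothetical mapping from a pseudonym code to its replacement value.
# Simplification: VOORNAAM is treated as one code instead of <prefix>NAAM.
REPLACEMENTS = {"VOORNAAM": "Jan"}


def replace_codes(utterance: str, case_sensitive: bool = True) -> str:
    """Replace whole-word pseudonym codes in an utterance."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for code, value in REPLACEMENTS.items():
        # \b ensures only whole words are matched, not substrings of longer words.
        pattern = r"\b" + re.escape(code) + r"\b"
        utterance = re.sub(pattern, value, utterance, flags=flags)
    return utterance


print(replace_codes("Mijn voornaam is Piet"))                        # Mijn voornaam is Piet
print(replace_codes("Mijn voornaam is Piet", case_sensitive=False))  # Mijn Jan is Piet
print(replace_codes("Mijn broer heet VOORNAAM"))                     # Mijn broer heet Jan
```

Matching only the uppercase form keeps the ordinary Dutch word "voornaam" intact, while the case-insensitive variant reproduces the incorrect replacement described above.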
- NAAMOVERIG: (new category, should be added)
This is already a valid code: NAAM<suffix>. Same explanation as for VOORNAAM.
- A pseudonym with counter 5 (is this allowed?)
Not currently, though it would be easy to implement.
- In category "profession" the common value "chirurgh" should be replaced by "chirurg"
Good catch.
Thanks. I did not read the documentation carefully enough. How do you prevent ACHTERNAAM from being analysed as the prefix ACHTER plus the code NAAM? Do you first search for the longest code in a pseudonym?
Indeed, codes are checked from longest to shortest.
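A minimal Python sketch of longest-to-shortest matching, assuming a pseudonym is split into prefix, code and suffix around the first known code found in it. The CODES list and the split_pseudonym function are hypothetical illustrations, not SASTA's actual code list or parser.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical code list; note that ACHTERNAAM contains the shorter code NAAM.
CODES = ["NAAM", "ACHTERNAAM", "PLAATS"]


def split_pseudonym(
    token: str, codes: Iterable[str], longest_first: bool = True
) -> Optional[Tuple[str, str, str]]:
    """Split a token into (prefix, code, suffix) using the first code found in it."""
    # Trying longer codes first lets ACHTERNAAM win over its substring NAAM.
    ordered = sorted(codes, key=len, reverse=True) if longest_first else list(codes)
    for code in ordered:
        index = token.find(code)  # case-sensitive substring search
        if index != -1:
            return token[:index], code, token[index + len(code):]
    return None  # no known code in this token


print(split_pseudonym("ACHTERNAAM", CODES))                       # ('', 'ACHTERNAAM', '')
print(split_pseudonym("ACHTERNAAM", CODES, longest_first=False))  # ('ACHTER', 'NAAM', '')
print(split_pseudonym("VOORNAAM", CODES))                         # ('VOOR', 'NAAM', '')
print(split_pseudonym("NAAMOVERIG", CODES))                       # ('', 'NAAM', 'OVERIG')
```

Without the length ordering, NAAM is tried first and ACHTERNAAM is misanalysed as the prefix ACHTER plus NAAM; with it, the same function still splits VOORNAAM and NAAMOVERIG into <prefix>NAAM and NAAM<suffix>.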
I encounter the following other "pseudonyms" (with their frequencies) in the reference data:
- VOORNAAM: (this should be added to the category "person")
- Lower case variants: (are they allowed?)
- NAAMOVERIG: (new category, should be added)
- A pseudonym with counter 5 (is this allowed?)
- In category "profession" the common value "chirurgh" should be replaced by "chirurg"