other pseudonyms

JanOdijk commented 1 year ago

I encounter the following other "pseudonyms" (with their frequencies) in the reference data:

VOORNAAM: (this should be added to the category "person")
- VOORNAAM1 5
- VOORNAAM2 5
- voornaam1 1
- voornaam2 1
- VOORNAAM3 6
- VOORNAAM4 2
Lower case variants: (are they allowed?)
- plaatsnaam1 3
- voornaam1 1
- voornaam2 1
NAAMOVERIG: (new category, should be added)
- NAAMOVERIG1 6
- NAAMOVERIG2 4
- NAAMOVERIG3 2
A pseudonym with counter 5 (is this allowed?)
- NAAM5 3

In category "profession" the common value "chirurgh" should be replaced by "chirurg"

JanOdijk commented 1 year ago

I also encountered NAAM3. (with a period at the end, which occurs at the end of an utterance). Is this allowed?

JeltevanBoheemen commented 1 year ago

VOORNAAM: (this should be added to the category "person")

This is already a valid code, <prefix>NAAM. It is possible you encountered these without replacements in older versions of SASTA. A bug existed that didn't anonymise CHAT input, only Word input. For example test utterances and their expected replacements, see: https://github.com/UUDigitalHumanitieslab/sasta/blob/adb553325b41ea379fee8133f74b7e21797eda42/backend/analysis/convert/tests/conftest.py#L76-L134

Lower case variants: (are they allowed?)

No. This could lead to incorrect replacements, because the ordinary word "voornaam" ("first name") would be treated as a pseudonym: Mijn voornaam is Piet -> Mijn Jan is Piet
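A minimal sketch of why case matters here, not SASTA's actual implementation. The `REPLACEMENTS` table, the `anonymise` function, and its `ignore_case` flag are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical replacement table; "Jan" is the placeholder used in the example above.
REPLACEMENTS = {"VOORNAAM": "Jan"}


def anonymise(utterance, ignore_case=False):
    """Replace the code VOORNAAM in an utterance with its placeholder name."""
    flags = re.IGNORECASE if ignore_case else 0
    # \b keeps the match to whole words only.
    pattern = re.compile(r"\bVOORNAAM\b", flags)
    return pattern.sub(REPLACEMENTS["VOORNAAM"], utterance)


# Case-sensitive matching leaves the ordinary Dutch word untouched.
print(anonymise("Mijn voornaam is Piet"))
# Case-insensitive matching replaces it, which is the incorrect result.
print(anonymise("Mijn voornaam is Piet", ignore_case=True))
```

With case-sensitive matching only the upper-case code is replaced, so normal running text that happens to contain "voornaam" or "plaatsnaam" stays intact.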

NAAMOVERIG: (new category, should be added)

This is already a valid code: NAAM<suffix>. Same explanation as VOORNAAM.

A pseudonym with counter 5 (is this allowed?)

Not currently, easy to implement though.

In category "profession" the common value "chirurgh" should be replaced by "chirurg"

Good catch

JanOdijk commented 1 year ago

Thanks. I did not read the documentation well enough. How do you prevent ACHTERNAAM from being analysed as the prefix ACHTER plus the code NAAM? Do you first search for the longest code in a pseudonym?

JeltevanBoheemen commented 1 year ago

Indeed, codes are checked from longest to shortest.
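The longest-first check can be sketched roughly as follows. This is an illustration under assumptions, not the code in SASTA: the `CODES` list and the `split_pseudonym` function are hypothetical, and the sketch accepts any counter (or none), which may be more permissive than the current implementation.

```python
import re

# Hypothetical code list, for illustration only; the actual SASTA codes may differ.
CODES = ["NAAM", "ACHTERNAAM", "PLAATSNAAM"]


def split_pseudonym(token, codes=CODES):
    """Split a token into (prefix, code, suffix, counter), or return None."""
    # Only upper-case letters followed by an optional counter are considered.
    match = re.fullmatch(r"([A-Z]+)(\d*)", token)
    if match is None:
        return None
    stem, counter = match.groups()
    # Try the longest codes first, so ACHTERNAAM wins over NAAM.
    for code in sorted(codes, key=len, reverse=True):
        index = stem.find(code)
        if index != -1:
            prefix = stem[:index]
            suffix = stem[index + len(code):]
            return prefix, code, suffix, counter
    return None


print(split_pseudonym("ACHTERNAAM1"))  # matched as the code ACHTERNAAM, no prefix
print(split_pseudonym("VOORNAAM2"))    # prefix VOOR + code NAAM
print(split_pseudonym("NAAMOVERIG3"))  # code NAAM + suffix OVERIG
```

If the codes were tried shortest first instead, ACHTERNAAM1 would be split into the prefix ACHTER and the code NAAM, which is the misanalysis asked about above.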

UUDigitalHumanitieslab / sasta

other pseudonyms #153