ePADD / epadd

ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
https://www.epaddproject.org

‘Undisclosed-recipients:’ appears in the list of correspondents. #416

Open jfarwer opened 3 years ago

jfarwer commented 3 years ago

Some emails in Mbox files contain this header:

To: "Undisclosed recipients:"

ePADD maintains a list of banned words in a file named 'bannedStringsInPeopleNames.txt'. The list includes 'undisclosed' and 'recipient', so that 'Undisclosed recipients:' is not recognised as a person's name. The string still appears in the correspondent list, though. Could that be because ePADD treats 'Undisclosed recipients' as an email address, and 'bannedStringsInPeopleNames.txt' is not applied to addresses?

tomhigginsuom commented 10 months ago

Could we validate the addresses (e.g. with a regex) to check that they are real addresses before adding them to the correspondent list?
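
A minimal sketch of what such a filter might look like, in Java since ePADD is a Java project. This is not ePADD's actual code: the class `CorrespondentFilter`, its method names, and the pattern are all hypothetical, and the regex is deliberately loose (something, then `@`, then a domain containing a dot) rather than a full RFC 5322 validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ePADD: rejects strings that cannot be
// mailbox addresses before they reach the correspondent list.
final class CorrespondentFilter {

    // Loose shape check: local part, '@', and a domain with at least one dot.
    // Whitespace, '@', angle brackets, ':', ';', ',' and '"' are not allowed
    // in either part, so placeholders such as "Undisclosed recipients:" fail.
    private static final Pattern SIMPLE_ADDRESS = Pattern.compile(
            "^[^\\s@<>:;,\"]+@[^\\s@<>:;,\"]+\\.[^\\s@<>:;,\"]+$");

    // Returns true only if the candidate has the basic shape of an address.
    static boolean looksLikeAddress(String candidate) {
        if (candidate == null) {
            return false;
        }
        return SIMPLE_ADDRESS.matcher(candidate.trim()).matches();
    }

    // Keeps the candidates that pass the shape check, trimmed, in order.
    static List<String> keepPlausibleAddresses(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (looksLikeAddress(candidate)) {
                kept.add(candidate.trim());
            }
        }
        return kept;
    }
}
```

With this check, `looksLikeAddress("Undisclosed recipients:")` returns false because the string has no `@`, while an ordinary address such as `jane@example.org` passes. A loose pattern like this will still accept some malformed addresses, so it is only a first-pass filter; where exactly it would hook into ePADD's import code is left open here.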