This Thunderbird extension facilitates handling of redundant entries in address books.
After installation it can be invoked via the `Tools->Duplicate Contacts Manager...` menu entry. One can also customize the `Toolbar` of the `Address Book` window with a `Find Duplicates` button.
The Duplicate Contacts Manager searches address books for matching contact entries, also known as cards. It can automatically delete all cards that match some other card and have equivalent or less information than that card. Any remaining pairs of matching cards are presented as potential duplicates for manual treatment. The two cards of each pair are shown side by side, with a comparison of all fields containing data, including any photo. Some important fields are always shown, even when empty, so that they can be filled in.
When pairs of candidate duplicates are presented, the reason why they are considered matching is given in the status line.
During manual treatment of a pair of matching cards, the user can skip them, modify one or both of them, or delete one of them. If a deleted card has a primary email address that is contained in one or more mailing lists, and the other card does not have the same primary email address, the address is also deleted from those mailing lists.
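The mailing-list rule above can be sketched as follows. This is a minimal illustration, not the extension's code: the `Card` and `MailingList` classes and the `remove_from_lists` helper are hypothetical, and "same primary email address" is assumed to mean plain string equality.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    # Hypothetical, simplified card: only the primary email matters here.
    primary_email: str


@dataclass
class MailingList:
    name: str
    addresses: list[str] = field(default_factory=list)


def remove_from_lists(deleted: Card, kept: Card, lists: list[MailingList]) -> list[str]:
    """Remove the deleted card's primary email from all mailing lists,
    unless the kept card has the same primary email. Returns changed list names."""
    address = deleted.primary_email
    if not address or address == kept.primary_email:
        return []  # nothing to do: no address, or the kept card still covers it
    changed = []
    for mlist in lists:
        if address in mlist.addresses:
            mlist.addresses = [a for a in mlist.addresses if a != address]
            changed.append(mlist.name)
    return changed
```

Comparing against the kept card first ensures that a list entry is only dropped when no remaining card carries that address.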
There are two search modes for finding matching cards:
Two cards are considered matching if any of the following conditions hold; the details are explained below. However, cards with non-equivalent `AIMScreenName` values are never considered matching, which is convenient for preventing cards from being repeatedly presented for manual treatment.
The matching relation is designed to be rather weak, such that it tends to yield more pairs of candidate duplicates.
Matching of names, email addresses, and phone numbers is based upon equivalence of fields modulo abstraction, described below.
As a result, for example, names differing only in letter case are considered to match.
For the matching process, names are completed and their order is normalized. For example, if two name parts are detected in the `DisplayName` (e.g., "John Doe") or in an email address (e.g., "John.Doe@company.com"), they are taken as first and last name.
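Name completion from a `DisplayName` or an email address can be sketched roughly as below. This is an assumption-laden illustration: `split_two_part_name` and `complete_names` are hypothetical helpers, they only fill in missing name parts, and they treat a comma as "Last, First" order (as in the `DisplayName` row of the example table further below). Name prefixes such as "van" and middle initials are not handled.

```python
import re
from typing import Optional, Tuple


def split_two_part_name(text: str) -> Optional[Tuple[str, str]]:
    """Return (first, last) if `text` holds exactly two name parts, else None."""
    text = text.strip()
    if "," in text:
        # "Doe, John" style: last name comes first, so swap the order.
        last, _, first = text.partition(",")
        last, first = last.strip(), first.strip()
        return (first, last) if first and last else None
    # Split on whitespace, dots, and underscores (as in "John.Doe@...").
    parts = [p for p in re.split(r"[\s._]+", text) if p]
    return (parts[0], parts[1]) if len(parts) == 2 else None


def complete_names(first: str, last: str, display: str, email: str) -> Tuple[str, str]:
    """Fill in a missing first or last name from DisplayName, else from the email's local part."""
    if first and last:
        return first, last
    local_part = email.split("@", 1)[0]
    for source in (display, local_part):
        guess = split_two_part_name(source)
        if guess:
            return first or guess[0], last or guess[1]
    return first, last
```

For instance, `complete_names("", "", "", "John.Doe@company.com")` yields `("John", "Doe")`.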
Both multiple email addresses within a card and multiple phone numbers within a card are treated as sets, i.e., their order is ignored as well as their types.
In detail, two cards are considered matching if

- their `DisplayName` is not empty and is equivalent, or
- their `FirstName` and their `LastName` are not empty and are pairwise equivalent, or
- one card's `DisplayName` is empty but their `FirstName` and `LastName` are not empty and are pairwise equivalent, or
- one card's `DisplayName` is empty and either the `FirstName` or `LastName` is not empty and is equivalent to the `DisplayName` of the other card, or
- their `AIMScreenName` is not empty and is equivalent, or
- their `PrimaryEmail` or `SecondEmail` are equivalent, or
- their `CellularNumber`, `WorkPhone`, or `PagerNumber` are equivalent.
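The conditions above, together with the `AIMScreenName` veto, can be sketched as a single predicate. This is a minimal sketch, not the extension's implementation: the `Card` class and its field names are hypothetical, `norm` is a simplified stand-in for the full abstraction (it only ignores letter case and whitespace), and the veto follows a literal reading in which any two non-equivalent `AIMScreenName` values, including empty versus non-empty, prevent a match.

```python
from dataclasses import dataclass, field


def norm(value: str) -> str:
    # Simplified stand-in for the real abstraction: ignore case and whitespace.
    return "".join(value.lower().split())


def norm_set(values: set[str]) -> set[str]:
    # Abstract every value of a set-valued field and drop empty ones.
    return {norm(v) for v in values} - {""}


@dataclass
class Card:  # hypothetical, simplified card model
    display_name: str = ""
    first_name: str = ""
    last_name: str = ""
    aim_screen_name: str = ""
    emails: set[str] = field(default_factory=set)  # PrimaryEmail, SecondEmail
    phones: set[str] = field(default_factory=set)  # CellularNumber, WorkPhone, PagerNumber


def equiv_nonempty(a: str, b: str) -> bool:
    return norm(a) != "" and norm(a) == norm(b)


def cards_match(a: Card, b: Card) -> bool:
    # Veto: non-equivalent AIMScreenName values never match (literal reading).
    if norm(a.aim_screen_name) != norm(b.aim_screen_name):
        return False
    if equiv_nonempty(a.display_name, b.display_name):
        return True
    # First and last names non-empty and pairwise equivalent.
    if equiv_nonempty(a.first_name, b.first_name) and equiv_nonempty(a.last_name, b.last_name):
        return True
    # One DisplayName empty: its FirstName or LastName equals the other DisplayName.
    for x, y in ((a, b), (b, a)):
        if norm(x.display_name) == "" and (
            equiv_nonempty(x.first_name, y.display_name)
            or equiv_nonempty(x.last_name, y.display_name)
        ):
            return True
    if equiv_nonempty(a.aim_screen_name, b.aim_screen_name):
        return True
    # Emails and phone numbers are treated as sets: any shared value suffices.
    if norm_set(a.emails) & norm_set(b.emails):
        return True
    return bool(norm_set(a.phones) & norm_set(b.phones))
```

Because emails and phone numbers are sets, their order and types play no role; only `HomePhone` and `FaxNumber` are deliberately left out of `phones`.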
The `HomePhone` and `FaxNumber` fields are not considered for matching because such numbers are often shared by a group of people.

Before card fields are compared, their values are abstracted using the following steps:
- The following fields are ignored: `UID`, `UUID`, `CardUID`, `groupDavKey`, `groupDavVersion`, `groupDavVersionPrev`, `RecordKey`, `DbRowID`, `PhotoType`, `PhotoName`, `LowercasePrimaryEmail`, `LowercaseSecondEmail`, `unprocessed:rev`, `unprocessed:x-ablabel`.
- `@googlemail.com` is replaced by `@gmail.com` in email addresses.

Corresponding fields in two cards are considered equivalent if their abstracted values are equal. Note that the value adaptations mentioned above are computed only for the comparison; they do not change the actual card fields.
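The effects of abstraction shown in the example table later in this document (letter case, accents, punctuation, and whitespace ignored; umlauts transcribed; national phone prefix normalized; non-digits ignored) plus the ignored fields and the `@googlemail.com` rule can be approximated as below. This is a sketch under assumptions, not the extension's algorithm: all function names are hypothetical, the country code `49` is an assumed parameter, and name-prefix handling ("van") and middle-initial removal are omitted.

```python
import unicodedata

IGNORED_FIELDS = {
    "UID", "UUID", "CardUID", "groupDavKey", "groupDavVersion", "groupDavVersionPrev",
    "RecordKey", "DbRowID", "PhotoType", "PhotoName", "LowercasePrimaryEmail",
    "LowercaseSecondEmail", "unprocessed:rev", "unprocessed:x-ablabel",
}
EMAIL_FIELDS = {"PrimaryEmail", "SecondEmail"}
PHONE_FIELDS = {"CellularNumber", "WorkPhone", "PagerNumber", "HomePhone", "FaxNumber"}
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def abstract_text(value: str) -> str:
    # Transcribe umlauts first so that "Müller" becomes "Mueller", not "Muller".
    for char, repl in UMLAUTS.items():
        value = value.replace(char, repl)
    # Strip remaining accents: decompose (NFKD) and drop combining marks.
    value = "".join(c for c in unicodedata.normalize("NFKD", value) if not unicodedata.combining(c))
    # Ignore letter case, punctuation, and whitespace.
    return "".join(c for c in value.lower() if c.isalnum())


def abstract_email(value: str) -> str:
    value = value.strip().lower()
    if value.endswith("@googlemail.com"):
        value = value[: -len("@googlemail.com")] + "@gmail.com"
    return value


def abstract_phone(value: str, country_code: str = "49") -> str:
    # Assumed country code: "+49 89 ..." is mapped to the national form "089...".
    value = value.strip()
    if value.startswith("+" + country_code):
        value = "0" + value[1 + len(country_code):]
    return "".join(c for c in value if c.isdigit())


def abstract_card(card: dict[str, str]) -> dict[str, str]:
    """Return abstracted copies of the values; the input card is left unchanged."""
    result = {}
    for name, value in card.items():
        if name in IGNORED_FIELDS:
            continue
        if name in EMAIL_FIELDS:
            result[name] = abstract_email(value)
        elif name in PHONE_FIELDS:
            result[name] = abstract_phone(value)
        else:
            result[name] = abstract_text(value)
    return result
```

Returning a new dictionary mirrors the point above that abstraction is used only for comparison and never modifies the stored card.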
If automatic removal is chosen, only cards preferred for deletion (which implies equivalent or less information than some other card; for details see below) are removed. When a pair of matching cards is presented for manual inspection, the card flagged by default with red color for removal is
`PopularityIndex`, or else `LastModifiedDate`), or else

A card is considered to have equivalent or less information than another if for each non-ignored field:

- the field is `FirstName`, `LastName`, or `DisplayName` and its value is a substring of the corresponding field value of the other card, or
- the field is `PopularityIndex` or `LastModifiedDate` (which are ignored here), or
- its value is empty or the default, e.g., `0` for number fields or `false` for Boolean fields.

For the above field-wise comparison, the email addresses of a card are treated as a set, the phone numbers of a card are also treated as a set, and the set of names of mailing lists a card belongs to is taken as an additional field.
A card with equivalent or less information than another is preferred for deletion if:

- its `PopularityIndex` is smaller, or else
- its `LastModifiedDate` is smaller.

Here is an example.
The card on the right will be preferred for deletion because it contains less information.
| Field | Card on the left | Card on the right | Remark |
|---|---|---|---|
| `NickName` | "Péte" | " pete ! " | accent, punctuation, letter case, and whitespace ignored |
| `FirstName` | "Peter" | "Peter Y van" | name prefix "van" moved to last name, middle initial "Y" ignored |
| `LastName` | "van Müller" | "Mueller" | name prefix "van" moved to last name, umlauts transcribed |
| `DisplayName` | "Hans Peter van Müller" | "van Müller, Peter" | first name moved to the front, name is substring |
| `PreferDisplayName` | 'yes' | 'yes' | same truth value |
| `AimScreenName` | "" | "" | same AIM name |
| `PreferMailFormat` | 'HTML' | 'unknown' | default ('unknown') considered less information |
| `PrimaryEmail` | "Peter.vanMueller@company.com" | "P.van.Mueller@gmx.de" | emails treated as sets, letter case ignored |
| `SecondEmail` | "p.van.mueller@gmx.de" | "" | emails treated as sets, letter case ignored |
| `WorkPhone` | "089/1234-5678" | "+49 89 12345678" | national prefix normalized and non-digits ignored |
| `PopularityIndex` | 5 | 3 | field ignored for information comparison |
| `LastModifiedDate` | 2018-02-25 07:51:28 | 2018-02-25 08:30:37 | field ignored for information comparison |
| `UUID` | "" | "903a61be-64d5-4844-802a" | field ignored |
The options/configuration/preferences used by this Thunderbird extension are saved in configuration keys starting with `extensions.DuplicateContactsManager.`; for instance, the list of ignored fields is stored in the variable `ignoreFields`.
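Thunderbird writes preferences that differ from their defaults as `user_pref("key", value);` lines into the `prefs.js` file of the user profile, so the extension's current settings can be inspected from there. The following is a minimal sketch with a hypothetical `read_extension_prefs` helper; it assumes each entry sits on one line and that its value is a Boolean, an integer, or a double-quoted string that is also valid JSON. It makes no assumption about how the value of `ignoreFields` is formatted.

```python
import json
import re

PREFIX = "extensions.DuplicateContactsManager."
# Matches lines such as: user_pref("some.key", "value");
USER_PREF = re.compile(r'^\s*user_pref\("([^"]+)",\s*(.*)\);\s*$')


def read_extension_prefs(prefs_js: str) -> dict:
    """Return {short_name: value} for all keys starting with PREFIX."""
    prefs = {}
    for line in prefs_js.splitlines():
        m = USER_PREF.match(line)
        if m and m.group(1).startswith(PREFIX):
            # true/false, integers, and simple double-quoted strings parse as JSON.
            prefs[m.group(1)[len(PREFIX):]] = json.loads(m.group(2))
    return prefs
```

Typical use would be `read_extension_prefs(open(path_to_prefs_js, encoding="utf-8").read())`, where `path_to_prefs_js` points to the profile's `prefs.js`. Settings still at their defaults do not appear in that file.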