sebbASF closed this 2 years ago
The full generator creates an id based on the exact mail source, so loading the same email from two different archives will generate two different Permalinks.
The corresponding DKIM ids will be the same, but the new source ids will be different.
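To illustrate why the two generators diverge, here is a minimal sketch. It assumes the "full" id is a hash of every byte of the raw source and the DKIM id is a hash of a normalised subset of headers plus the body. `full_id`, `dkim_like_id` and `SIGNED_HEADERS` are hypothetical stand-ins, not the Foal generators themselves.

```python
import email
import hashlib

# Hypothetical header subset; the real DKIM generator may hash different fields.
SIGNED_HEADERS = ("from", "to", "subject", "date", "message-id")


def full_id(raw: bytes) -> str:
    """Stand-in for the "full" generator: every byte of the source affects the id."""
    return hashlib.sha256(raw).hexdigest()


def dkim_like_id(raw: bytes) -> str:
    """Stand-in for the DKIM generator: only selected, normalised content is hashed."""
    msg = email.message_from_bytes(raw)
    parts = []
    for name in SIGNED_HEADERS:
        # Collapse runs of whitespace, roughly like DKIM "relaxed" canonicalisation.
        value = " ".join(str(msg.get(name, "")).split())
        parts.append(f"{name}:{value}")
    body = msg.get_payload()
    if not isinstance(body, str):
        raise ValueError("sketch only handles single-part messages")
    # Strip trailing whitespace per line and drop trailing blank lines.
    lines = [line.rstrip() for line in body.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    parts.append("\n".join(lines))
    return hashlib.sha256("\r\n".join(parts).encode("utf-8", "surrogateescape")).hexdigest()


# The same email as stored by two archives; the second one added a Received: header.
archive_a = (
    b"From: alice@example.org\r\nTo: dev@example.org\r\nSubject: Hello\r\n"
    b"Date: Mon, 1 Jan 2024 00:00:00 +0000\r\nMessage-ID: <1@example.org>\r\n"
    b"\r\nBody text\r\n"
)
archive_b = b"Received: from mx.example.net\r\n" + archive_a
```

With these inputs the full ids differ while the DKIM-style ids match, which is exactly the situation where two source entries end up pointing at one mbox entry.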
Currently migrate.py drops one of the original Permalinks and creates an orphan source entry.
This is covered by #193.
https://github.com/apache/incubator-ponymail-foal/blob/03e100e6f012f1ef335e0a2e16bb832d70a50801/tools/migrate.py#L158
The bulk update code currently loads the data using an "insert" operation. This silently replaces any existing entry with the same id, which can result in lost Permalinks.
In the case of a conversion that does not generate new DKIM ids, mbox entries keep their existing ids, so every load should create a new entry, i.e. "create" is the appropriate operation. If an entry already exists, this is a fatal error.
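A minimal sketch of the "create, and treat a duplicate as fatal" rule. The action dicts use the `_op_type` / `_index` / `_id` / `_source` keys accepted by `elasticsearch.helpers.bulk`; `ToyIndex` is a hypothetical in-memory stand-in that mimics the status codes Elasticsearch returns (201 created, 200 overwritten by "index", 409 conflict on "create"), and the index name is an assumption.

```python
# Hypothetical target index name.
MBOX_INDEX = "mbox-migrated"


def create_action(doc_id: str, doc: dict) -> dict:
    """Bulk action in the shape accepted by elasticsearch.helpers.bulk."""
    return {"_op_type": "create", "_index": MBOX_INDEX, "_id": doc_id, "_source": doc}


class ToyIndex:
    """In-memory stand-in mimicking bulk 'index' vs 'create' semantics."""

    def __init__(self) -> None:
        self.docs: dict = {}

    def apply(self, action: dict) -> int:
        doc_id = action["_id"]
        if action["_op_type"] == "create" and doc_id in self.docs:
            return 409  # conflict: the existing document is left untouched
        status = 200 if doc_id in self.docs else 201
        self.docs[doc_id] = action["_source"]  # "index" silently overwrites
        return status


def migrate_keeping_ids(docs, target: ToyIndex) -> None:
    """Non-DKIM conversion: ids are unchanged, so any duplicate is a fatal error."""
    for doc_id, doc in docs:
        if target.apply(create_action(doc_id, doc)) == 409:
            raise RuntimeError(f"duplicate mbox id {doc_id}: refusing to overwrite")
```

Using "create" means a second write to the same id fails loudly instead of replacing the first entry and its Permalink.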
In the case of conversion to DKIM, it's vital to ensure that all Permalinks that point to the same message are carried forward. If there is already an mbox entry with the same DKIM id, then it must be updated with the Permalinks from the current entry.
In this case, using "create" will detect duplicates, and the existing entry must then be updated as necessary. As an extra safeguard, the code should check that the other essential fields (Subject, To, etc.) are the same. [I think discrepancies can only occur here if a field does not correspond with the source record, but this needs to be investigated]
I think the code may also need to allow for multiple source entries with the same DKIM id, i.e. there may need to be more than one dbid value. [This needs to be confirmed]
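A minimal sketch of the merge step for the DKIM case, assuming a "create" conflict is detected first. The field names `permalinks` and `ESSENTIAL_FIELDS`, and the helpers `merge_duplicate` and `load_dkim`, are hypothetical; `dbid` is stored as a list to allow for the multiple-source case, which is still unconfirmed.

```python
# Hypothetical field names: "permalinks" and this essential-field list are assumptions.
ESSENTIAL_FIELDS = ("subject", "from", "to")


def _as_list(value) -> list:
    """Normalise a dbid that may be missing, a single string, or already a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def _union(first: list, second: list) -> list:
    """Order-preserving union without duplicates."""
    return list(dict.fromkeys(first + second))


def merge_duplicate(existing: dict, incoming: dict) -> dict:
    """Carry all Permalinks and source ids forward; refuse if essential fields differ."""
    mismatched = [f for f in ESSENTIAL_FIELDS if existing.get(f) != incoming.get(f)]
    if mismatched:
        raise ValueError(f"same DKIM id but different {mismatched}")
    merged = dict(existing)
    merged["permalinks"] = _union(existing.get("permalinks", []), incoming.get("permalinks", []))
    merged["dbid"] = _union(_as_list(existing.get("dbid")), _as_list(incoming.get("dbid")))
    return merged


def load_dkim(store: dict, dkim_id: str, doc: dict) -> None:
    """Emulate 'create'; on a conflict (409 in Elasticsearch) merge instead of overwriting."""
    if dkim_id not in store:
        store[dkim_id] = doc
    else:
        store[dkim_id] = merge_duplicate(store[dkim_id], doc)
```

In a real bulk run the 409 items would be collected from the bulk response and merged in a second pass; the in-memory dict here only shows the decision logic.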
The source entry will always get a new id. This may legitimately be the same as that of an existing entry (with the old generators, multiple Permalinks could be created for the same email), so the operation for the source entries should be "index".
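A minimal sketch of why "index" is safe for source entries: identical sources hash to the same new id, so re-indexing simply rewrites the same content. The sha256-based id, the `source` field and the index name are hypothetical stand-ins; only the action-dict keys follow `elasticsearch.helpers.bulk`.

```python
import hashlib

# Hypothetical target index name.
SOURCE_INDEX = "source-migrated"


def source_action(raw: bytes) -> dict:
    """'index' action for a source entry; the id is a hypothetical hash of the raw source."""
    source_id = hashlib.sha256(raw).hexdigest()
    return {
        "_op_type": "index",
        "_index": SOURCE_INDEX,
        "_id": source_id,
        "_source": {"source": raw.decode("utf-8", "replace")},
    }


def apply_index(store: dict, action: dict) -> None:
    """'index' semantics: create or overwrite, so identical sources collapse to one entry."""
    store[action["_id"]] = action["_source"]


# Two old Permalinks for the same email both point at the same raw source.
raw = b"Subject: Hello\r\n\r\nBody\r\n"
store: dict = {}
for _ in range(2):
    apply_index(store, source_action(raw))
```

Because the duplicate write carries identical content, the overwrite loses nothing, unlike the mbox case where each entry carries its own Permalink.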