UAlbanyArchives / mailbag-specification

A specification for packaging email in multiple formats
MIT License

Where to store metadata related to the email account and/or folder #1

Closed stuartyeates closed 2 years ago

stuartyeates commented 3 years ago

The standard doesn't seem to make clear where to store metadata related to the email account and/or folder that may be important to understanding the contents. For example:

  1. The history of name changes of the account
  2. The history of mail aliases of the account
  3. The history of mail forwarding of the account
  4. The history of users of the account (for role-based accounts)
  5. ...

It's probably outside of the scope of the document to specify a format for this metadata, but it should acknowledge the existence of the metadata and describe where it should be stored / linked to.

jamiepb commented 3 years ago

I have the same question. Two examples: 1) when exporting an email account through eDiscovery in Microsoft Office 365, the export includes some CSV and XML files about the export and the results; 2) when converting a PST file to EML, the TOMES PST extractor tool (https://github.com/StateArchivesOfNorthCarolina/tomes-pst-extractor) includes a TSV file that maps folder names and is essential to understanding the conversion results. While it's rightly outside the scope of the mailbag specification to lay out such required metadata and its format, is it acceptable to include such files in the format subdirectories?

gwiedeman commented 3 years ago

Cross-linking to Google Doc comment which mentions this topic: https://docs.google.com/document/d/1X7pOHxxzZl6PyMAJWd7bIR11rE4FlKty3J7oI6ghAKo/edit?disco=AAAANEF3VwA

gwiedeman commented 3 years ago

Thank you @stuartyeates and @jamiepb! We haven't discussed this, and these use cases are extremely helpful.

gwiedeman commented 2 years ago

With input from the Advisory Board, we are thinking of just adding a recommendation to include metadata like what @stuartyeates describes in bag-info.txt. We want to be flexible and permissive with metadata, and allowing multiple accounts in 0.3 makes it more challenging to provide a generalizable recommendation. In some cases it may also be useful to manage this information outside of a mailbag, per UAlbanyArchives/mailbag-specification#6. I could see some example keys being useful to encourage consistency, but I'm hesitant to be more prescriptive than that. If there are any suggestions, feel free to add them here or in the doc.
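To make the bag-info.txt idea concrete, here is a minimal sketch of how account-history metadata could be rendered as `Label: value` lines. The key names (`Account-Name-History` and so on), the example addresses, and the helper `format_bag_info` are all hypothetical and are not defined by the mailbag specification; they only illustrate what "example keys" might look like.

```python
def format_bag_info(entries):
    """Render (label, value) pairs as bag-info.txt style "Label: value" lines."""
    lines = []
    for label, value in entries:
        # Labels are written before the colon, so they cannot contain one
        # (or a line break) without making the line ambiguous.
        if ":" in label or "\n" in label or "\r" in label:
            raise ValueError(f"invalid label: {label!r}")
        # Collapse runs of whitespace (including newlines) so that each
        # entry stays on a single line.
        clean_value = " ".join(str(value).split())
        lines.append(f"{label}: {clean_value}")
    return "\n".join(lines) + "\n"


# Hypothetical keys and values, for illustration only.
account_metadata = [
    ("Account-Name-History", "records@example.org (2015-2019); archives@example.org (2019-)"),
    ("Account-Alias-History", "info@example.org (2016-)"),
    ("Account-Forwarding-History", "forwarded to director@example.org (2018-2020)"),
    ("Account-User-History", "role-based account; see local documentation"),
]

if __name__ == "__main__":
    print(format_bag_info(account_metadata), end="")
```

The output would be appended to an existing bag-info.txt rather than replace it, and a bag with multiple accounts would still need a local convention for saying which account each entry describes.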

gwiedeman commented 2 years ago

Per @jamiepb's comment, we decided to describe these examples as "companion files" and make it clear in the spec that it's permissible to keep them in the source format subdirectory. So if you have TSVs or similar for PSTs, they would go in mailbag_root/data/pst using the same original arrangement. It's unlikely that these would be machine-discoverable outside of local practices, but this way you can just throw them in there if that meets your needs, or define a local structure for them and possibly document relationships in bag-info.txt or another tag file if necessary.
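As a rough illustration of the "same original arrangement" idea, the sketch below copies every non-PST file from a source directory into mailbag_root/data/pst while preserving relative paths. This is not how mailbagit does it; the function name `copy_companion_files` and its parameters are hypothetical, and it assumes the PST files themselves have already been packaged.

```python
import shutil
from pathlib import Path


def copy_companion_files(source_dir, mailbag_root, source_format="pst"):
    """Copy companion files into <mailbag_root>/data/<source_format>,
    keeping each file's path relative to source_dir.

    Returns the relative paths (POSIX style) of the files that were copied.
    """
    source_dir = Path(source_dir)
    target_root = Path(mailbag_root) / "data" / source_format
    copied = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        # Assumption: files with the source extension are already in the bag.
        if path.suffix.lower() == f".{source_format}":
            continue
        relative = path.relative_to(source_dir)
        destination = target_root / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)
        copied.append(relative.as_posix())
    return copied
```

Because the copied files land in the bag's payload directory, the payload manifests would need to be regenerated afterwards for the bag to remain valid.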

Currently, the mailbagit implementation only packages files with the matching extension, so we're planning to add an optional arg to include all files in a provided directory. Let us know if that doesn't address your concerns!

gwiedeman commented 2 years ago

Addressed by #9