Closed: stuartyeates closed this issue 2 years ago
I have the same question. Some examples: 1) when exporting an email account through eDiscovery in Microsoft Office 365, the export includes some CSV and XML files about the export and the results; 2) when converting a PST file to EML, the TOMES PST extractor tool (https://github.com/StateArchivesOfNorthCarolina/tomes-pst-extractor) includes a TSV file that maps folder names and is essential to understanding the conversion results. While it's rightly outside the mailbag specification to lay out such required metadata and its format, is it acceptable to include such files in the format subdirectories?
Cross-linking to Google Doc comment which mentions this topic: https://docs.google.com/document/d/1X7pOHxxzZl6PyMAJWd7bIR11rE4FlKty3J7oI6ghAKo/edit?disco=AAAANEF3VwA
Thank you @stuartyeates and @jamiepb, we haven't discussed this, and these use cases are extremely helpful!
With input from the Advisory Board, we are thinking of just adding a recommendation to include metadata like @stuartyeates describes in bag-info.txt. We want to be flexible and permissive with metadata, and allowing multiple accounts in 0.3 makes it more challenging to provide a generalizable recommendation. In some cases it may also be useful to manage this information outside of a mailbag per UAlbanyArchives/mailbag-specification#6. I could see some example keys being useful to encourage consistency, but I'm hesitant to be more prescriptive than that. If there are any suggestions, feel free to add them here or in the doc.
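To make the bag-info.txt idea concrete, here is a minimal sketch of how such entries could be serialized as BagIt-style "Label: value" lines. The key names (Companion-File, Export-Source) are hypothetical examples only; they are not defined by the mailbag specification, and format_bag_info is an illustrative helper, not part of mailbagit.

```python
def format_bag_info(entries):
    """Render (label, value) pairs as 'Label: value' lines for bag-info.txt.

    The labels passed in are up to local practice; nothing here is
    prescribed by the mailbag specification.
    """
    lines = []
    for label, value in entries:
        # BagIt labels may not contain a colon or a line break.
        if ":" in label or "\n" in label or "\r" in label:
            raise ValueError(f"invalid bag-info label: {label!r}")
        # Collapse whitespace so each value stays on a single line.
        value = " ".join(str(value).split())
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical keys, shown only to illustrate the shape of the output.
    print(format_bag_info([
        ("Companion-File", "data/pst/folders.tsv"),
        ("Export-Source", "Office 365 eDiscovery export"),
    ]), end="")
```

Repeating a label for each companion file keeps the entries simple, but any local convention would work equally well here.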
Per @jamiepb's comment, we decided to describe these examples as "companion files" and make it clear in the spec that it's permissible to keep them in the source format subdirectory. So if you have TSVs or similar for PSTs, they would go in mailbag_root/data/pst using the same original arrangement. It's unlikely that these would be machine-discoverable outside of local practices, but this way you can just throw them in there if that meets your needs, or define a local structure for them and possibly document relationships in bag-info.txt or another tag file if necessary.
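As a rough sketch of that layout, the snippet below copies companion files into mailbag_root/data/pst while keeping their original relative arrangement. It is only an illustration under stated assumptions: copy_companion_files is a hypothetical helper, not a mailbagit function, and it treats every file without the source format's extension as a companion file.

```python
import shutil
from pathlib import Path


def copy_companion_files(source_dir, mailbag_root, source_format="pst"):
    """Copy non-email files into data/<source_format>, keeping relative paths.

    Assumes mailbag_root is not located inside source_dir.
    """
    source_dir = Path(source_dir)
    target_root = Path(mailbag_root) / "data" / source_format
    copied = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        # Files with the source format's extension are the email data itself.
        if path.suffix.lower() == "." + source_format:
            continue
        relative = path.relative_to(source_dir)
        destination = target_root / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)
        copied.append(relative.as_posix())
    return copied
```

With a source directory holding account.pst and folders.tsv, this would place folders.tsv at mailbag_root/data/pst/folders.tsv and leave the PST itself to the normal packaging step.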
Currently, the mailbagit implementation only packages files with the matching extension, so we're planning to add an optional arg to include all files in a provided directory. Let us know if that doesn't address your concerns!
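The difference between the two behaviors can be sketched as follows. This is not mailbagit's code: select_files and its include_all parameter are hypothetical names used for illustration, and the actual option may be named and implemented differently.

```python
from pathlib import Path


def select_files(directory, extension, include_all=False):
    """Return the files that would be packaged from a directory.

    By default only files whose extension matches are returned; with
    include_all=True every file is returned, companion files included.
    """
    directory = Path(directory)
    files = [p for p in sorted(directory.rglob("*")) if p.is_file()]
    if include_all:
        return files
    wanted = "." + extension.lower()
    return [p for p in files if p.suffix.lower() == wanted]
```

Called with extension "pst" on a directory holding account.pst and folders.tsv, the default returns only account.pst, while include_all=True returns both files.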
Addressed by #9
The standard doesn't seem to make clear where to store metadata related to the email account and/or folder that may be important to understanding the contents. For example:
It's probably outside of the scope of the document to specify a format for this metadata, but it should acknowledge the existence of the metadata and describe where it should be stored / linked to.