Open mkrusche opened 5 years ago
- BitCurator contact: Cal Lee at UNC, https://sils.unc.edu/people/faculty/profiles/Cal-Lee
- BitCurator contact: Kam Woods at UNC, https://github.com/kamwoods
Initial review of BC from website. From BC wiki page:
The BitCurator Environment is a Ubuntu-derived Linux distribution geared towards the needs of archivists and librarians. It includes a suite of open source digital forensics and data analysis tools to help collecting institutions process born-digital materials. BitCurator supports positive digital preservation outcomes using software (see our Tasks and Tools page) and practices adopted from the digital forensics community.
In the BitCurator Environment you can:
- Create forensic disk images: Disk images packaged with metadata about devices, file systems, and the creation process.
- Analyze files and file systems: View details on file system contents from a wide variety of file systems.
- Extract file system metadata: File system metadata is a critical link in the chain of custody and in records of provenance.
- Identify sensitive information: Locate private and sensitive information on digital media and prepare materials for access.
- Locate and remove duplicate files: Know what files to keep and what can be discarded.
BC runs in its own virtual machine. It packages many existing tools together; these are not developed by the BitCurator Consortium itself, but are included in the BC distribution. It also has some custom-developed tools.
It appears BC is intended for interactive use by people. It is unclear how many of the custom-developed interfaces can be driven programmatically, especially since the distribution is made to run in a VM (not sure whether this is for security, or because most of these tools are Linux-based and BC wants to work on Windows). However, many of its tools may be reusable, and may have command-line forms that would facilitate automation. (Comment from the BC wiki: "The virtual machine version of BitCurator is useful for testing and experimentation, but it is recommended that you run BitCurator on a dedicated machine in production environments by installing from the Live ISO image.")
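To make the automation idea concrete, here is a minimal sketch of how a command-line tool could be wrapped and run over a directory of files. It is not taken from BitCurator: the helper names `run_tool` and `survey` are hypothetical, and the default command assumes the standard `file` utility is on the PATH. Any other tool that accepts a path as its last argument could be swapped in through the `command` parameter.

```python
import subprocess
from pathlib import Path
from typing import Dict, Sequence

# Assumption: the standard `file` utility is installed and on PATH.
# --brief omits the filename from the output; --mime-type prints only the MIME type.
DEFAULT_COMMAND = ("file", "--brief", "--mime-type")


def run_tool(path, command: Sequence[str] = DEFAULT_COMMAND) -> str:
    """Run a command-line tool on one file and return its trimmed stdout."""
    result = subprocess.run(
        [*command, str(path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout.strip()


def survey(root, command: Sequence[str] = DEFAULT_COMMAND) -> Dict[str, str]:
    """Run the tool on every regular file under root, keyed by file path."""
    return {
        str(p): run_tool(p, command)
        for p in sorted(Path(root).rglob("*"))
        if p.is_file()
    }
```

Calling `survey("/path/to/extracted/files")` would return a dictionary mapping each file path to whatever the tool printed for it. Whether the BC custom tools can be driven this way still needs to be checked tool by tool.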
They do appear to have done a good job of curating such tools, so if any kind of file analysis functionality is needed, it would be worth looking here for suggestions.
That said, some standard digital library tools for file identification are not listed, e.g. DROID, file, FITS, and PRONOM.
Don presented his findings to the team in the sprint review on 4/28. The PIs will get together to discuss them.
Blog post on BitCurator natural language processing tools: https://saaers.wordpress.com/2019/07/02/an-exploration-of-bitcurator-nlp-incorporating-new-tools-for-born-digital-collections/
Review BitCurator to determine what it does.
- https://bitcurator.net/bitcurator/
- https://github.com/BitCurator/bitcurator-access/wiki
- https://confluence.educopia.org/display/BC
- https://github.com/bitcurator
- https://bitcurator.github.io/