Lucy-Family-Institute / presqt

Tools and RESTful Services to Improve Preservation and Re-use of Research Data & Software.
Apache License 2.0
Review Bitcurator #142

Open mkrusche opened 5 years ago

mkrusche commented 5 years ago

Review Bitcurator to determine what it does.

https://bitcurator.net/bitcurator/
https://github.com/BitCurator/bitcurator-access/wiki
https://confluence.educopia.org/display/BC
https://github.com/bitcurator
https://bitcurator.github.io/

nkmeyers commented 5 years ago

BitCurator contact: Cal Lee at UNC https://sils.unc.edu/people/faculty/profiles/Cal-Lee
BitCurator contact: Kam Woods at UNC https://github.com/kamwoods

dbrower commented 5 years ago

BitCurator Evaluation

Initial review of BC from website. From BC wiki page:

The BitCurator Environment is a Ubuntu-derived Linux distribution geared towards the needs of archivists and librarians. It includes a suite of open source digital forensics and data analysis tools to help collecting institutions process born-digital materials. BitCurator supports positive digital preservation outcomes using software (see our Tasks and Tools page) and practices adopted from the digital forensics community.

In the BitCurator Environment you can:

  • Create forensic disk images: Disk images packaged with metadata about devices, file systems, and the creation process.
  • Analyze files and file systems: View details on file system contents from a wide variety of file systems.
  • Extract file system metadata: File system metadata is a critical link in the chain of custody and in records of provenance.
  • Identify sensitive information: Locate private and sensitive information on digital media and prepare materials for access.
  • Locate and remove duplicate files: Know what files to keep and what can be discarded.

BC runs in its own virtual machine. It packages many other tools together. It also has some custom-developed tools.

Custom developed tools

Packaged external tools

These are tools that are not developed by the BitCurator Consortium itself, but are included in the BC distribution.

Disk imaging

Forensic analysis

Other tools

Thoughts

BC appears to be intended for interactive use by people. It is unclear how many of the custom-developed interfaces can be driven programmatically, especially if the distribution is meant to run in a VM (I am not sure whether this is for security or because most of these tools are Linux-based and BC wants to work on Windows). However, many of its tools may be reusable, and may have command-line forms that would facilitate automation. (Comment from the BC wiki: "The virtual machine version of BitCurator is useful for testing and experimentation, but it is recommended that you run BitCurator on a dedicated machine in production environments by installing from the Live ISO image.")
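To illustrate what automating a command-line tool could look like, here is a minimal sketch, not anything taken from BitCurator or PresQT. The helper names `run_tool` and `identify_mime_type` are hypothetical, and the second helper assumes the standard Unix `file` utility is on the PATH. Whether a given BC tool exposes a comparable command-line form would need to be checked tool by tool.

```python
import subprocess
import sys
from typing import Sequence


def run_tool(command: Sequence[str], timeout: float = 60.0) -> str:
    """Run a command-line tool and return its standard output as text.

    Raises subprocess.CalledProcessError if the tool exits non-zero,
    so a failed run is not silently treated as a result.
    """
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text rather than bytes
        check=True,           # raise on a non-zero exit status
        timeout=timeout,
    )
    return result.stdout.strip()


def identify_mime_type(path: str) -> str:
    """Ask the Unix `file` utility for a MIME type.

    Assumption: `file` is installed and on the PATH.
    """
    return run_tool(["file", "--brief", "--mime-type", path])


if __name__ == "__main__":
    # Usage: python this_script.py <path-to-file>
    print(identify_mime_type(sys.argv[1]))
```

The same `run_tool` wrapper could be pointed at any tool that has a non-interactive command-line mode, which is the property that would matter for calling such tools from a service.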

They do appear to have done a good job of curating such tools, so if any kind of file analysis functionality is needed, the BC tool list would be a useful place to look for suggestions.

That said, some standard digital library tools for file identification are not listed, e.g. DROID, file, FITS, PRONOM.

mkrusche commented 5 years ago

Don presented his findings to the team in the sprint review on 4/28. The PIs will get together to discuss them.

dbrower commented 5 years ago

Blog post on BitCurator natural language processing tools: https://saaers.wordpress.com/2019/07/02/an-exploration-of-bitcurator-nlp-incorporating-new-tools-for-born-digital-collections/