CDLUC3 / dash

General repository for documents and communication for UC Dash project.
http://cdluc3.github.io/dash
MIT License

Is Dash able to say it's about "archiving" or "preservation"? #22

Open strasser opened 10 years ago

strasser commented 10 years ago

From UC Librarian: I think we need to be careful calling DASH an archiving service, given the low metadata requirements, lack of QA/QC processes, format standardization, etc. I'm a lot more comfortable promoting and supporting it as a data sharing and publishing platform.

strasser commented 10 years ago

From UC3:

I’d like to respond to your comment on Dash and the distinction between a data preservation platform and a publishing/sharing platform. We have always taken the most expansive view of preservation as a complex of actors, policies, practices, and technologies enabling meaningful engagement with digital resources. From this perspective we think it is reasonable to label Dash (and its underlying dependencies on Merritt and EZID) as a preservation system. While it is perhaps not a fully sufficient solution, it is nevertheless an absolutely necessary one.

The nature of the institutional structure and mission of the CDL with regard to the campuses means that preservation responsibility will always be explicitly distributed and shared. In general, CDL has primary responsibility for systems and services that provide technical control over managed content, while various campus agents (the library, labs, and faculty and graduate researchers), based on their subject area and domain expertise, retain primary responsibility for curatorial (or intellectual) control. But that control cannot be exercised until the content is placed under appropriate proactive technical control, and that is what we see Dash’s function to be.

By substantially removing the technical barriers to data contribution, Dash facilitates the foundational activity of data collection. While it is possible that any given piece of data in the collection may not be preservable over the long term, if it was never collected at all then it will definitely not be preserved. Furthermore, data managed in Dash also benefit from simple, intuitive user interfaces; persistent identifiers for stable citation and usage statistics; geographically replicated storage; ongoing fixity audits (and automated self-healing if necessary); a highly distributed, fault-tolerant architecture with 24x7 monitoring; faceted search and browse; indexing by Primo and DCI; and dedicated staff for preservation analysis, planning, maintenance, and operations.

Dash does have minimal prescriptive metadata requirements, but that is by explicit design. In our experience, higher metadata standards directly correlate with lower submission rates, and as we said before, we think the single most important preservation action is the initial collection of the material. While we’re currently only asking for a handful of DataCite metadata elements, they are the key enablers of effective data discovery. A well-chosen title, a comprehensive but pithy abstract, and a methods statement can function exceedingly well in connecting researchers with content of interest. We are planning ways to enhance Dash to support the contribution of additional, domain-specific description. For instance, we are working on enhancing Dash with DataUp-like functionality to supply EML metadata (at both the dataset and individual data variable level) and perform best-practices checks on tabular datasets and spreadsheets. And it is important to point out that although Dash doesn’t require extensive metadata, there is no barrier to its inclusion as part of the submitted data package.

As you point out, curatorial assessment and selection are important activities for ensuring the long-term usability of preserved assets. But, unfortunately, we do not have the appropriate staff with domain expertise to provide this function. Rigorous curatorial evaluation seems like a valuable service that can be layered on top of the foundational preservation service provided by Dash. So we’d like to turn this issue around and ask: how can we best collaborate with you and your campus colleagues to be able to offer truly comprehensive curatorial and technical preservation services? We are all trying to solve a very big problem, and it seems that we can be most efficient and effective through a cooperative division of labor that builds on the strengths of our respective organizations.

strasser commented 10 years ago

Response from UC librarian:

I think we're all in agreement that Dash and the technology stack upon which it's layered can play an important role in the dissemination of UC research data. I also think we're in agreement that "inconvenience" factors, like stringent metadata requirements, can be a barrier to data sharing (although, as I'm sure you know, cultural factors are likely the biggest obstacle). That said, I think it's critical that we make a concerted effort to nudge researchers towards best practices when it comes to providing metadata sufficient for both data reuse and replication. Can it really be argued that the dissemination of poorly documented data is anything short of "bad science"? Along those lines, if we are promoting a tool that by design seems to concede on the issue of "enforced" metadata quality, can't we be accused of helping to perpetuate bad practices?

strasser commented 10 years ago

Response from UC3:

We are all in favor of nudging researchers towards better or best practices. As we shared in our last message, Dash is a community-run effort, so it would be great to hear what additional documentation would move Dash data from bad science to good science. Additional metadata can be included with Dash objects, and you can encourage that. Since you are the curatorial expert, perhaps you can develop documentation about what additional metadata are needed in this space.