achp-project / cultural-heritage

Documenting Good Practices for "Arches Data Lifecycle" Operations #13

Open ekansa opened 4 months ago

ekansa commented 4 months ago

I'd love to compile some of your collective experience about good practices in curating data with Arches over different time scales. It would be great to share knowledge about “gotchas” (where things can go wrong, and how to avoid problems), tips, and other good practices people have learned. Here are some issues that I'd like to cover:

I’d like to gather those good practices for the Arches documentation so we can make Arches-managed data easier to maintain. Here's a link to the documentation branch I'm using to compile this guidance: https://github.com/archesproject/arches-docs/blob/data_life_7_5/docs/administering/data-life-cycle.rst

You're welcome to contribute directly to that branch. Or if it is easier, please respond to this thread in the Arches forum: https://community.archesproject.org/t/documentation-request-good-data-lifecycle-practices-with-arches/2387

I'd also be happy to arrange Zoom meetings to spare you from writing if that helps!

zoometh commented 4 months ago

@ekansa, here is a brief description of how our two projects (run with @ads04r and sharing a single database) address your questions on the data lifecycle.

EAMENA-MaREA

https://eamena.org/home https://marea.soton.ac.uk/

Maintaining data integrity

[Image: datatypes_eamena. Datatypes of the fields of the Heritage Places resource model.]

Good data practices relating to upgrading Arches (especially major version upgrades)

Security, managing sensitive information

Database backup strategies and approaches

Described previously (see storage)

Database archiving (with outside digital repositories)

We have a workflow (and an Arches plugin) to deposit EAMENA datasets in Zenodo, which gives us DOIs, an OAI-PMH-compliant deposit, etc.
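For readers unfamiliar with Zenodo deposits, here is a minimal sketch of what creating a dataset deposition through Zenodo's REST deposit API can look like. It is not the EAMENA plugin's code: the function names `build_deposition_metadata` and `create_deposition` are hypothetical, the title, description and creator in the usage note are placeholders, and a real workflow would also upload files and publish the record.

```python
import json
import urllib.request

# Zenodo's REST endpoint for creating depositions.
ZENODO_API = "https://zenodo.org/api/deposit/depositions"


def build_deposition_metadata(title, description, creators):
    """Return the JSON body for a new Zenodo dataset deposition (hypothetical helper)."""
    if not creators:
        raise ValueError("Zenodo requires at least one creator")
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": name} for name in creators],
        }
    }


def create_deposition(payload, token, api_url=ZENODO_API):
    """POST the payload to Zenodo and return the parsed JSON response.

    Assumes `token` is a personal access token with deposit rights.
    This function performs a network call and is not run on import.
    """
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `build_deposition_metadata("Example export", "Placeholder description", ["Doe, Jane"])` produces the payload; passing it to `create_deposition` with a valid token would create a draft deposition, which Zenodo assigns a DOI once it is published.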

Useful command line utilities

Documented here and there, and in many places across our GitHub repositories

PostgreSQL utilities

Same as before, here and there.

Use of cloud computing database services (Amazon RDS, etc.)

None