ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum
Apache License 2.0

Export of Complete Collection Data for Migration, Backup #2051

Closed campmlc closed 1 year ago

campmlc commented 5 years ago

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe. We have received repeated inquiries from potential new collections and existing collections as to whether collections data can be exported from Arctos for backup or migration to a different platform.

Describe the solution you'd like We currently allow export of flat-file data as specimen search results through Arctos, and of DWC fields through external aggregators. Perhaps provide the option of a regular, automated export of these data, FTP'd to a particular server? Additionally, we could add options for separate, linked downloads of transactions, projects, citations (by collection?), and object tracking (show all objects in this container, flatten?).

Also explore the option of local Oracle backups, by collection or for all of Arctos?

Priority Please assign a priority-label.

mkoo commented 4 years ago

Based on our discussions of the 'spectrum' of backup and mirroring options, I'm putting this out there as a middle ground for having a backup snapshot of Arctos offsite from TACC: https://zenodo.org/

We can even back up our GitHub repos; there really seems to be no limit on what can be uploaded here.

dustymc commented 4 years ago

AWG Issues meeting discussed three things:

  1. Social aspects of https://github.com/ArctosDB/internal/issues/57; we can make full backups and scatter them around, which would mitigate any single-point failure. How do we do that while still preserving confidential data?
  2. Set up a new PG server, import a full backup, delete non-collection (or institution or whatever) data, back that up - this would provide an institutional-level backup that includes DDL. (Would need a dedicated server and a fair bit of DBA time.)
    • This could also be dealt with via an "if you bring this up you must delete anything that's not yours" clause in (1).
  3. Provide more "flat" options, which we already need for performance reasons
    • potentially cache more, and more-structured, things in FLAT
    • add semi-flattened "extras" (parts, accns, etc.) in some way
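
Option 2 above (import a full backup, delete everything that is not the collection's, back up the remainder) can be sketched roughly as follows. This is only an illustration under loud assumptions: SQLite stands in for the PostgreSQL server, and the `cataloged_item` table, its columns, and the `EXAMPLE:Mamm` collection are hypothetical, not the Arctos schema.

```python
import sqlite3

# SQLite stands in for the PostgreSQL server; table and column names are hypothetical.
full_backup = sqlite3.connect(":memory:")
full_backup.executescript("""
CREATE TABLE cataloged_item (guid TEXT PRIMARY KEY, collection TEXT);
INSERT INTO cataloged_item VALUES
    ('EXAMPLE:Mamm:1', 'EXAMPLE:Mamm'),
    ('EXAMPLE:Mamm:2', 'EXAMPLE:Mamm'),
    ('OTHER:Bird:1',   'OTHER:Bird');
""")

# Step 1: import the full backup into a scratch database.
scratch = sqlite3.connect(":memory:")
full_backup.backup(scratch)

# Step 2: delete everything that does not belong to the requesting collection.
scratch.execute("DELETE FROM cataloged_item WHERE collection <> ?", ("EXAMPLE:Mamm",))
scratch.commit()

# Step 3: back up what is left; the copy carries the DDL along with the rows.
collection_backup = sqlite3.connect(":memory:")
scratch.backup(collection_backup)

remaining = [row[0] for row in collection_backup.execute(
    "SELECT guid FROM cataloged_item ORDER BY guid")]
print(remaining)
```

On a real server the delete step would have to touch every table that holds per-collection data, which is where the "fair bit of DBA time" would go.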

diannakrejsa commented 4 years ago

If I can help with development of these flat backups, let me know a specific aspect that needs time put toward it. I'm interested in developing a regular export backup (we'd discussed a template for this in the past).

dustymc commented 4 years ago

specific aspect

There are two parts to that:

  1. I need to know exactly what's wanted. The best language in which to explain that may be SQL, which would simplify the next step.
  2. We need some documentation of those; users should have a way to know what they can and cannot do with any given extraction.

diannakrejsa commented 4 years ago

The best language in which to explain that may be SQL

Example of an Arctos field expressed as SQL? What's the file format you're looking for for this?

dustymc commented 4 years ago

What's the file format you're looking for for this?

That's a question for those who want flat extracts. I see no way in which they can avoid being lossy so they can be of ~no value to me.

I keep hearing things like "loans." That could mean "partial dump of table LOAN" or it could need to include the 3rd phone number of the 9th preparator of specimens related to specimens from which parts were loaned - data which is in, and easily understandable and recoverable from, a DB export. I doubt you want that specific chunk of data, but there's an infinite amount of data which could be critical to certain tasks (understanding what was used for or intended by a citation in a publication, for example) that's an equal distance from table LOAN. A request for flat files is essentially a request to discard information; I need to know precisely what you don't want to toss, and how you want it arranged.

diannakrejsa commented 4 years ago

From the very narrow perspective of what I want, for a start I want just what goes in the specimen bulkloader, plus preparators/prep num/other ids and parts. It would be very close to what is available in the fields of the data tools download, just a few more fields would need to be added to those available.

How would you like the information presented to you/the group for further editing? Columns ala the bulkloader (more long-form) or ala the download data tool? Then we can add a "wish list" of aspects others may want (e.g., loans) and figure out what parts of those wish list items can be added?

dustymc commented 4 years ago

what goes in the specimen bulkloader

I would need to know what to do with the 11th collector, 13th attribute, 2nd specimen-event, etc. (And implicit agreement that strings are sufficient for your purposes - eg, the only thing you care about regarding agents is preferred_name, all other agent data can be discarded for this.)

preparators

Those are covered by "what goes in the specimen bulkloader."

prep num/other ids and parts

I need specifics; there can be any number of otherIDs, and parts have an additional dimension for part attributes.

How would you like the information presented

I don't know, perhaps because I'm having difficulty understanding the purpose. Maybe manually munge whatever you want of a record into a CSV file as an example???? That seems a fairly painful way to approach this, but it would let me request adding another record when something doesn't fit - maybe it would provide an effective means of communication.
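
To make the "11th collector" question concrete, here is a minimal sketch of what a fixed-width flat layout does to a one-to-many relationship. The column count, the `collector_N` column names, the GUID, and the agent names are all hypothetical, chosen only for illustration.

```python
# Assumed width of the flat template; a real template would have to pick some number.
MAX_COLLECTOR_COLUMNS = 3


def flatten_collectors(guid, collectors, width=MAX_COLLECTOR_COLUMNS):
    """Return (flat_row, dropped): flat_row has exactly `width` collector columns."""
    kept = collectors[:width]
    dropped = collectors[width:]
    row = {"guid": guid}
    for i in range(width):
        # Pad with empty strings when a record has fewer collectors than columns.
        row[f"collector_{i + 1}"] = kept[i] if i < len(kept) else ""
    return row, dropped


row, dropped = flatten_collectors(
    "EXAMPLE:Mamm:1",
    ["Agent A", "Agent B", "Agent C", "Agent D"],
)
print(row)
print("not representable in this layout:", dropped)
```

Whatever width is chosen, some record can exceed it, so the specification has to say what happens to the overflow (drop it, concatenate it into one string, or move it to a separate file).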

diannakrejsa commented 4 years ago

Export_Template.zip

Alright, attached is a stab at a beginning template. I started with the data within a bulkloader file for ASNHC:Mamm:20000. I added columns for other common data as well as example fields that Mariel and Dusty exported before (something I had saved off as temp_flatbits_missingvalues, not sure that name would ring any bells for what y'all did to export those fields in the past). The third row includes fields that might be sunk within the column above them. At the end of the series of columns I added "Loans?" simply because I imagine someone will want that data exportable in some capacity.

dustymc commented 4 years ago

Excellent, thanks! I pulled that into https://docs.google.com/spreadsheets/d/1caZi8YvjKtMIklVSnlnfG3BdD1WQ3rQMUQgNGzlZqbA/edit#gid=1443094724 and anyone can edit. I made some preliminary comments. Essentially I'd need more detail; what precisely do you mean by "sex" (for example), and if there are 13972 determinations then how would you like them handled?
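
As one way of answering "what precisely do you mean by sex" in SQL, the sketch below states one possible rule: the value of the most recent determination. Both the rule and the `attributes` table with its columns are assumptions made for illustration; they are not the Arctos schema, and SQLite is used here only so the query can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per attribute determination.
CREATE TABLE attributes (
    guid TEXT, attribute_type TEXT, attribute_value TEXT, determined_date TEXT
);
INSERT INTO attributes VALUES
    ('EXAMPLE:Mamm:1', 'sex',    'unknown', '1990-05-01'),
    ('EXAMPLE:Mamm:1', 'sex',    'female',  '2015-03-12'),
    ('EXAMPLE:Mamm:1', 'weight', '21 g',    '1990-05-01');
""")

# "sex" defined as: the value of the latest sex determination for each record.
# ISO 8601 date strings sort correctly as text, so MAX() picks the newest one.
LATEST_SEX = """
SELECT a.guid, a.attribute_value AS sex
FROM attributes a
WHERE a.attribute_type = 'sex'
  AND a.determined_date = (
      SELECT MAX(b.determined_date)
      FROM attributes b
      WHERE b.guid = a.guid AND b.attribute_type = 'sex'
  )
ORDER BY a.guid
"""
latest_sex = conn.execute(LATEST_SEX).fetchall()
print(latest_sex)
```

A different rule (all determinations concatenated, or only those by a given determiner) would be a different query, which is the kind of detail the spreadsheet comments are asking for.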

diannakrejsa commented 4 years ago

Cool! I wrote some responses to these, but other folks should take a look since I don't necessarily have a stake in every field (or know the full usage someone might require of them). When going through, one thought I had was making it a multi-page export process where data managers select which of the databases they manage they'd like exported, then it's a locality page where they check what aspects of locality for those records they may want, then it's an attributes page with all options from the attributes_code_tables list and they check which ones they want to export data from, and so on. Kind of like the Download data tools thing but with more options? I'll let others take a look at it.

Jegelewicz commented 4 years ago

multi-page export process where data managers select which of the databases they manage they'd like exported, then it's a locality page where they check what aspects of locality for those records they may want, then it's an attributes page with all options from the attributes_code_tables list and they check which ones they want to export data from, and so on. Kind of like the Download data tools thing but with more options?

OK - I am going to say what I think I have been saying all along. A complete export should be more than one file. Here is what you need (stuff in parens are the columns for each file, not comprehensive at this point...)

What have I forgotten? This is going to give you "your" data in files that can be related to each other, so that you could re-create stuff in Arctos with bulkloader tools. It will not be usable as Arctos, but that isn't what we are after here, is it? Each file is going to include one row of data for each "thing", so if you have an object with multiple identifications, you are going to have more than one row using that GUID in column one. This is what you are going to need if you want to import the data into something else. If object tracking is used, a file for BARCODE (BARCODE, PARENT_BARCODE) would be needed as well, and maybe something else I am missing.
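
A minimal sketch of the "more than one file, one row per thing, GUID in column one" layout, and of relating the files back to each other. The file names, column names, and sample rows are hypothetical; the files are written to in-memory strings rather than disk to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows; file names and columns are illustrative only.
export = {
    "identifications.csv": [
        {"GUID": "EXAMPLE:Mamm:1", "SCIENTIFIC_NAME": "Peromyscus sp."},
        {"GUID": "EXAMPLE:Mamm:1", "SCIENTIFIC_NAME": "Peromyscus maniculatus"},
    ],
    "parts.csv": [
        {"GUID": "EXAMPLE:Mamm:1", "PART_NAME": "skull", "BARCODE": "B100"},
    ],
}


def write_csv(rows):
    """Serialize a list of dicts as CSV text, one row per 'thing'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def rows_by_guid(csv_text):
    """Group the rows of one exported file by the GUID in column one."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["GUID"]].append(row)
    return grouped


files = {name: write_csv(rows) for name, rows in export.items()}
ids = rows_by_guid(files["identifications.csv"])
parts = rows_by_guid(files["parts.csv"])
print(len(ids["EXAMPLE:Mamm:1"]), "identifications,", len(parts["EXAMPLE:Mamm:1"]), "part")
```

Because every file repeats the GUID, nothing has to be squeezed into a fixed number of columns; the cost is that the receiving system must do the joins.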

dustymc commented 4 years ago

What have I forgotten?

Probably something - "complete export" still seems a very wrong description - but I think that's closer to achievable, and more useful, than trying to pretend that Arctos is a giant spreadsheet can be.

This is getting closer to a DB dump, which includes everything you've mentioned plus whatever you've forgotten, and includes assembly instructions in a language that both computers and people can understand.

campmlc commented 4 years ago

Yes, absolutely, this is what I have been trying to request. Also need a file for BARCODE (BARCODE, PARENT_BARCODE), or better yet: Part Location Path. A DB is fine as long as it includes files that can be opened in spreadsheets. I have a server . . . ready to move MSB data there now.
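
For what it is worth, a "Part Location Path" can be derived from a BARCODE (BARCODE, PARENT_BARCODE) file by walking the parent chain. The sketch below assumes that top-level containers have no parent and uses made-up barcodes; the ` > ` separator is an arbitrary choice.

```python
def location_path(barcode, parent_of):
    """Walk PARENT_BARCODE links upward and return the path from the top container down."""
    path = []
    seen = set()
    current = barcode
    while current is not None:
        if current in seen:
            # Guard against bad data so a loop in the hierarchy cannot hang the export.
            raise ValueError(f"cycle in container hierarchy at {current!r}")
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    return " > ".join(reversed(path))


# Hypothetical (BARCODE, PARENT_BARCODE) rows; None marks a top-level container.
rows = [("B100", "BOX7"), ("BOX7", "CAB2"), ("CAB2", "ROOM1"), ("ROOM1", None)]
parent_of = dict(rows)
print(location_path("B100", parent_of))  # ROOM1 > CAB2 > BOX7 > B100
```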

Jegelewicz commented 2 years ago

Given that @mkoo asked this of a potential incoming collection yesterday

"If you can get your data out of Specify..."

We really need to think about how this would work when a collection eventually decides to leave Arctos.

campmlc commented 2 years ago

Agree.

dustymc commented 2 years ago

when a collection eventually decides to leave Arctos.

In the past I've provided their parts of tables as CSV.

Happy to discuss more, but I don't think this is going to go anywhere without some actionable specification; e.g., "LOAN fields" could literally be almost anything, and I would need specifics to act.

Jegelewicz commented 2 years ago

Wondering how https://github.com/ArctosDB/internal/issues/168 would make this "easier"?