emory-libraries / dlp-curate

Digital curation and preservation workbench for the Emory Preservation Repository.

Transfer files for the James Archer Sermons #1583

Closed kmichaelis closed 3 years ago

kmichaelis commented 3 years ago

ACTIONS TO UNBLOCK

DESCRIPTION

Please transfer the following files from Isilon AND from OneDrive to our AWS EFS mount, using Rclone with either MD5 or SHA1 to perform fixity checking during transfer.

1) Base path where TIFF files are located (Libraries Isilon "dmfiles" directory root):

nasn2dmz.cc.emory.edu/dmfiles/Libraries/Theology_Pitts/P-MSS006/B001

2) 60 PDFs from OneDrive folder

Specific list of files or directories to be copied:

All TIFF files in the Isilon folders

All PDFs in the OneDrive folder

Desired destination directory structure in EFS:

/mnt/efs/Collections/dmfiles/ + relative directory structure from original Isilon location, e.g.:

TIFFs: /mnt/efs/Collections/dmfiles/Libraries/Theology_Pitts/P-MSS006/B001

PDFs: transfer files from each PDF subfolder (F001, F002, etc.) into the corresponding folder on /mnt/efs, e.g.: /mnt/efs/Collections/dmfiles/Libraries/Theology_Pitts/P-MSS006/B001/F001
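The destination rule above (EFS base path plus the relative directory structure from the original location) can be sketched in Python. This is only an illustration, not part of the requested Rclone process: the function name `efs_destination` is hypothetical, and the local source root `/opt/dmfiles` is an assumption about where the Isilon "dmfiles" root is mounted on the transfer server.

```python
from pathlib import PurePosixPath

# Destination base path taken from the ticket description.
EFS_ROOT = PurePosixPath("/mnt/efs/Collections/dmfiles")


def efs_destination(source_path: str, source_root: str) -> PurePosixPath:
    """Re-root a source path under EFS_ROOT, keeping its relative structure.

    source_root is the local path of the "dmfiles" directory root
    (assumed here; adjust to wherever Isilon is actually mounted).
    """
    relative = PurePosixPath(source_path).relative_to(source_root)
    return EFS_ROOT / relative


# Hypothetical example: an F001 folder under an assumed /opt/dmfiles mount.
print(efs_destination(
    "/opt/dmfiles/Libraries/Theology_Pitts/P-MSS006/B001/F001",
    "/opt/dmfiles",
))
```

A path that does not sit under `source_root` raises `ValueError`, which is a cheap guard against copying something into the wrong place.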

Estimated count of files to transfer

2,250+ (TIFF files)

Note: there may be more files in the folders we are transferring, but there should be at least 2,250 (identified for ingest)

60 PDFs

Estimated size of files to transfer: Approximately 89 GB
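For reference, a minimal sketch of what an after-the-fact fixity check could look like if the source and destination are both mounted on the same machine. It is independent of Rclone and is not the process Middleware used. The function names, the suffix list, and the shape of the returned report are all hypothetical. It hashes each source file with MD5 (or SHA1) and compares it with the file at the same relative path in the destination.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source_root: Path, dest_root: Path,
                    suffixes=(".tif", ".tiff"), algorithm: str = "md5") -> dict:
    """Compare every matching source file with its copy under dest_root.

    Returns a report with the number of files checked plus the relative
    paths that are missing from, or differ in, the destination.
    """
    report = {"checked": 0, "missing": [], "mismatched": []}
    for source in sorted(source_root.rglob("*")):
        if not source.is_file() or source.suffix.lower() not in suffixes:
            continue
        relative = source.relative_to(source_root)
        dest = dest_root / relative
        report["checked"] += 1
        if not dest.is_file():
            report["missing"].append(str(relative))
        elif file_digest(source, algorithm) != file_digest(dest, algorithm):
            report["mismatched"].append(str(relative))
    return report
```

Calling `verify_transfer(Path(source), Path(dest))` on the B001 folders would also give a file count to compare against the 2,250+ TIFF estimate. Passing `suffixes=(".pdf",)` would cover the PDFs, and `algorithm="sha1"` switches the hash.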

libdgg commented 3 years ago

NOTES:

libdgg commented 3 years ago

Moving this to blocked until we figure out the process/server Mark used and confirm whether the DAMS space issue is a related blocker before the actual work of file transfer begins.

I have emailed Steve Collins, Solomon Hilliard (and copied Kevin Chen and Alex Cooper) to see if they know the server that Mark ran Rclone from.

libdgg commented 3 years ago

@kmichaelis @eporter23 @rotated8 See update below. Does this help us in any way to move forward on our own? Or are there follow up questions or actions that may be needed?

According to Solomon (per email 3/9/2021) "Mark used libetdappprod1.library.emory.edu, here's a screenshot of the history of the root user's history. I made Emily and Kathryn admins on libetdapprod1 before I left the library."

download-shot-1.png

download-shot-2.png

eporter23 commented 3 years ago

@libdgg that's helpful to know. My next question is whether we want to continue running rclone from here or run it elsewhere. I guess either Kathryn or I could try testing this, but neither of us is very familiar with it. Another question is what to do about getting other users set up as admins on this server if it's where/how we want to continue. Both Kathryn and I continue to have permissions issues with uploads/downloads through this connection, but we aren't sure whether that's because of how we've been trying to do it.

eporter23 commented 3 years ago

More notes: my understanding is this is a VM that has both Isilon and EFS mounted. Kathryn and I can connect to it and view files. I'm able to download, but not upload. (But again, we may just be using an incorrect process. I've been trying to use scp.)

libdgg commented 3 years ago

@kmichaelis @eporter23 @rotated8 I checked with Kyle Fenton to see what he might know about Rclone.

Kyle says "I've never used Rclone (just rsync, which inspired Rclone), but it looks pretty cool! The only server that Mark used that I'm aware of is mgmtprd001.library.emory.edu. He used that to set up an ftp server securely so that we have a way to "publish" digitized books out of i2s LIMB and get them to an Isilon staging area."

My notes show that mgmtprd001.library.emory.edu is nicknamed "Bela" and used for "Nagios and other management stuff".

kmichaelis commented 3 years ago

@libdgg I don't think that we are married to Rclone. That is the process that Mark used, but @eporter23 and I discussed last week that we would be okay with just being able to transfer files via ftp, at least as an interim measure. Emily said she previously got reports from Middleware about fixity checks run via Rclone, but they haven't been sharing those with us since I took over requesting file transfers.

I don't know anything about the server that Kyle mentioned.

libdgg commented 3 years ago

@rotated8 More notes about the server that Mark used for Rclone, found in the ServiceNow entry for that server:

part of ETD: "Electronic Dissertations and Theses" EU Libraries

App owner: LITS: Library Core Services: - lib_core_services@emory.edu - several ppl including Prefer, Mark mark.prefer@emory.edu; Cooper, Alexander G. alexander.cooper@emory.edu

Product/data owner: collin.brittle@emory.edu (LITS: Digital Library)

libetdappprod1.library.emory.edu has Trusted Storage as of 5/2018, not yet in production, acc to Solomon H.

kmichaelis commented 3 years ago

Just wanted to confirm on this ticket that the issue of DAMS space is not a blocker for this process.

kmichaelis commented 3 years ago

@libdgg @rotated8 Any updates on this?

libdgg commented 3 years ago

@eporter23 @kmichaelis @AGCooper @rotated8

ACTIONS from 2021-03-18 meeting - please edit or update the following as needed.

STATUS UPDATES 2021-03-23

  1. COMPLETE - DG schedule a working meeting
  2. COMPLETE - EP and KM identify a small test for us to use - 96 files identified
  3. BLOCKED/INCOMPLETE - EP and AC will work on adding other users to the server - ACTION: need to learn the process to add users and ensure the people on this ticket can sudo
  4. COMPLETE - ALL confirm you can SSH into the server to prepare for the working meeting - NOTE: the only people who can log in are Emily and Kathryn - see action for item #3
  5. COMPLETE - ALL will meet to run rclone commands on the test items and confirm if they show up in EFS

eporter23 commented 3 years ago

For #2 above, here is a small sample set: /opt/dmfiles/Libraries/Theology_Pitts/P-MSS006/B001/F001/I001

@kmichaelis and I both tried a manual sudo cp of a couple of these files up to my test directory on EFS; both were successful.

libdgg commented 3 years ago

ACTIONS

@eporter23 will check on the rest of the collection.

@AGCooper will follow up with Solomon. Emily, Kathryn, Collin, Alex and Doug need sudo access to this box, and we need the method to add users since we will be hiring others.

@libdgg will follow up with Rosalyn about a larger strategy for all the servers, including the elevated access needed to manage this and other services going forward.