Closed by hohonuuli 3 months ago
Updated sequence diagram based on MSU's proposal to poll for zip files rather than have them pushed.
sequenceDiagram
autonumber
actor U as User
participant F as FathomNet
participant M as MSU
participant N as NCEI
U->>F: Upload ZIP of images + CSV via HTTP
F-->>U: Ack/200
rect rgb(150, 114, 114)
Note left of F: MBARI
F-)+F: Extract CSV
F-)F: Update image names/locations to NCEI name, MSU location
F-)F: Repackage using NCEI naming conventions
F-)-F: Stage zip file of images/csv to https://fathomnet.org/static/...
rect rgb(114, 114, 150)
Note left of M: MSU
loop Every Day?
Note left of M: This would require us to enable directory listing. Do we want that?
M-)+F: Scan for new zip files
end
M->>F: Download new zip via HTTP
F-->>-M: <zip>
M-)+M: Unzip images and CSV at standard location and provide web access
M-)-F: Send email notification with unzipped location?
end
rect rgb(75, 57, 57)
loop Every Day?
F-)F: Poll for emails periodically
end
F-)F: On new email, extract location of new directory
F-)F: Extract location of CSV in new directory
F-)F: Use CSV to register images
F-)F: Delete local zip file from https://fathomnet.org/static/...
F-)U: Send email that images are registered
end
end
rect rgb(114, 150, 114)
Note left of M: NCEI
M->>N: At 6 months migrate to NCEI
N-)N: Unpack at standard location
N->>F: Notify FathomNet of the location change?
end
F-)F: Update image URLs to new location
Things we'd have to do on the FathomNet side for this:
Email from David Moffitt on 2023-12-01:
I've gotten the email notifications working with a simulated smtp server. I'm putting in a ticket with MSU so I can start testing it with the actual smtp server and have it set up as a cron job. Currently the emails only have a list of the files downloaded and the file size, what other information would be good to have in the notifications?
My response to David's email:
The entire work flow and handshake between FathomNet and MSU is described in a sequence diagram at https://github.com/fathomnet/community-feedback/issues/136#issuecomment-1737705318 .
Currently the emails only have a list of the files downloaded and the file size, what other information would be good to have in the notifications?
Ideally, these are the things I would like in the email:
It would be ideal if the email body is easily parsable by automated code. Example email body with directory listing enabled:
description: MSU file transfer from FathomNet
timestamp: 2023-12-07T01:23:45Z
source: https://fathomnet.org/static/staging/FN2309-small.zip
destination: https://msu.server.edu/path/to/FN2309-small/
Example email body if directory listing is not enabled:
description: MSU file transfer from FathomNet
timestamp: 2023-12-07T01:23:45Z
source: https://fathomnet.org/static/staging/FN2309-small.zip
destination: https://msu.server.edu/path/to/FN2309-small/
files:
- https://msu.server.edu/path/to/FN2309-small/FN2309_355922--fb7616ae-38b0-45b5-883b-3d18ab7121cd.png
- https://msu.server.edu/path/to/FN2309-small/FN2309_414280--704a79af-98ce-4c65-95a8-7273bd3dbaed.png
- https://msu.server.edu/path/to/FN2309-small/FN2309_550917--64849cc7-2e3f-4eb1-93be-5f016aa540a2.png
- https://msu.server.edu/path/to/FN2309-small/foobar.csv
Let me know if you think I’m missing anything. Thanks!
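For reference, here is a minimal sketch of how the FathomNet side might parse an email body in the format proposed above. It assumes one `key: value` field per line and an optional `files:` list of `- ` prefixed URLs; the function name `parse_notification` is hypothetical and nothing here has been agreed with MSU.

```python
def parse_notification(body: str) -> dict:
    """Parse 'key: value' lines plus an optional '- item' list under 'files:'.

    Assumption: the format follows the example emails in this thread.
    """
    result = {"files": []}
    in_files = False
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            # List items only count when they follow a bare 'files:' line.
            if in_files:
                result["files"].append(line[2:].strip())
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        in_files = key == "files" and not value
        if not in_files:
            result[key] = value
    return result
```

With directory listing enabled the `files` entry simply comes back empty, so the same parser would cover both variants.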
Response from @errol-ronje:
[...] Yee Lau is now standing by to complete the FathomNet automation. Yee and I met last week and came up with a few questions for clarification to help move this forward. Please check our notes below for accuracy and let us know the answers to our questions. We may also have some follow-up questions as we get back up to speed on this project, since so much time has passed:
My response:
Hi Errol and Yee,
I’m very excited that we’re moving forward! As a reminder, I keep notes related to this effort on GitHub at https://github.com/orgs/fathomnet/projects/7/views/1. The current, notional data flow is documented in a diagram at https://github.com/fathomnet/community-feedback/issues/136#issuecomment-1737705318. Since nothing is currently set in stone, we can change this workflow as needed so that it works best for both FathomNet and NOAA.
My responses to your notes and questions:
File name convention for each package: FNYYMM where YY is the 2-digit year and MM is the 2-digit month.
My understanding is that the naming conventions for packages are FNYYMM
Package should be unzipped in https://oer.hpc.msstate.edu/FathomNet/
You will need to preserve the package name. So a package FN2304-small would be extracted into https://oer.hpc.msstate.edu/FathomNet/FN2304-small. Otherwise, we will have problems with name collisions between files.
Once the package is unzipped, it would be helpful if an email is sent to us (or some other notification; I still have to set up an email account for this). The contents of the email need to be structured so that they can be parsed by code. An example email is at https://github.com/fathomnet/community-feedback/issues/136#issuecomment-1845970774. Again, nothing is set yet, so we can adapt this as needed.
NCEI script will move MSU data from the MSU fathomnet directory to NCEI for archiving prep
When the data is moved from MSU to NCEI, can you send us a notification via email?
Can we delete the test package on https://oer.hpc.msstate.edu/FathomNet/20230427_test_package/?
Yes! All images from FathomNet at MSU are just for testing purposes. It’s safe to remove any and all of them.
What are the other images currently in the FathomNet directory at https://oer.hpc.msstate.edu/FathomNet/ (e.g., Acanthogorgiidae001_trimmed.png)? Can we delete them?
Yes!
FN2304-large and FN2309-small have already been transferred to https://fathomnet.org/static/staging/, is this the complete dataset that is ready for the first archive package?
Those are just packages to use for testing and development and not meant to be permanently archived.
Please let me know if you have any other questions. Yee, I’m looking forward to working with you.
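To make the "preserve the package name" point concrete, here is a small sketch that derives the extraction URL from the zip URL. The `FNYYMM` pattern comes from the notes above; accepting an optional suffix such as `-small` or `-large` is my assumption based on the test packages, and the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

MSU_BASE = "https://oer.hpc.msstate.edu/FathomNet/"

# FNYYMM, optionally followed by a suffix like "-small" (the suffix is an assumption).
PACKAGE_RE = re.compile(r"^FN(\d{2})(0[1-9]|1[0-2])(-[A-Za-z0-9]+)?$")


def package_name(zip_url: str) -> str:
    """Return the package name (zip file name without '.zip'), validating the pattern."""
    name = PurePosixPath(urlparse(zip_url).path).stem
    if not PACKAGE_RE.match(name):
        raise ValueError(f"unexpected package name: {name!r}")
    return name


def extraction_url(zip_url: str, base: str = MSU_BASE) -> str:
    """Build the directory URL the package would be unzipped into, keeping its name."""
    return base.rstrip("/") + "/" + package_name(zip_url) + "/"
```

Because every package lands in its own directory, two packages containing files with the same name would not collide.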
Errol sent this email:
Brian, please find the notification of FathomNet files transferred below. Is this notification sufficient, and are the files organized as expected?
Subject: FathomNet Download List
description: MSU file transfer from FathomNet
timestamp: 2024-05-01T21:46:26Z
source: https://fathomnet.org/static/staging/FN2304-large.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large/
source: https://fathomnet.org/static/staging/FN2309-small.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/
My reply:
It’s a good start, but can we tweak how the zip files are unpacked? The file might be a zip file of images OR it might be a zipped directory of images. If it’s the latter, the images get unpacked into a somewhat random directory. It would be much more useful if, after the file is unzipped, all the png or jpg images were moved into the correct staging directory. For example, unzipping FN2304-large results in the images being in a rather redundant path location: https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large/FN2304-large/. Ideally, the images should be moved to https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large. The same goes for FN2309: the images are in https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/FN2309/, but it would be better if they were relocated to https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/.
Let me know if that’s possible.
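One possible way to get that layout is to ignore the directory structure inside the zip while extracting. This is only a sketch of the idea, not MSU's actual script: `extract_flat` is a hypothetical name, and it flattens every file (images and the CSV), which is my assumption about what we want.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_flat(zip_path: str, dest_dir: str) -> list[str]:
    """Extract every file in the zip directly into dest_dir, dropping inner folders.

    Works whether the archive is a zip of images or a zipped directory of images.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the base name; this also blocks paths that escape dest_dir.
            name = PurePosixPath(info.filename).name
            if not name:
                continue
            target = dest / name
            if target.exists():
                raise FileExistsError(f"name collision while flattening: {name}")
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(name)
    return sorted(written)
```

Raising on a collision is deliberate: if two inner folders held files with the same name, silently overwriting one would lose an image.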
Sent this email to Yee, Errol and others:
Thanks Yee. That looks great.
To follow up on the email format: the format you have below is AOK for how the web server is currently configured (with the directory listing enabled). One note is that we should standardize on the email’s subject so it’s simple to automate code to watch for the emails. It doesn’t matter so much to me what the subject is. I’ll throw out “MSU file transfer from FathomNet” as a straw man, but if you have a preference, just let me know.
Cheers
EMAIL:
description: MSU file transfer from FathomNet
timestamp: 2024-05-08T14:42:37Z
source: https://fathomnet.org/static/staging/FN2304-large.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large/
source: https://fathomnet.org/static/staging/FN2309-small.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/
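On our side, watching for these emails could look roughly like the sketch below. It assumes the straw-man subject is adopted, that each field arrives on its own line, and that every `source` line is followed by its matching `target` line, as in Errol's notification. The names `is_transfer_email` and `parse_transfers` are hypothetical.

```python
EXPECTED_SUBJECT = "MSU file transfer from FathomNet"  # straw-man subject, not yet agreed


def is_transfer_email(subject: str) -> bool:
    """True if the subject matches the (assumed) standardized subject line."""
    return subject.strip() == EXPECTED_SUBJECT


def parse_transfers(body: str) -> tuple[dict, list[tuple[str, str]]]:
    """Return header fields and a list of (source zip URL, target directory URL) pairs."""
    header = {}
    transfers = []
    pending_source = None
    for raw in body.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "source":
            pending_source = value
        elif key == "target" and pending_source is not None:
            # Pair each target with the source line that preceded it.
            transfers.append((pending_source, value))
            pending_source = None
        else:
            header[key] = value
    return header, transfers
```

Each pair gives us the staged zip that can be deleted and the MSU directory where the CSV for image registration should be found.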
Latest updated workflow
sequenceDiagram
autonumber
actor U as User
participant F as FathomNet
participant M as MSU
participant N as NCEI
U->>F: Upload ZIP of images + CSV via HTTP
F-->>U: Ack/200
rect rgb(150, 114, 114)
Note left of F: MBARI - Repackage Zip file
F-)+F: Read zip
F-)F: Rename images using NCEI naming conventions
F-)F: Extract CSV and update image names
F-)F: Generate new zip
F-)-F: Stage to public archive
end
rect rgb(114, 114, 150)
Note left of M: MSU - Provide public access
M-)+F: Scan for new zip files in public archive
F->>-M: Fetch new zip files (HTTP)
M-)M: Unzip images at standard location and provide web access
end
rect rgb(150, 114, 114)
Note left of F: MBARI - Scan for new uploads
F-)+M: Scan for new FathomNet directories
M->>F: Fetch new CSV
F-)F: Use CSV to register images
end
rect rgb(114, 150, 114)
Note left of M: NCEI
M->>N: At 6 months migrate to NCEI
N-)N: Unpack at standard location
N->>F: Notify FathomNet of the location change?
end
F-)F: Update image URLs to new location
This is the initial draft version of the workflow and is subject to change.
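As an illustration of the "Scan for new zip files in public archive" step, here is a rough sketch that picks out zip links from a directory-listing page and compares them with what has already been fetched. It assumes directory listing is enabled and rendered as plain HTML anchors; the helper names are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_new_zips(listing_html: str, base_url: str, seen: set) -> list[str]:
    """Return absolute URLs of .zip links in the listing that are not in `seen`."""
    collector = _LinkCollector()
    collector.feed(listing_html)
    urls = {
        urljoin(base_url, href)
        for href in collector.hrefs
        if href.lower().endswith(".zip")
    }
    return sorted(urls - set(seen))
```

The caller would persist the `seen` set between runs (for example in a small text file) so that each daily scan only downloads packages it has not handled before.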