fathomnet / community-feedback


Assemble NCEI image archiving functional requirements #136

Closed: hohonuuli closed this issue 1 month ago

hohonuuli commented 1 year ago

This is the initial draft of the workflow and is subject to change.

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant F as FathomNet
    participant M as MSU
    participant N as NCEI
    U->>F: Upload ZIP of images + CSV via HTTP
    F-->>U: Ack/200
    rect rgb(150, 114, 114)
    Note left of F: MBARI
    F-)+F: Repackage using NCEI naming conventions
    F-)F: Extract CSV
    F-)F: Update image names/locations to NCEI name, MSU location
    rect rgb(114, 114, 150)
    Note left of M: MSU
    F->>+M: Upload (FTP, HTTP?)
    M-)-M: Unzip images at standard location and provide web access
    end
    F-)-F: Use CSV to register images
    end
    rect rgb(114, 150, 114)
    Note left of M: NCEI
    M->>N: At 6 months migrate to NCEI
    N-)N: Unpack at standard location
    N->>F: Notify FathomNet of the location change?
    end
    F-)F: Update image URLs to new location
```
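The repackaging steps in this diagram (rename images, update the CSV, write a new zip) could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not FathomNet's implementation: the thread does not spell out the NCEI naming convention, so the `{package}_{index}--{uuid}` pattern is a hypothetical stand-in loosely modeled on the `FN2309_<number>--<uuid>.png` example names later in this issue, the `image` CSV column is an assumed name, and image base names are assumed to be unique within the zip.

```python
import csv
import io
import uuid
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def repackage(src_zip: str, dst_zip: str, package: str, image_column: str = "image") -> dict[str, str]:
    """Copy src_zip to dst_zip with renamed images and a CSV that uses the new names.

    Returns the mapping from original image file name to new file name.
    """
    renames: dict[str, str] = {}
    csv_members: list[str] = []
    with zipfile.ZipFile(src_zip) as zin, zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            suffix = path.suffix.lower()
            if suffix in IMAGE_SUFFIXES:
                # Hypothetical naming pattern; assumes image base names are unique in the zip
                new_name = f"{package}_{len(renames) + 1}--{uuid.uuid4()}{suffix}"
                renames[path.name] = new_name
                zout.writestr(new_name, zin.read(info))
            elif suffix == ".csv":
                csv_members.append(info.filename)
        # Rewrite CSVs after all images are renamed so every reference can be resolved
        for member in csv_members:
            text = zin.read(member).decode("utf-8")
            reader = csv.DictReader(io.StringIO(text))
            rows = list(reader)
            out = io.StringIO()
            if reader.fieldnames:
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                writer.writeheader()
                for row in rows:
                    old = PurePosixPath(row.get(image_column) or "").name
                    if old in renames:
                        row[image_column] = renames[old]
                    writer.writerow(row)
            else:
                out.write(text)  # empty CSV: copy through unchanged
            # Images and CSV are written flat at the zip root
            zout.writestr(PurePosixPath(member).name, out.getvalue())
    return renames
```

Returning the rename map keeps the old-to-new correspondence available for the later "register images" step.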
hohonuuli commented 1 year ago

Updated sequence diagram based on MSU's proposal to poll for zip files rather than have them pushed.

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant F as FathomNet
    participant M as MSU
    participant N as NCEI
    U->>F: Upload ZIP of images + CSV via HTTP
    F-->>U: Ack/200
    rect rgb(150, 114, 114)
    Note left of F: MBARI
    F-)+F: Extract CSV
    F-)F: Update image names/locations to NCEI name, MSU location
    F-)F: Repackage using NCEI naming conventions
    F-)-F: Stage zip file of images/csv to https://fathomnet.org/static/...
    rect rgb(114, 114, 150)
    Note left of M: MSU
    loop Every Day?
        Note left of M: This would require us to enable directory listing. Do we want that?
        M-)+F: Scan for new zip files
    end
    M->>F: Download new zip via HTTP
    F-->>-M: <zip>
    M-)+M: Unzip images and CSV at standard location and provide web access
    M-)-F: Send email notification with unzipped location?
    end
    rect rgb(75, 57, 57)
    loop Every Day?
        F-)F: Poll for emails periodically
    end
    F-)F: On new email, extract location of new directory
    F-)F: Extract location of CSV in new directory
    F-)F: Use CSV to register images
    F-)F: Delete local zip file from https://fathomnet.org/static/...
    F-)U: Send email that images are registered
    end
    end
    rect rgb(114, 150, 114)
    Note left of M: NCEI
    M->>N: At 6 months migrate to NCEI
    N-)N: Unpack at standard location
    N->>F: Notify FathomNet of the location change?
    end
    F-)F: Update image URLs to new location
```
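In this polling variant, MSU needs some way to remember which staged zip files it has already fetched. A minimal sketch, assuming a local JSON file serves as that record; the state-file approach and the `new_zip_files`, `mark_fetched`, and `fetch_zip` names are hypothetical, and how the list of staged names is obtained (for example, from a directory listing) is left open.

```python
import json
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urljoin

STAGING_URL = "https://fathomnet.org/static/staging/"  # staging location named in this thread


def _load_seen(state_file: str) -> set[str]:
    state = Path(state_file)
    return set(json.loads(state.read_text())) if state.exists() else set()


def new_zip_files(staged_names, state_file: str) -> list[str]:
    """Return the staged .zip names that are not yet recorded as fetched."""
    seen = _load_seen(state_file)
    return sorted(n for n in set(staged_names) if n.lower().endswith(".zip") and n not in seen)


def mark_fetched(names, state_file: str) -> None:
    """Record names as fetched so the next poll skips them."""
    seen = _load_seen(state_file) | set(names)
    Path(state_file).write_text(json.dumps(sorted(seen)))


def fetch_zip(name: str, dest_dir: str, base_url: str = STAGING_URL) -> Path:
    """Download one staged zip into dest_dir (network I/O)."""
    dest = Path(dest_dir) / name
    with urllib.request.urlopen(urljoin(base_url, name)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

A daily job would call `fetch_zip` for each name from `new_zip_files` and call `mark_fetched` only after the download succeeds, so a failed transfer is retried on the next run.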

Things we'd have to do on the FathomNet side for this:

hohonuuli commented 9 months ago

Email from David Moffitt on 2023-12-01:

I've gotten the email notifications working with a simulated SMTP server. I'm putting in a ticket with MSU so I can start testing it with the actual SMTP server and have it set up as a cron job. Currently the emails only have a list of the files downloaded and the file size. What other information would be good to have in the notifications?

hohonuuli commented 9 months ago

My response to David's email:

The entire workflow and handshake between FathomNet and MSU is described in a sequence diagram at https://github.com/fathomnet/community-feedback/issues/136#issuecomment-1737705318.

Currently the emails only have a list of the files downloaded and the file size. What other information would be good to have in the notifications?

Ideally, these are the things I would like in the email:

  1. The URL to the original file fetched from https://fathomnet.org/static/staging/
  2. The URL of the unzipped root directory of that file on MSU's servers.
  3. If directory listing is enabled on MSU’s server, the URL (in 2 above) is enough. We can just scrape the directory listing for the files that were in the zip file. If directory listing is not enabled, the email should contain the full URL of every file that was extracted from the zip file.
  4. The email body should contain the date/time when the file was extracted.
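Item 3 relies on scraping the directory listing. With only the Python standard library, that could look like the sketch below. It assumes an auto-generated listing page (Apache- or nginx-style) whose file entries are relative `<a href>` links; the `listed_files` name and the suffix filter are hypothetical choices, not part of either server's configuration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect every href from <a> tags on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def listed_files(listing_html: str, base_url: str,
                 suffixes=(".png", ".jpg", ".jpeg", ".csv")) -> list[str]:
    """Return absolute URLs of files in a directory listing.

    base_url must end with '/' so relative links resolve inside the directory.
    """
    parser = _LinkCollector()
    parser.feed(listing_html)
    urls = []
    for href in parser.hrefs:
        # Skip column-sort links ("?C=N;O=D"), the parent link, and subdirectories
        if href.startswith("?") or href.endswith("/"):
            continue
        if href.lower().endswith(suffixes):
            urls.append(urljoin(base_url, href))
    return urls
```

The page itself could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`; if MSU's listing format differs, only the filtering rules should need adjusting.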

It would be ideal if the email body were easily parsable by automated code. Example email body with directory listing enabled:

description: MSU file transfer from FathomNet
timestamp: 2023-12-07T01:23:45Z
source: https://fathomnet.org/static/staging/FN2309-small.zip
destination: https://msu.server.edu/path/to/FN2309-small/ 

Example email body if directory listing is not enabled:

description: MSU file transfer from FathomNet
timestamp: 2023-12-07T01:23:45Z
source: https://fathomnet.org/static/staging/FN2309-small.zip
destination: https://msu.server.edu/path/to/FN2309-small/
files:
  - https://msu.server.edu/path/to/FN2309-small/FN2309_355922--fb7616ae-38b0-45b5-883b-3d18ab7121cd.png
  - https://msu.server.edu/path/to/FN2309-small/FN2309_414280--704a79af-98ce-4c65-95a8-7273bd3dbaed.png
  - https://msu.server.edu/path/to/FN2309-small/FN2309_550917--64849cc7-2e3f-4eb1-93be-5f016aa540a2.png
  - https://msu.server.edu/path/to/FN2309-small/foobar.csv
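Because the proposed body is a flat `key: value` layout with an optional indented `files:` list, a parser for it can stay small. A minimal sketch, assuming one key per line, a split on the first colon only (so URL values survive intact), and `- `-prefixed list items; `parse_notification` is a hypothetical name.

```python
from datetime import datetime


def parse_notification(body: str) -> dict:
    """Parse a 'key: value' notification body with an optional 'files:' list."""
    result: dict = {"files": []}
    in_files = False
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_files and line.startswith("- "):
            # List item under 'files:'
            result["files"].append(line[2:].strip())
            continue
        # Split on the first colon only, so 'https://...' values stay intact
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unparsable line: {raw!r}")
        key, value = key.strip(), value.strip()
        in_files = key == "files" and not value
        if not in_files:
            result[key] = value
    if "timestamp" in result:
        # Normalize the 'Z' suffix for Python versions before 3.11
        result["timestamp"] = datetime.fromisoformat(result["timestamp"].replace("Z", "+00:00"))
    return result
```

Both example bodies parse with the same function; the first simply yields an empty `files` list.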

Let me know if you think I’m missing anything. Thanks!

hohonuuli commented 5 months ago

Response from @errol-ronje:

[...] Yee Lau is now standing by to complete the FathomNet automation. Yee and I met last week and came up with a few questions for clarification to help move this forward. Please check our notes below for accuracy and let us know the answers to our questions. We may also have some follow-up questions, since so much time has passed, as we try to get up to speed and back on this project:

Notes

Questions:

hohonuuli commented 5 months ago

My response:

Hi Errol and Yee,

I’m very excited that we’re moving forward! As a reminder, I keep notes related to this effort on GitHub at https://github.com/orgs/fathomnet/projects/7/views/1. The current, notional data flow is documented in a diagram at https://github.com/fathomnet/community-feedback/issues/136#issuecomment-1737705318. Since nothing is currently set in stone, we can change this workflow as needed so that it works best for both FathomNet and NOAA.

My responses to your notes and questions follow.

NOTES:

File name convention for each package: FNYYMM where YY is the 2-digit year and MM is the 2-digit month.

My understanding is that the naming convention for packages is FNYYMM plus a suffix, for example, FN2304-small and FN2304-large, and that these will be extracted to directories with the same names on MSU servers. The extra characters are needed to avoid naming collisions between packages.
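The package-name convention can be checked mechanically before anything is staged. A sketch under these assumptions: `YY` means 20YY, the suffix (such as `-small` or `-large`) is required and alphanumeric, and the month must be 01 to 12. The regex and `parse_package_name` are illustrative, not an agreed specification.

```python
import re
from typing import NamedTuple

# FNYYMM followed by a required '-suffix'; the suffix rule is an assumption
PACKAGE_RE = re.compile(r"FN(?P<yy>\d{2})(?P<mm>\d{2})-(?P<suffix>[A-Za-z0-9]+)")


class PackageName(NamedTuple):
    year: int
    month: int
    suffix: str


def parse_package_name(name: str) -> PackageName:
    """Validate a package name such as 'FN2304-small' and split it into parts."""
    match = PACKAGE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an FNYYMM-suffix package name: {name!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month {month:02d} in {name!r}")
    # Assumes YY is a year in the 2000s
    return PackageName(2000 + int(match["yy"]), month, match["suffix"])
```

Rejecting a bare `FN2304` enforces the point above that the suffix is what prevents collisions between packages from the same month.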

Package should be unzipped in https://oer.hpc.msstate.edu/FathomNet/

You will need to preserve the package name. So a package FN2304-small would be extracted into https://oer.hpc.msstate.edu/FathomNet/FN2304-small. Otherwise, we will have problems with name collisions between files.
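Preserving the package name means the extraction directory can be derived directly from the staged zip's URL. A sketch, assuming the package name is the zip's file name minus `.zip` and that directories are written with a trailing slash as in the example emails; `extraction_url` is a hypothetical helper, and the default base URL is the one given in the note above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MSU_BASE = "https://oer.hpc.msstate.edu/FathomNet/"  # base URL from the note above


def extraction_url(zip_url: str, base: str = MSU_BASE) -> str:
    """Return the directory URL a staged zip should be unpacked into."""
    path = PurePosixPath(urlparse(zip_url).path)
    if path.suffix.lower() != ".zip":
        raise ValueError(f"not a zip file: {zip_url}")
    # Keep the package name (file name minus .zip) to avoid collisions
    return f"{base.rstrip('/')}/{path.stem}/"
```

Passing a different `base` covers layouts such as the `FathomNet/staging/` directory used in later transfers.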

Once the package is unzipped, it would be helpful if an email were sent to us (or some other notification; I still have to set up an email account for this). The contents of the email need to be structured so that they can be parsed by code. An example email is at https://github.com/fathomnet/community-feedback/issues/136#issuecomment-1845970774. Again, nothing is set yet, so we can adapt this as needed.

NCEI script will move MSU data from the MSU fathomnet directory to NCEI for archiving prep

When the data is moved from MSU to NCEI, can you send us a notification via email?

QUESTIONS:

Can we delete the test package at https://oer.hpc.msstate.edu/FathomNet/20230427_test_package/ ?

Yes! All images from FathomNet at MSU are just for testing purposes. It’s safe to remove any and all of them.

What are the other images currently in the FathomNet directory (https://oer.hpc.msstate.edu/FathomNet/, e.g., Acanthogorgiidae001_trimmed.png)? Can we delete them?

Yes!

FN2304-large and FN2309-small have already been transferred to https://fathomnet.org/static/staging/, is this the complete dataset that is ready for the first archive package?

Those are just packages to use for testing and development and not meant to be permanently archived.

Please let me know if you have any other questions. Yee, I’m looking forward to working with you.

hohonuuli commented 4 months ago

Errol sent this email:

Brian, please find below a notification for the FathomNet files transferred. Is this notification sufficient, and are the files organized as expected?

Subject: FathomNet Download List

description: MSU file transfer from FathomNet
timestamp: 2024-05-01T21:46:26Z
source: https://fathomnet.org/static/staging/FN2304-large.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large/
source: https://fathomnet.org/static/staging/FN2309-small.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/
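This body lists several `source`/`target` pairs in one message, so a parser that stores keys in a dictionary would silently keep only the last pair. A sketch that pairs each `source` with the `target` that follows it, assuming that ordering always holds; it also accepts `destination`, the key used in the earlier proposal, and ignores other lines such as `Subject`, `description`, and `timestamp`. `parse_transfer_pairs` is a hypothetical name.

```python
def parse_transfer_pairs(body: str) -> list[tuple[str, str]]:
    """Pair each 'source:' line with the 'target:' (or 'destination:') line after it."""
    pairs: list[tuple[str, str]] = []
    pending = None  # source still waiting for its target
    for raw in body.splitlines():
        # Split on the first colon only, so URL values stay intact
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank or free-text line
        key, value = key.strip().lower(), value.strip()
        if key == "source":
            if pending is not None:
                raise ValueError(f"source without target: {pending}")
            pending = value
        elif key in ("target", "destination"):
            if pending is None:
                raise ValueError(f"target without source: {value}")
            pairs.append((pending, value))
            pending = None
        # Other keys (subject, description, timestamp) are ignored here
    if pending is not None:
        raise ValueError(f"source without target: {pending}")
    return pairs
```

Raising on an unmatched line makes a malformed notification fail loudly instead of registering images against the wrong directory.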
hohonuuli commented 4 months ago

My reply:

It’s a good start, but can we tweak how the zip files are unpacked? The file might be a zip file of images OR a zipped directory of images. If it’s the latter, the images get unpacked into a somewhat arbitrary subdirectory. It would be much more useful if, after the file is unzipped, all the png or jpg images were moved into the correct staging directory. For example, unzipping FN2304-large puts the images at the redundant path https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large/FN2304-large/ when ideally they should be at https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large. Likewise, the FN2309 images are in https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/FN2309/ but would be better relocated to https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/
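The requested post-unzip step (move every image up into the package's staging directory, whatever nesting the zip had) might be sketched as follows. Assumptions, labeled here and in the code: the set of suffixes to move (png/jpg plus the CSV), failing loudly on a file-name collision rather than overwriting, and discarding the `__MACOSX/` folder that zips created with macOS Finder often contain. `unzip_flat` is a hypothetical name.

```python
import shutil
import zipfile
from pathlib import Path

# Assumed set of files to lift into the package directory
MOVE_SUFFIXES = {".png", ".jpg", ".jpeg", ".csv"}


def unzip_flat(zip_path: str, dest_dir: str, suffixes=MOVE_SUFFIXES) -> list[str]:
    """Unzip into dest_dir, then move matching files up from any subdirectory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Assumption: macOS resource-fork folders are never wanted
    shutil.rmtree(dest / "__MACOSX", ignore_errors=True)
    for path in sorted(dest.rglob("*")):  # list is built before any moves
        if path.is_file() and path.parent != dest and path.suffix.lower() in suffixes:
            target = dest / path.name
            if target.exists():
                raise FileExistsError(f"name collision: {target}")
            shutil.move(str(path), str(target))
    # Remove subdirectories left empty, deepest first
    subdirs = [p for p in dest.rglob("*") if p.is_dir()]
    for sub in sorted(subdirs, key=lambda p: len(p.parts), reverse=True):
        if not any(sub.iterdir()):
            sub.rmdir()
    return sorted(p.name for p in dest.iterdir() if p.is_file())
```

For example, `unzip_flat("FN2304-large.zip", "<staging root>/FN2304-large")` would leave the images directly in `FN2304-large/` whether or not the zip wrapped them in an inner `FN2304-large/` folder.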

Let me know if that’s possible.

hohonuuli commented 4 months ago

Sent this email to Yee, Errol and others:

Thanks Yee. That looks great.

To follow up on the email format: the format you have below is AOK for how the web server is currently configured (with directory listing enabled). One note: we should standardize the email’s subject so that it’s simple to write automated code that watches for the emails. It doesn’t matter much to me what the subject is; I’ll throw out “MSU file transfer from FathomNet” as a straw man, but if you have a preference, just let me know.
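With a fixed subject line, the FathomNet side can filter messages on that header alone. A sketch using Python's standard `email` and `imaplib` modules; the subject is the straw man proposed above, while the IMAP host, credentials, addresses, and the `build_notification`, `is_transfer_notification`, and `fetch_notifications` names are hypothetical.

```python
import email
import imaplib
from email.message import EmailMessage
from email.policy import default as default_policy

SUBJECT = "MSU file transfer from FathomNet"  # straw-man subject proposed above


def build_notification(body: str, sender: str, recipient: str) -> EmailMessage:
    """MSU side: wrap a notification body in a message with the agreed subject."""
    msg = EmailMessage()
    msg["Subject"] = SUBJECT
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def is_transfer_notification(msg) -> bool:
    """FathomNet side: exact subject match, so unrelated mail is ignored."""
    return (msg.get("Subject") or "").strip() == SUBJECT


def fetch_notifications(host: str, user: str, password: str):
    """Yield unread transfer notifications from INBOX (network I/O)."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select("INBOX")
        # IMAP SUBJECT search is a substring match, so results are re-checked below
        _, data = conn.search(None, "UNSEEN", "SUBJECT", f'"{SUBJECT}"')
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(parts[0][1], policy=default_policy)
            if is_transfer_notification(msg):
                yield msg
    finally:
        conn.logout()
```

Each yielded message's `get_content()` is the plain-text body, which can then go to whatever parser matches the agreed body format.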

Cheers

—— EMAIL
description: MSU file transfer from FathomNet
timestamp: 2024-05-08T14:42:37Z
source: https://fathomnet.org/static/staging/FN2304-large.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2304-large/
source: https://fathomnet.org/static/staging/FN2309-small.zip
target: https://oer.hpc.msstate.edu/FathomNet/staging/FN2309-small/

hohonuuli commented 1 month ago

https://github.com/fathomnet/fathomnet-support/pull/6

hohonuuli commented 1 week ago

Latest updated workflow

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant F as FathomNet
    participant M as MSU
    participant N as NCEI
    U->>F: Upload ZIP of images + CSV via HTTP
    F-->>U: Ack/200
    rect rgb(150, 114, 114)
    Note left of F: MBARI - Repackage Zip file
    F-)+F: Read zip
    F-)F: Rename images using NCEI naming conventions
    F-)F: Extract CSV and update image names
    F-)F: Generate new zip
    F-)-F: Stage to public archive
    end
    rect rgb(114, 114, 150)
    Note left of M: MSU - Provide public access
    M-)+F: Scan for new zip files in public archive
    F->>-M: Fetch new zip files (HTTP)
    M-)M: Unzip images at standard location and provide web access
    end

    rect rgb(150, 114, 114)
    Note left of F: MBARI - Scan for new uploads
    F-)+M: Scan for new FathomNet directories
    M->>-F: Fetch new CSV
    F-)F: Use CSV to register images
    end

    rect rgb(114, 150, 114)
    Note left of M: NCEI
    M->>N: At 6 months migrate to NCEI
    N-)N: Unpack at standard location
    N->>F: Notify FathomNet of the location change?
    end
    F-)F: Update image URLs to new location
```
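The final step, updating image URLs after the migration, amounts to swapping a URL prefix while keeping each file's path relative to its package. A sketch, assuming the old and new locations differ only by that prefix; `rewrite_urls` is a hypothetical helper, and the NCEI URL in the usage example is a made-up placeholder, since the thread does not say where NCEI will host the files.

```python
def rewrite_urls(urls, old_prefix: str, new_prefix: str) -> dict[str, str]:
    """Map each URL under old_prefix to the same relative path under new_prefix.

    URLs outside old_prefix are left out of the result (nothing to update).
    """
    # Normalize both prefixes to end in exactly one '/'
    old = old_prefix.rstrip("/") + "/"
    new = new_prefix.rstrip("/") + "/"
    return {url: new + url[len(old):] for url in urls if url.startswith(old)}
```

Usage, with a placeholder destination: `rewrite_urls(registered_urls, "https://oer.hpc.msstate.edu/FathomNet/staging/", "https://ncei.example.gov/archive/FathomNet/")` returns only the URLs that actually need updating, which keeps the registration update limited to migrated images.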