fathomnet / community-feedback

Develop workflow for archiving images to MSU/NOAA #88

Closed hohonuuli closed 3 months ago

hohonuuli commented 1 year ago

Staging to MSU

  1. Stage at MBARI for FathomNet. Then, once a month, kick off staging to MSU.
  2. Move to a location at MSU. Megan has a server set up outside of NOAA at MSU. Reach out to Megan or Caitlin; they will give me an FTP site to stage the images. Send them as a zipped data file. Directories can be problematic, so just a flat zip of images.
    1. Add a prefix to image names: 2 letters and 4 numbers (FN for FathomNet, followed by YYMM or YYYY). Names only need to be unique within a zip file. Directories at MSU will be based on the first 6 characters. So maybe generate an FN + 4 random digits key?
  3. Create a directory named FNXXXX and host it on MSU.
  4. MSU will shoot me an email with the directory location.
  5. I'll then update the links in FathomNet to point at MSU.
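
The naming rule in step 2.1 can be sketched in a few lines of Scala. This is only an illustration of the two options mentioned above (date-based or random four-digit key); the function names `datePrefix`, `randomPrefix`, `hasValidPrefix`, and `directoryKey` are hypothetical and not part of any existing FathomNet or MSU tooling.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Random

// "FN" + two-digit year + two-digit month, e.g. FN2302 for February 2023.
def datePrefix(date: LocalDate): String =
  "FN" + DateTimeFormatter.ofPattern("yyMM").format(date)

// Alternative from the notes: "FN" + four random digits, zero-padded.
def randomPrefix(rng: Random): String =
  f"FN${rng.nextInt(10000)}%04d"

// True when a file name starts with two letters followed by four digits.
def hasValidPrefix(fileName: String): Boolean =
  fileName.matches("^[A-Za-z]{2}[0-9]{4}.*")

// The directory key that would be derived from the first six characters
// (assumption: MSU uses exactly those characters as the directory name).
def directoryKey(fileName: String): Option[String] =
  if hasValidPrefix(fileName) then Some(fileName.take(6)) else None
```

For example, `directoryKey("FN2302-image.png")` yields `Some("FN2302")`, while a name without the prefix yields `None` and could be flagged before zipping.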

Moving to NOAA

References

hohonuuli commented 1 year ago

I grabbed Doc Ricketts images for dives 1042-1313 that were staged to atlas:/FathomNet/web/m3/staging to use as a test set for this workflow. (There are 82 images in the set.) I renamed the images to the expected naming scheme and moved them into a single directory using:

#!/usr/bin/env -S scala-cli shebang --scala-version 3.2.2

import java.nio.file.{FileSystems, Files}
import scala.jdk.CollectionConverters.*

// args(0): directory tree to search for images; args(1): flat output directory
val sourceDir = FileSystems.getDefault.getPath(args(0))
val targetDir = FileSystems.getDefault.getPath(args(1))
Files.createDirectories(targetDir)

// Collect the image paths up front so the directory walk is finished before any file is moved
val images = Files.walk(sourceDir)
  .iterator()
  .asScala
  .filter(Files.isRegularFile(_))
  .filter(i => {
    val name = i.getFileName.toString.toLowerCase()
    name.endsWith(".jpg") || name.endsWith(".jpeg") || name.endsWith(".png")
  })
  .toList

// "yyMM" is the calendar year and month; "YY" is the week-based year, which can be off around New Year
val dateformat = java.time.format.DateTimeFormatter.ofPattern("yyMM")
val prefix = s"FN${dateformat.format(java.time.LocalDate.now)}"

// Move each image into the flat target directory under its prefixed name
for (image <- images) {
  val name = image.getFileName.toString
  val newName = s"$prefix-$name"
  val newImage = targetDir.resolve(newName)
  Files.move(image, newImage)
}

hohonuuli commented 1 year ago

I sent an email to Megan asking for FTP access to MSU to upload the image set.
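
Before upload, the renamed images still have to be packed as a flat zip, as described in the staging notes. A minimal sketch of that step, assuming the images already sit directly in one directory; `flatZip` is a hypothetical helper name, and only standard `java.util.zip` classes are used:

```scala
import java.nio.file.{Files, Path}
import java.util.zip.{ZipEntry, ZipOutputStream}
import scala.jdk.CollectionConverters.*
import scala.util.Using

// Writes every regular file directly inside `dir` into `zipFile`, using only
// the file name as the entry name so the archive contains no directories.
// Returns the number of files written.
def flatZip(dir: Path, zipFile: Path): Int =
  // List the files first and close the directory stream before zipping.
  val files = Using.resource(Files.list(dir)) { stream =>
    stream.iterator().asScala.filter(Files.isRegularFile(_)).toList
  }
  Using.resource(new ZipOutputStream(Files.newOutputStream(zipFile))) { zos =>
    for file <- files do
      zos.putNextEntry(new ZipEntry(file.getFileName.toString))
      Files.copy(file, zos)
      zos.closeEntry()
  }
  files.size
```

Subdirectories are skipped rather than flattened, so name collisions inside the zip cannot occur; the zip file itself should be written outside `dir`.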

hohonuuli commented 1 year ago

From Megan Cromwell on Slack

Hey! The fisheries data is here. Let me know if they need to fix anything as I'm not sure what it is supposed to look like at this point. https://oer.hpc.msstate.edu/FathomNet/FisheriesData/

hohonuuli commented 1 year ago

From Megan via email:

I'm out of town, but working on transition planning. Would you mind sharing your notes so I can go ahead and send them to Caitlin? If not, it's ok! I just thought it would be faster than waiting until I get home.

hohonuuli commented 1 year ago

Notes about Fisheries samples on NOAA server.

I'm assuming these are being staged for submission to FathomNet...

Are all the files going to be put in a single directory under https://oer.hpc.msstate.edu/FathomNet/FisheriesData/? I have concerns about the number of files in a single directory. I'm not sure how well that will play with your server's file system or the web server's directory listing. You might want to organize them in subdirectories. I'm downloading the https://oer.hpc.msstate.edu/FathomNet/FisheriesData/For_Training_01_26_2022.zip file to inspect it, but it's almost 100 GB, so it may take a while.

[Screenshot 2023-02-07 at 1:39:51 PM]

The 3 sample files aren't following the naming convention we discussed of starting with two letters and four digits.

What's NOAA's plan for this file? Are you going to extract the images and stage them to MSU yourselves, or were you going to try to submit the zip to FathomNet? Just FYI, we won't allow files that large. That applies to the zip file; the individual image sizes are fine. I haven't figured out a size limit yet, but accepting large files opens us up to denial-of-service attacks.
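
A size check of this kind could look roughly like the sketch below. The 2 GiB value of `maxUploadBytes` is purely a placeholder, since no limit has been chosen yet, and `checkUploadSize` and `checkUploadFile` are hypothetical names, not existing FathomNet functions.

```scala
import java.nio.file.{Files, Path}

// Placeholder cap (2 GiB). Assumption only: no actual limit has been decided.
val maxUploadBytes: Long = 2L * 1024 * 1024 * 1024

// Right(size) when the upload is acceptable, Left(message) when it is too large.
def checkUploadSize(sizeBytes: Long, limitBytes: Long = maxUploadBytes): Either[String, Long] =
  if sizeBytes <= limitBytes then Right(sizeBytes)
  else Left(s"Upload of $sizeBytes bytes exceeds the limit of $limitBytes bytes")

// Convenience wrapper that reads the size of a zip file already on disk.
def checkUploadFile(zipFile: Path): Either[String, Long] =
  checkUploadSize(Files.size(zipFile))
```

In a real upload endpoint the check would ideally run on the declared content length before the body is read, so that an oversized request is rejected without consuming bandwidth or disk.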

megancromwell commented 1 year ago

Hi! We can divide into subdirectories. I haven't looked at it. I had a teammate move it for the testing, with 3 files outside to test the linking. Do you see any logical breaks for directory structures? I do know we're going to need to update the filenames as well (I noticed when I sent you the link). If not, what would you suggest or prefer regarding breaks? We can just break it into manageable chunks (preferably under 10 GB, definitely under 25 GB).
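
One way to form such chunks is a simple greedy grouping over the existing directories. The sketch below is an illustration only: `chunkBySize` is a hypothetical helper, the directory names and sizes in the usage note are made up, and a real run would take sizes measured on disk.

```scala
// Groups (name, sizeBytes) pairs into batches whose total size stays at or
// under `limitBytes`. Items are taken in the given order; an item larger
// than the limit ends up in a batch of its own.
def chunkBySize(items: Seq[(String, Long)], limitBytes: Long): List[List[String]] =
  val (batches, current, _) =
    items.foldLeft((List.empty[List[String]], List.empty[String], 0L)) {
      case ((done, cur, total), (name, size)) =>
        if cur.nonEmpty && total + size > limitBytes then
          // Current batch is full: close it and start a new one with this item.
          (cur.reverse :: done, List(name), size)
        else (done, name :: cur, total + size)
    }
  (if current.nonEmpty then current.reverse :: batches else batches).reverse
```

With a limit of 10 units, `chunkBySize(Seq("A" -> 4L, "B" -> 5L, "C" -> 3L), 10L)` groups `A` and `B` together and puts `C` in a second batch. The grouping preserves input order rather than minimizing the number of batches, which keeps related directories adjacent.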

hohonuuli commented 1 year ago

@megancromwell I unpacked the zip file. It has subdirectories like JP_1 through JP_24, JRS_1 through JRS_41, and one oddball, JP_spherical1. The amount of image data each directory contains varies from tens of MB to at least 2 GB. Each directory contains black and white PNG images and a CSV file named the same as the directory, e.g. JRS_15.csv.

BTW, from FathomNet's point of view that's all fine. For submitting the data to FathomNet, you can host it and remunge the CSV file to FathomNet's format as described at https://medium.com/fathomnet/how-to-submit-localized-image-annotations-to-fathomnet-baf25dbd8165. I would definitely add an imagingtype column to the CSV with a value of black and white. I might be able to help you with that if needed.
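
Adding the imagingtype column is mechanical. A minimal sketch, assuming one CSV record per line (no line breaks inside quoted fields); `addColumn` is a hypothetical helper and the other column names in the usage note are invented for illustration, not FathomNet's actual schema:

```scala
// Appends a constant column to CSV lines: the header gains `column` and every
// non-empty data row gains `value`. Assumes one record per line.
def addColumn(lines: Seq[String], column: String, value: String): Seq[String] =
  lines match
    case Seq() => Seq.empty
    case header +: rows =>
      (header + "," + column) +:
        rows.map(row => if row.trim.isEmpty then row else row + "," + value)
```

For example, `addColumn(Seq("image,concept", "a.png,fish"), "imagingtype", "black and white")` returns `Seq("image,concept,imagingtype", "a.png,fish,black and white")`. For files with multi-line quoted fields, a proper CSV library would be the safer choice.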

hohonuuli commented 1 year ago

Sequence Diagram

sequenceDiagram
    actor U as User
    box FathomNet
    participant F as FathomNet
    participant A as Atlas
    end
    box NOAA
    participant M as MSU
    participant O as NOAA OER
    end

    rect rgb(1, 87, 155)
    note right of U: Initial Upload to FathomNet
    U->>+F: Upload ZIP of images + CSV
    F->>A: Unzip and archive Images
    F->>F: Register Images and Localizations
    F-->>-U: Notify User
    end
    rect rgb(51, 105, 30)
    note right of F: Temporary archive at MSU
    F->>+M: Zip archived images and stage to MSU (how often?)
    M->>M: Unzip images and provide web access
    M-->>-F: How will FathomNet get the new image URLs?
    F->>F: Update image URLs to MSU location
    F->>A: Remove images that were staged to MSU
    end
    rect rgb(0, 96, 100)
    note right of F: Final archive at NOAA OER
    F->>F: Create CSV archive of database tables
    F->>+M: Upload CSV archive to MSU?
    F->>M: Is the plan to upload ALL images to NOAA OER?
    M->>O: Archive CSV
    M->>O: Archive Images
    O-->>F: How will FathomNet get the new image URLs?
    F->>F: Update image URLs to OER location
    end

hohonuuli commented 1 year ago

@errol-ronje I have a small zip file of images to use as a test set. How do I get them archived to MSU's servers?

errol-ronje commented 1 year ago

Brian, can you move it to this Google Drive folder for staging, then I will push it to MSU: https://drive.google.com/drive/folders/13n2Pwy87VdWDgJSdIfTkgsqMd3DOGAhx?usp=sharing

Errol Ronje, Oceanographer, Oceanographic and Geophysical Science and Services Division, NOAA National Centers for Environmental Information (NCEI), Stennis Space Center, MS. https://orcid.org/0000-0003-3312-5662

hohonuuli commented 1 year ago

@megancromwell @errol-ronje I've dropped a zip file named fathomnet.zip in that Google Drive directory.

hohonuuli commented 3 months ago

This part is completed. I've deployed bitfrost to repackage uploaded zip files and move them to a staging location (https://fathomnet.org/static/staging/). I'm moving on to new tasks to complete this pipeline.