Closed hohonuuli closed 3 months ago
I grabbed Doc Ricketts images for dives 1042-1313 that were staged to atlas:/FathomNet/web/m3/staging to use as a test set for this workflow (there are 82 images in the set). I renamed the images to the expected naming scheme and compiled them into a single directory using:
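#!/usr/bin/env -S scala-cli shebang --scala-version 3.2.2
// Usage: <script> <sourceDir> <targetDir>
import java.nio.file.{FileSystems, Files}
import scala.jdk.CollectionConverters.*
import scala.util.Using

val sourceDir = FileSystems.getDefault.getPath(args(0))
val targetDir = FileSystems.getDefault.getPath(args(1))
Files.createDirectories(targetDir)

// Collect the images eagerly so the directory walk is closed before any file is moved
val images = Using.resource(Files.walk(sourceDir)) { stream =>
  stream
    .iterator()
    .asScala
    .filter(Files.isRegularFile(_))
    .filter(i => {
      val name = i.getFileName.toString.toLowerCase()
      name.endsWith(".jpg") || name.endsWith(".jpeg") || name.endsWith(".png")
    })
    .toList
}

// Prefix is "FN" plus two-digit year and month, e.g. FN2304 for April 2023.
// Lowercase "yy" is the calendar year; uppercase "YY" is the week-based year,
// which gives the wrong value for some dates around New Year.
val dateformat = java.time.format.DateTimeFormatter.ofPattern("yyMM")
val prefix = s"FN${dateformat.format(java.time.LocalDate.now)}"

// Move every image into the target directory under its prefixed name
for (image <- images) {
  val name = image.getFileName.toString
  val newName = s"$prefix-$name"
  val newImage = targetDir.resolve(newName)
  Files.move(image, newImage)
}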
I sent an email to Megan asking for FTP access to MSU to upload the image set.
From Megan Cromwell on Slack:
Hey! The fisheries data is here. Let me know if they need to fix anything as I'm not sure what it is supposed to look like at this point. https://oer.hpc.msstate.edu/FathomNet/FisheriesData/
From Megan via email:
I'm out of town, but working on transition planning. Would you mind sharing your notes so I can go ahead and send them to Caitlin? If not, it's ok! I just thought it would be faster than waiting until I get home.
I'm assuming these are being staged for submission to FathomNet...
Are all the files going to be put in a single directory under https://oer.hpc.msstate.edu/FathomNet/FisheriesData/? I have concerns about the number of files in a single directory. I'm not sure how well that will play with your server's file system or the web server's directory listing. You might want to organize them in subdirectories. I'm downloading https://oer.hpc.msstate.edu/FathomNet/FisheriesData/For_Training_01_26_2022.zip to inspect it, but it's almost 100 GB, so it may take a while.
The 3 sample files aren't following the naming convention we discussed of starting with two letters and four digits.
What's NOAA's plan for this file? Are you going to extract the images and stage them to MSU yourselves, or were you going to try to submit the zip to FathomNet? Just FYI, we won't allow files that large (the zip file, that is; the individual image sizes are fine). I haven't figured out a size limit yet, but accepting very large files opens us up to denial-of-service attacks.
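On the naming convention: a quick check along these lines could flag files that don't start with two letters and four digits before they get staged. This is only a sketch. The helper name, the uppercase-only rule, and the accepted extensions are my assumptions, and the file names in the usage note are made up.

```scala
// Hypothetical helper, not part of any FathomNet tooling.
// Assumes: two UPPERCASE letters, then four digits, then anything,
// ending in a jpg/jpeg/png extension (extension is case-insensitive).
val NamePattern = "^[A-Z]{2}[0-9]{4}.*\\.(?i:jpe?g|png)$".r

def followsNamingScheme(fileName: String): Boolean =
  NamePattern.matches(fileName)
```

For example, `followsNamingScheme("FN2304-D1042_01.png")` is true, while `followsNamingScheme("D1042_01.png")` is false because the second character is not a letter.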
Hi! We can divide into subdirectories. I haven't looked at it. I had a teammate move it for the testing with 3 outside to test the linking. Do you see any logical breaks for directory structures? I do know we're going to need to update the filenames as well (I noticed when I sent you the link). If not, what would you suggest or prefer regarding breaks. We can just break into manageable chunks (preferably under 10 g, definitely under 25).
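If there are no logical breaks, splitting into size-capped chunks could be done mechanically. Below is a rough sketch of one way to do it, assuming the sizes are already known and that "10 g" means 10 GB. `Item` and `chunkBySize` are hypothetical names, and the grouping is a simple greedy pass in the given order, not an optimal packing.

```scala
// Hypothetical record: a file (or directory) name and its size in bytes.
final case class Item(name: String, bytes: Long)

// Greedy, order-preserving grouping: start a new chunk whenever adding the
// next item would push the current chunk over maxBytes.
// An item that is larger than maxBytes on its own still gets its own chunk.
def chunkBySize(items: Seq[Item], maxBytes: Long): Vector[Vector[Item]] =
  items.foldLeft(Vector.empty[Vector[Item]]) { (chunks, item) =>
    chunks.lastOption match
      case Some(last) if last.map(_.bytes).sum + item.bytes <= maxBytes =>
        chunks.init :+ (last :+ item)
      case _ =>
        chunks :+ Vector(item)
  }
```

With a 10 GB cap this would be called as `chunkBySize(items, 10L * 1024 * 1024 * 1024)`. Because the order is preserved, related directories stay next to each other, at the cost of some chunks being smaller than they could be.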
@megancromwell I unpacked the zip file. It has subdirectories JP_1 through JP_24, JRS_1 through JRS_41, and one oddball, JP_spherical1. The amount of image data in each directory varies from tens of MB to at least 2 GB. Each directory contains black and white PNG images and a CSV file named the same as the directory, e.g. JRS_15.csv.
BTW, from FathomNet's point of view that's all fine. For submitting the data to FathomNet, you can host it and remunge the CSV file to FathomNet's format as described at https://medium.com/fathomnet/how-to-submit-localized-image-annotations-to-fathomnet-baf25dbd8165. I would definitely add a column named imagingtype to the CSV with a value of black and white. I might be able to help you with that if needed.
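Adding the imagingtype column could be as simple as the sketch below. It assumes the CSV is read as lines with the header first and that no field contains an embedded line break. `addColumn` is a hypothetical helper, and the other column names in the usage note are placeholders, since I haven't mapped the NOAA CSV columns to FathomNet's format yet.

```scala
// Appends one constant-valued column to CSV content held as lines.
// Naive on purpose: it only appends ",value" to each line, so it is safe
// for quoted fields but not for fields that span multiple lines.
def addColumn(lines: Seq[String], header: String, value: String): Seq[String] =
  lines match
    case Seq()        => Seq.empty
    case head +: rows => s"$head,$header" +: rows.map(row => s"$row,$value")
```

For example, `addColumn(Seq("filename,label", "a.png,fish"), "imagingtype", "black and white")` turns the header into `filename,label,imagingtype` and the row into `a.png,fish,black and white`.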
sequenceDiagram
actor U as User
box FathomNet
participant F as FathomNet
participant A as Atlas
end
box NOAA
participant M as MSU
participant O as NOAA OER
end
rect rgb(1, 87, 155)
note right of U: Initial Upload to FathomNet
U->>+F: Upload ZIP of images + CSV
F->>A: Unzip and archive Images
F->>F: Register Images and Localizations
F-->>-U: Notify User
end
rect rgb(51, 105, 30)
note right of F: Temporary archive at MSU
F->>+M: Zip archived images and stage to MSU (how often?)
M->>M: Unzip images and provide web access
M-->>-F: How will FathomNet get the new image URLs?
F->>F: Update image URLs to MSU location
F->>A: Remove images that were staged to MSU
end
rect rgb(0, 96, 100)
note right of F: Final archive at NOAA OER
F->>F: Create CSV archive of database tables
F->>+M: Upload CSV archive to MSU?
F->>M: Is the plan to upload ALL images to NOAA OER?
M->>O: Archive CSV
M->>O: Archive Images
O-->>F: How will FathomNet get the new image URLs?
F->>F: Update image URLs to OER location
end
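The diagram leaves open how FathomNet will learn the new image URLs. If the images keep their file names and relative paths when they move, one option is to derive the new URL by swapping the base URL, so nothing has to be sent back. The sketch below assumes exactly that. `rebase` is a hypothetical helper, not something that exists in FathomNet today, and the file name in the usage note is made up.

```scala
// Swaps a known base URL for a new one and keeps the relative path as-is.
// Returns None when the URL is not under oldBase, so unrelated URLs are
// never rewritten by accident.
def rebase(url: String, oldBase: String, newBase: String): Option[String] =
  val from = if oldBase.endsWith("/") then oldBase else oldBase + "/"
  val to   = if newBase.endsWith("/") then newBase else newBase + "/"
  if url.startsWith(from) then Some(to + url.stripPrefix(from)) else None
```

For example, `rebase("https://fathomnet.org/static/staging/FN2304-a.png", "https://fathomnet.org/static/staging", "https://oer.hpc.msstate.edu/FathomNet/")` gives `Some("https://oer.hpc.msstate.edu/FathomNet/FN2304-a.png")`. If MSU or OER rename or reorganize files on their side, this won't work and we would need them to send back a mapping instead.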
@errol-ronje I have a small zip file of images to use as a test set. How do I get them archived to MSU's servers?
Brian, can you move it to this Google Drive folder for staging, then I will push it to MSU: https://drive.google.com/drive/folders/13n2Pwy87VdWDgJSdIfTkgsqMd3DOGAhx?usp=sharing
Errol Ronje, Oceanographer, Oceanographic and Geophysical Science and Services Division, NOAA National Centers for Environmental Information (NCEI), Stennis Space Center, MS. https://orcid.org/0000-0003-3312-5662
On Wed, Apr 19, 2023 at 1:38 PM Brian Schlining wrote:
@errol-ronje I have a small zip file of images to use as a test set. How do I get them archived to MSU's servers?
@megancromwell @errol-ronje I've dropped a zip file named fathomnet.zip in that Google Drive directory.
This part is completed. I've deployed bitfrost to repackage uploaded zip files and move them to a staging location (https://fathomnet.org/static/staging/). I'm moving on to new tasks to complete this pipeline.
Staging to MSU
Moving to NOAA
References