fathomnet / community-feedback

Preprocessing of an image set after upload #139

Closed hohonuuli closed 3 months ago

hohonuuli commented 1 year ago

When FathomNet receives an image set zip file, the file will need some preprocessing so that its contents meet NCEI's naming requirements. NCEI is requesting that:

  1. The images are in a directory named FNYYMM, where YY is the two-digit year and MM is the two-digit month. So a set uploaded on September 14th, 2023 would go in FN2309.
  2. All images in the directory must start with the same prefix. So a user-submitted image named myAwesomePic.jpg would become FN2309_myAwesomePic.jpg.
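
The two naming rules can be sketched as a pair of pure functions. This is only an illustration of the FNYYMM convention described above; the object and method names are hypothetical and not part of any FathomNet or NCEI code.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper illustrating the FNYYMM rules; names are assumptions.
object NceiNaming:
  // "yyMM" yields a two-digit year followed by a two-digit month
  private val yearMonth = DateTimeFormatter.ofPattern("yyMM")

  /** Directory name (and file prefix) for an upload date. */
  def directoryName(uploadDate: LocalDate): String =
    s"FN${yearMonth.format(uploadDate)}"

  /** File name with the directory name prepended, joined by an underscore. */
  def prefixedName(uploadDate: LocalDate, fileName: String): String =
    s"${directoryName(uploadDate)}_$fileName"
```

For an upload dated 2023-09-14, `directoryName` gives `FN2309` and `prefixedName` turns `myAwesomePic.jpg` into `FN2309_myAwesomePic.jpg`.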

[!IMPORTANT] Much of the heavy lifting is in this issue.

hohonuuli commented 1 year ago

The steps for preprocessing might be:

[!NOTE] These steps require us to know the base URL of where images are hosted on MSU servers

  1. Unpack the zip to a temporary directory
  2. Create the FNYYMM directory
  3. Move/rename the files into that directory
  4. Update the values in the CSV's image column to match the new image names and the expected location on MSU servers.
  5. Move the CSV into that directory
  6. Zip it back up
  7. Submit a copy of the CSV to FathomNet for image registration with its DarwinCore metadata
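
Step 4 depends on the base URL mentioned in the note above. A minimal sketch of how an image-column value might be rewritten is below. The URL layout (`<baseUrl>/<FNYYMM>/<FNYYMM>_<file name>`) and all names here are assumptions for illustration; the real layout on the MSU servers is not specified in this issue.

```scala
// Hypothetical sketch: the URL layout and names are assumptions, not a confirmed MSU convention.
object ImageColumn:
  /** Builds the new image-column value from a base URL, the FNYYMM prefix and the original value. */
  def hostedUrl(baseUrl: String, prefix: String, originalValue: String): String =
    // tolerate a trailing slash on the base URL
    val base = baseUrl.stripSuffix("/")
    // keep only the file name in case the CSV value carries a relative path
    val fileName = originalValue.split('/').last
    s"$base/$prefix/${prefix}_$fileName"
```

With a placeholder base URL of `https://example.org/images`, the value `dive1/myAwesomePic.jpg` would become `https://example.org/images/FN2309/FN2309_myAwesomePic.jpg`.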

hohonuuli commented 1 year ago

[!WARNING] The directory name requested by NCEI is going to lead to naming collisions. This is an issue we need to resolve up front.
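
To make the collision concrete, here is a small sketch that groups upload dates by the directory name they would receive. It only demonstrates the problem and does not propose a fix; the object and method names are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical demo of the FNYYMM collision; names are assumptions.
object CollisionDemo:
  private val yearMonth = DateTimeFormatter.ofPattern("yyMM")

  def directoryName(uploadDate: LocalDate): String =
    s"FN${yearMonth.format(uploadDate)}"

  /** Groups upload dates by directory name and keeps only names claimed more than once. */
  def collisions(uploadDates: Seq[LocalDate]): Map[String, Seq[LocalDate]] =
    uploadDates
      .groupBy(directoryName)
      .filter { case (_, dates) => dates.size > 1 }
```

Any two image sets uploaded in the same calendar month map to the same name, so uploads on 2023-09-01 and 2023-09-14 would both claim `FN2309`.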

hohonuuli commented 1 year ago

Proof of concept for renaming and repackaging a zip file.

PocNceiRepackage.sc

#!/usr/bin/env -S scala-cli shebang
//> using scala "3.3.0"
//> using dep  "com.github.pathikrit::better-files:3.9.2"

/*
Brian Schlining
2023-09-15

Usage:

PocNceiRepackage.sc <zipfile> <destination>

Example:

./PocNceiRepackage.sc /Users/brian/Desktop/fathomnet/demo.zip /Users/brian/Desktop/fathomnet/temp

It will create a zip file named FNYYMM.zip in the destination directory. The zip file will contain a directory
named FNYYMM with all the renamed images and a csv file with updated image names.

*/

import better.files.{File as BFile} // https://github.com/pathikrit/better-files
import java.nio.file.{Files, Path}

// java.nio.file.Path to better.files.File
given pathToBetterFile: Conversion[Path, BFile] = (p: Path) => BFile(p)

val format = java.time.format.DateTimeFormatter.ofPattern("yyMM")
val prefix = s"FN${format.format(java.time.LocalDate.now)}"

// unzip file
def unzip(zip: Path, destination: Path): Path = 
  zip.unzipTo(destination)
  destination

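// find csv file. images will be in the same directory
def findCsv(dir: Path): Path = 
  val csv: Iterator[BFile] = dir.glob("**/*.csv")
  // fail with a clear message instead of a bare NoSuchElementException
  if !csv.hasNext then sys.error(s"No CSV file found under $dir")
  csv.next().path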

// find images using csv directory
def findImages(imageDir: Path): List[Image] = 
  imageDir.glob("*.{jpg,png}").map(b => Image(b.path)).toList

// rename image
final case class Image(path: Path):
  val name: String = path.name
  val newName: String = s"${prefix}_$name"

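def renameImages(images: Seq[Image], destination: Path): Unit = 
  // createIfNotExists() defaults to creating a regular file; ask for a directory explicitly
  destination.createDirectoryIfNotExists(createParents = true)
  images.foreach { i => i.path.moveTo(destination / i.newName) }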

def updateCsv(csv: Path): String = 
  val regex = "[^,]*\\.(jpg|png)".r
  val lines = csv.lines
  val newLines = for 
    line <- lines
  yield 
    regex.findFirstIn(line) match
      case Some(imagename) => line.replace(imagename, s"${prefix}_$imagename")
      case None => line
  newLines.mkString("\n")

val zip = Path.of(args(0))
val destination = Path.of(args(1))

// Read the data we need
val csv = findCsv(unzip(zip, destination))
val images = findImages(csv.parent.path)
println("Found " + images.size + " images in " + csv.parent.path)

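// move the images
val newDestination = destination / "root" / prefix
// createParents = true also creates "root" if the unpacked archive did not contain it
newDestination.createDirectoryIfNotExists(createParents = true)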
renameImages(images, newDestination.path)

// update the csv
val updatedCsvData = updateCsv(csv)
val newCsv = newDestination / csv.name
newCsv.write(updatedCsvData)

// zip the files
val newZip = newDestination.parent.zipTo(destination / s"$prefix.zip")
newDestination.parent.delete()

PocNceiRepackage.sc.zip

demo.zip
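
To show what the regex in `updateCsv` does to individual lines, here is the same logic pulled out as a standalone function. The CSV column names and values in the usage note are made up for illustration and are not taken from demo.zip.

```scala
// Standalone copy of the per-line logic in updateCsv, for illustration only.
object CsvLineDemo:
  // a run of non-comma characters ending in .jpg or .png (lowercase only)
  private val imageName = "[^,]*\\.(jpg|png)".r

  def rewriteLine(line: String, prefix: String): String =
    imageName.findFirstIn(line) match
      // prepend the prefix to every occurrence of the first matched image value
      case Some(name) => line.replace(name, s"${prefix}_$name")
      // lines without an image name (such as a header row) pass through unchanged
      case None => line
```

With prefix `FN2309`, the line `myAwesomePic.jpg,Sebastes,10,20` becomes `FN2309_myAwesomePic.jpg,Sebastes,10,20`, while a header such as `image,concept,x,y` is left alone. Two limitations are worth knowing: a value containing a relative path, such as `dive1/pic.jpg`, gets the prefix in front of the whole value (`FN2309_dive1/pic.jpg`), and uppercase extensions such as `.JPG` are not matched.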

hohonuuli commented 3 months ago

https://github.com/fathomnet/fathomnet-support/pull/6