Closed: hohonuuli closed this issue 3 months ago
The steps for preprocessing might be:

> [!NOTE]
> These steps require us to know the base URL of where images are hosted on MSU servers.

1. Rename the images and place them in the `FNYYMM` directory.
2. Update the `image` column to match the new image names and the expected location on MSU servers.

> [!WARNING]
> The directory name requested by NCEI is going to lead to naming collisions. This is an issue we need to resolve up front.
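The `image` column update could look roughly like the sketch below. The base URL `https://example.org/fathomnet/` is a placeholder, since the real MSU base URL is not known yet. The helper name `expectedUrl` and the `<baseUrl>/<prefix>/<prefix>_<imageName>` layout are assumptions made for illustration, not an agreed-upon convention.

```scala
// Hypothetical helper: builds the URL where a renamed image is expected to live.
// The layout <baseUrl>/<prefix>/<prefix>_<imageName> is an assumption.
def expectedUrl(baseUrl: String, prefix: String, imageName: String): String =
  val base = baseUrl.stripSuffix("/") // tolerate a trailing slash on the base URL
  s"$base/$prefix/${prefix}_$imageName"

// Placeholder base URL; the real MSU location is still to be determined.
val exampleUrl = expectedUrl("https://example.org/fathomnet/", "FN2309", "myAwesomePic.jpg")
```

Keeping the base URL as a parameter means the rest of the preprocessing does not have to change once the actual MSU location is known.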
Proof of concept for renaming and repackaging a zip file.
#!/usr/bin/env -S scala-cli shebang
//> using scala "3.3.0"
//> using dep "com.github.pathikrit::better-files:3.9.2"
/*
Brian Schlining
2023-09-15
Usage:
PocNceiRepackage.sc <zipfile> <destination>
Example:
./PocNceiRepackage.sc /Users/brian/Desktop/fathomnet/demo.zip /Users/brian/Desktop/fathomnet/temp
It will create a zip file named FNYYMM.zip in the destination directory. The zip file will contain a directory
named FNYYMM with all the renamed images and a csv file with updated image names.
*/
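import better.files.{File as BFile} // https://github.com/pathikrit/better-files
import java.nio.file.Path
import scala.language.implicitConversions // avoids feature warnings where the Conversion below is applied
// Implicitly convert java.nio.file.Path to better.files.File
given pathToBetterFile: Conversion[Path, BFile] = (p: Path) => BFile(p)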
val format = java.time.format.DateTimeFormatter.ofPattern("yyMM")
val prefix = s"FN${format.format(java.time.LocalDate.now)}"
// unzip file
def unzip(zip: Path, destination: Path): Path =
  zip.unzipTo(destination)
  destination
// find csv file. images will be in the same directory
def findCsv(dir: Path): Path =
  val csv: Iterator[BFile] = dir.glob("**/*.csv")
  csv.next().path
// find images using csv directory
def findImages(imageDir: Path): List[Image] =
  imageDir.glob("*.{jpg,png}").map(b => Image(b.path)).toList
// rename image
final case class Image(path: Path):
  val name: String = path.name
  val newName: String = s"${prefix}_$name"
def renameImages(images: Seq[Image], destination: Path): Unit =
  // createIfNotExists() defaults to creating a regular file, so create the directory explicitly
  destination.createDirectories()
  images.foreach { i => i.path.moveTo(destination / i.newName) }
def updateCsv(csv: Path): String =
  val regex = "[^,]*\\.(jpg|png)".r
  val lines = csv.lines
  val newLines = for
    line <- lines
  yield
    regex.findFirstIn(line) match
      case Some(imagename) => line.replace(imagename, s"${prefix}_$imagename")
      case None => line
  newLines.mkString("\n")
val zip = Path.of(args(0))
val destination = Path.of(args(1))
// Read the data we need
val csv = findCsv(unzip(zip, destination))
val images = findImages(csv.parent.path)
println("Found " + images.size + " images in " + csv.parent.path)
// move the images
val newDestination = destination / "root" / prefix
newDestination.createDirectories() // also creates the intermediate "root" directory
renameImages(images, newDestination.path)
// update the csv
val updatedCsvData = updateCsv(csv)
val newCsv = newDestination / csv.name
newCsv.write(updatedCsvData)
// zip the files
val newZip = newDestination.parent.zipTo(destination / s"$prefix.zip")
newDestination.parent.delete()
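To make the csv rewriting step easier to check in isolation, here is a sketch of the same regex substitution as a pure function on a single line. The names `imagePattern` and `prefixImageName` are introduced here for illustration and are not part of the script. The pattern is copied from `updateCsv` unchanged.

```scala
// Same pattern as updateCsv: a run of non-comma characters ending in .jpg or .png
val imagePattern = "[^,]*\\.(jpg|png)".r

// Hypothetical pure variant of the per-line rewrite, so it can be tested without touching disk
def prefixImageName(line: String, prefix: String): String =
  imagePattern.findFirstIn(line) match
    case Some(imageName) => line.replace(imageName, s"${prefix}_$imageName")
    case None            => line // e.g. the header row

// "1,myAwesomePic.jpg,Sebastes" becomes "1,FN2309_myAwesomePic.jpg,Sebastes"
val rewritten = prefixImageName("1,myAwesomePic.jpg,Sebastes", "FN2309")
```

Two limitations are worth knowing. The pattern only matches lowercase `.jpg` and `.png` extensions. If the cell holds a full URL rather than a bare file name, the whole URL is matched and the prefix lands in front of it, so that case would need separate handling.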
When FathomNet receives an image set zip file, it will need some preprocessing so that the contents meet the naming requirements of NCEI. NCEI is requesting that:

- Images are placed in a directory named `FNYYMM`, where YY is the two-digit year and MM is the two-digit month. So a set uploaded on September 14th, 2023 would be `FN2309`.
- Image names are prefixed with the directory name, so `myAwesomePic.jpg` would become `FN2309_myAwesomePic.jpg`.
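The naming rule can be sketched in a few lines of Scala. `nceiPrefix` and `nceiName` are hypothetical names used only for this illustration. The `yyMM` pattern matches the one used in the proof-of-concept script.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// FNYYMM: "FN" + two-digit year + two-digit month
val yearMonth = DateTimeFormatter.ofPattern("yyMM")

// Hypothetical helper: directory name (and file prefix) for a given upload date
def nceiPrefix(uploaded: LocalDate): String = s"FN${yearMonth.format(uploaded)}"

// Hypothetical helper: new image name for a given upload date
def nceiName(uploaded: LocalDate, imageName: String): String =
  s"${nceiPrefix(uploaded)}_$imageName"

// Two uploads in the same month share a prefix, so identical file names would collide
val first  = nceiName(LocalDate.of(2023, 9, 14), "myAwesomePic.jpg")
val second = nceiName(LocalDate.of(2023, 9, 28), "myAwesomePic.jpg")
```

Because the day is not part of the prefix, `first` and `second` are the same string. Any two image sets uploaded in the same month that contain a file with the same name would therefore clash.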