dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Create job that runs qupath creation script on a dataset root #293

Closed dchaley closed 4 weeks ago

dchaley commented 1 month ago

QuPath projects get initialized from a set of input TIFFs. This job covers phase 3 in the list of steps below.

Input

A dataset directory, /path/to/my/dataset/

which contains a collection of TIFF files, $dataset/OMETIFFS/

Phase 1: convert TIFF to NPZ (channel extraction)

Take all the TIFF files, and create corresponding numpy files (.npz), with nuclear and membrane channels filled in.

Place in: $dataset/NPZ_INTERMEDIATE

This process could be different for each TIFF: different indexes into the tiff for each marker, as well as which channels to combine (aka sum).

Note that we may need to scale the intensities. Does this mean renormalizing to 0..1 ? ← probably!!
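A minimal sketch of the channel extraction described in phase 1, assuming the TIFF has already been read into a (channels, height, width) array. The function name `extract_channels`, the index arguments, and the min-max scaling to 0..1 are assumptions for illustration, not the project's actual code.

```python
import numpy as np


def extract_channels(image, nuclear_idx, membrane_idx):
    """Sum the listed channels into one nuclear and one membrane plane.

    image: array shaped (channels, height, width).
    nuclear_idx / membrane_idx: channel indexes to combine (sum).
    Returns an array shaped (height, width, 2), each plane scaled to 0..1.
    """

    def combine(indexes):
        # Sum the selected channels, then min-max scale (assumed normalization).
        plane = image[list(indexes)].astype(np.float64).sum(axis=0)
        lo, hi = plane.min(), plane.max()
        if hi == lo:
            # A constant plane carries no signal; avoid dividing by zero.
            return np.zeros_like(plane)
        return (plane - lo) / (hi - lo)

    return np.stack([combine(nuclear_idx), combine(membrane_idx)], axis=-1)


# Tiny synthetic image: 3 channels of 2x2 pixels.
image = np.arange(3 * 2 * 2).reshape(3, 2, 2)
stacked = extract_channels(image, nuclear_idx=[0], membrane_idx=[1, 2])
```

Because the indexes are plain arguments, each TIFF can use its own marker layout. The result could then be written with `np.savez_compressed`, under whatever array key the downstream step expects.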

Phase 2: segmentation mask prediction

Take all the npz files from $dataset/NPZ_INTERMEDIATE

and generate corresponding DeepCell predictions: $dataset/SEGMASK/
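The directory conventions from phases 1 and 2 can be sketched as a path mapping. The helper names and the output file names (`<name>.npz`, `<name>.tiff`) are hypothetical; only the directory names come from the description above.

```python
from pathlib import PurePosixPath

# Longer suffixes first so ".ome.tiff" wins over ".tiff".
TIFF_SUFFIXES = (".ome.tiff", ".ome.tif", ".tiff", ".tif")


def image_name(tiff_filename):
    """Strip the TIFF suffix, e.g. 'sample.ome.tiff' -> 'sample'."""
    lower = tiff_filename.lower()
    for suffix in TIFF_SUFFIXES:
        if lower.endswith(suffix):
            return tiff_filename[: -len(suffix)]
    raise ValueError(f"not a TIFF file: {tiff_filename}")


def plan_paths(dataset_root, tiff_filename):
    """Map one input TIFF to its intermediate and output paths."""
    root = PurePosixPath(dataset_root)
    name = image_name(tiff_filename)
    return {
        "tiff": str(root / "OMETIFFS" / tiff_filename),
        # Output file names below are assumptions for illustration.
        "npz": str(root / "NPZ_INTERMEDIATE" / f"{name}.npz"),
        "segmask": str(root / "SEGMASK" / f"{name}.tiff"),
    }


plan = plan_paths("/path/to/my/dataset", "sample.ome.tiff")
```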

Phase 3: QuPath Project initialization

See groovy script: createNewProject.groovy

See also more scripts: MyCodexPipeline

This runs the script which generates metadata & project config.

The project creation script above is run as-is except for the input/output parameters at the top.

This job depends on #292; use Batch job dependencies: [docs]

dchaley commented 1 month ago

Update: we were able to partially create a QuPath project using this script. It generated a thumbnail in the data folder and various other files. 🎉 cc @bnovotny

Buuuuuuut……… 😩 it doesn't actually load in QuPath due to this exception:

com.google.gson.JsonParseException: Cannot deserialize interface qupath.lib.images.servers.ImageServerBuilder$ServerBuilder because there is no field named builderType
java.io.IOException: com.google.gson.JsonParseException: Cannot deserialize interface qupath.lib.images.servers.ImageServerBuilder$ServerBuilder because there is no field named builderType

We found this pointer suggesting that we create a new QP() object. That created more dependency issues, and we ran out of time.

We needed to grab these JAR files so far:

gson-2.11.0.jar
ij-1.54f.jar
javacpp-1.5.10.jar
javafx-base-22.0.1.jar
javafx-controls-22.0.1.jar
javafx-graphics-22.0.1.jar
javafx-swt.jar
javafx.base.jar
javafx.controls.jar
javafx.fxml.jar
javafx.graphics.jar
javafx.media.jar
javafx.swing.jar
javafx.web.jar
jts-core-1.19.0.jar
openblas-0.3.26-1.5.10.jar
opencv-4.9.0-1.5.10.jar
qupath-core-0.5.0.jar
qupath-core-processing-0.5.0.jar
qupath-fxtras-0.1.3.jar
qupath-gui-fx-0.5.0.jar

Next time we'll change tack slightly and try to set up a proper JVM project, using maven/gradle/something to manage dependencies for us. Hopefully if we specify the base qupath libraries, it'll "just work" and fetch dependencies.

dchaley commented 1 month ago

Weihao and I created a proper gradle project to help wrangle dependencies: https://github.com/dchaley/qupath-project-initializer

We were able to make progress, however it looks like some libraries aren't available / don't work properly with Apple silicon aarch64.

Using the above gradle project ^^ on a cloud shell instance, I was able to "kind of" create a qupath project:

05:18:18.674 [main] [INFO ] qupath.lib.scripting.QP - Initializing type adapters
/home/dchaley/tmp/qupath-project/OMETIFFs/Xenium_FFPE_Human_Breast_Cancer_Rep1_if_image.ome.tiff
05:18:19.862 [main] [WARN ] q.l.i.s.b.BioFormatsServerOptions - Bio-Formats memoization is enabled, but may not be supported (unknown Java or Bio-Formats version)
05:18:20.090 [main] [INFO ] q.l.i.s.b.BioFormatsServerOptions - Setting max Bio-Formats readers to 4
Adding: /home/dchaley/tmp/qupath-project/OMETIFFs/Xenium_FFPE_Human_Breast_Cancer_Rep1_if_image.ome.tiff
05:18:21.146 [main] [INFO ] qupath.lib.io.PathIO - Writing object hierarchy with 0 object(s)...
05:18:21.150 [main] [INFO ] qupath.lib.io.PathIO - Image data written in 0.03 seconds
05:18:21.296 [main] [INFO ] qupath.lib.gui.prefs.PathPrefs - Setting default Locale to en_US
05:18:21.297 [main] [INFO ] qupath.lib.gui.prefs.PathPrefs - Setting Locale for FORMAT to en_US
05:18:21.298 [main] [INFO ] qupath.lib.gui.prefs.PathPrefs - Setting Locale for DISPLAY to en_US
05:18:21.324 [main] [INFO ] qupath.lib.common.ThreadTools - Setting parallelism to 3
Discovering Mask Files...
 >>> Xenium_FFPE_Human_Breast_Cancer_Rep1_if_image
FOUND NUC MASK: null
 >>> MISSING MASK FILES!! <<<

Done.

The script assumes both whole-cell and nuclear masks are present, so it reports missing mask files when there is no nuclear mask.
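To illustrate the failure mode, here is a rough sketch of mask discovery that treats the nuclear mask as optional. The file name suffixes (`_WholeCellMask.tiff`, `_NucleusMask.tiff`) and the function `find_masks` are hypothetical and do not reflect the naming used by the groovy script.

```python
def find_masks(image_name, available_files, require_nuclear=False):
    """Look up mask files for one image.

    Returns a dict with 'whole_cell' and 'nuclear' entries; 'nuclear' is
    None when that mask is absent and require_nuclear is False.
    """
    files = set(available_files)
    # Hypothetical naming convention, for illustration only.
    whole_cell = f"{image_name}_WholeCellMask.tiff"
    nuclear = f"{image_name}_NucleusMask.tiff"

    found = {
        "whole_cell": whole_cell if whole_cell in files else None,
        "nuclear": nuclear if nuclear in files else None,
    }
    if found["whole_cell"] is None:
        raise FileNotFoundError(f"missing whole-cell mask for {image_name}")
    if require_nuclear and found["nuclear"] is None:
        raise FileNotFoundError(f"missing nuclear mask for {image_name}")
    return found


masks = find_masks("img", ["img_WholeCellMask.tiff"])
```

With `require_nuclear=True` the same call raises, which mirrors the behavior seen in the log above.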

bnovotny commented 1 month ago

Oops, I totally forgot about the missing nuclear mask issue! I had to modify the script to use only the whole cell masks when I was testing. Just uploaded the script here: https://github.com/VillasboasLab/QuPath_Utility_Scripts/blob/main/IMC_Liftover/create_qupath_project_onlywholecell.groovy

This one is modified from the MyCodexPipeline repo, so it might look a little different from the one you are currently working with.

Sorry for the confusion @dchaley, I hope this one is more helpful! Let me know if there is anything you need me to try.

dchaley commented 1 month ago

We need a container set up to run the third phase: dchaley/qupath-project-initializer#9

Phase 2 (generate segmasks from NPZ files through DeepCell) is ready to test, see #292

dchaley commented 1 month ago

We have a container now: https://github.com/dchaley/qupath-project-initializer/issues/9

And it takes cmdline arguments: https://github.com/dchaley/qupath-project-initializer/issues/4

Now we need to run this from input files in the cloud. The "obvious" first problem is: the QuPath libraries only work with local files.

Some options:

(1) add a QuPath plugin that encapsulates gs:// URIs by downloading to temporary storage???

(2) add another job that preps a persistent disk? mounted to compute & exposed to the container?

(3) update our qupath initializer code to understand cloud storage … and move it to local … then move it to cloud storage when done??

Too bad gs-fastcopy is python not kotlin, ha, ha, ha. (Make JVM version???)

dchaley commented 1 month ago

Solving the remote storage problem here: https://github.com/dchaley/qupath-project-initializer/milestone/1

This issue is blocked on that milestone.

dchaley commented 4 weeks ago

This is now complete: we are able to download remote files to temporary working directories, do our QuPath work, then upload the resulting project files to remote storage.
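The download, work, upload flow described above can be sketched as follows. This is an illustration only: the real initializer is a JVM project, the `download`, `work`, and `upload` callables are hypothetical, and the "remote" storage here is simulated with a local directory.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_local_copy(download, work, upload):
    """Stage remote inputs locally, run the work, then push the results."""
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = Path(tmp) / "input"
        output_dir = Path(tmp) / "output"
        input_dir.mkdir()
        output_dir.mkdir()
        download(input_dir)          # remote -> temporary working directory
        work(input_dir, output_dir)  # local-only processing (e.g. QuPath)
        upload(output_dir)           # results -> remote storage


# Simulated remote storage: a local directory standing in for a bucket.
remote = Path(tempfile.mkdtemp())
(remote / "in").mkdir()
(remote / "in" / "image.ome.tiff").write_bytes(b"fake")


def download(dest):
    shutil.copytree(remote / "in", dest, dirs_exist_ok=True)


def work(input_dir, output_dir):
    # Stand-in for project creation: just list the staged input files.
    names = sorted(p.name for p in input_dir.iterdir())
    (output_dir / "project-files.txt").write_text("\n".join(names))


def upload(src):
    shutil.copytree(src, remote / "out", dirs_exist_ok=True)


run_with_local_copy(download, work, upload)
```

Passing the transfer steps in as callables keeps the local-only work unaware of where the files came from.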

Now we need to rebuild the container and complete #294: enqueuing a batch job.