dchaley / qupath-project-initializer

Tools to initialize a QuPath project from input files

What to do with QuPath image paths? #5

Open dchaley opened 3 months ago

dchaley commented 3 months ago

The QuPath data files refer to file paths – local to wherever the QuPath script was run.

This means you can't just download the project and open it without first fixing the paths (at least, I don't think you can).

1. Can we use relative paths?
2. Should we ask the user to specify a path rewrite, for when they download the project?
3. QuPath says it has an "image server". Can that be … cloud storage ...??? (It can do NAS, but that's a big moving part.)

One big question is: where does the end QuPath app even run? On somebody's computer? On a cloud VM?

dchaley commented 3 months ago

Hi @bnovotny ! Any thoughts on these questions?

dchaley commented 3 months ago

Note: if necessary, it looks like we could override AbstractImageServer. We could surely wrap the server QuPath actually uses (the ImageJ server??) so that it first downloads the image to local storage and then proceeds as usual. But we'd need to get this functionality into the QuPath distribution 😵

lynnlangit commented 3 months ago

QuPath can NOT use cloud storage currently. We would have to copy from cloud storage to a working location. NAS is an anti-pattern (too expensive). Could we use a PD (persistent disk) attached to the QuPath instance as a temporary working storage area (like how Google Batch works)?

dchaley commented 3 months ago

That's how we're running the QuPath libraries (against locally downloaded files). So I don't see why we couldn't use a PD.

Is QuPath itself running on a cloud VM w/ GUI enabled?

dchaley commented 3 months ago

Consolidating our learnings.

1. We need absolute paths/URIs

QuPath does not seem to support relative paths, e.g. ../OMETIFF/MyImage.ome.tiff

The project needs absolute paths.
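A minimal sketch of turning a relative image path into an absolute `file://` URI before it is written into the project. The function name `to_absolute_uri` and the example directories are hypothetical and not part of this repo or of QuPath; only the standard-library `pathlib` calls are real.

```python
from pathlib import Path


def to_absolute_uri(image_path: str, base_dir: str) -> str:
    """Resolve image_path against base_dir and return an absolute file:// URI.

    Hypothetical helper for illustration; not a QuPath API.
    """
    # resolve() collapses ".." segments and makes the path absolute.
    resolved = (Path(base_dir) / image_path).resolve()
    # as_uri() only works on absolute paths, which resolve() guarantees.
    return resolved.as_uri()


if __name__ == "__main__":
    # Assumed layout: the project lives in <root>/QUPATH, images in <root>/OMETIFF.
    print(to_absolute_uri("../OMETIFF/MyImage.ome.tiff", "/data/root/QUPATH"))
```

Note that the resulting URI is only valid on the machine where it was computed, which is exactly the portability problem this issue is about.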

2. Non-filesystem paths?

QuPath uses an ImageServer to provide image data. When loading a URI, QuPath iterates over the list of available providers and selects the ones that (line 190) support the output format (in our case a BufferedImage) and also (line 199) have a positive support level. The "preferred" one is (I think) the one with the greatest support level.
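The selection logic described above can be sketched roughly as follows. This is a Python illustration of the idea only, not QuPath's Java code: the `Provider` class, `pick_provider`, and the example support levels are all made up for the sketch, and "highest support level wins" is the same guess as in the paragraph above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Provider:
    """Stand-in for an image server provider (hypothetical, for illustration)."""
    name: str
    output_type: str                        # e.g. "BufferedImage"
    support_level: Callable[[str], float]   # URI -> level; <= 0 means unsupported


def pick_provider(uri: str, providers: Sequence[Provider],
                  output_type: str = "BufferedImage") -> Optional[Provider]:
    """Return the provider with the greatest positive support level, or None."""
    # Step 1: keep providers that produce the requested output type.
    typed = [p for p in providers if p.output_type == output_type]
    # Step 2: keep those reporting a positive support level for this URI.
    supported = [(p.support_level(uri), p) for p in typed]
    supported = [(level, p) for level, p in supported if level > 0]
    if not supported:
        return None
    # Step 3 (assumption): prefer the greatest support level.
    return max(supported, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    providers = [
        Provider("generic", "BufferedImage", lambda uri: 1.0),
        Provider("tiff", "BufferedImage",
                 lambda uri: 4.0 if uri.endswith(".tiff") else 0.0),
    ]
    chosen = pick_provider("file:///data/MyImage.ome.tiff", providers)
    print(chosen.name if chosen else "no provider")
```

A custom provider for cloud URIs would slot into this scheme by reporting a positive level only for the URI schemes it understands.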

The now-archived qupath-chcapi-extension repo demonstrates a custom image server that reads from a DICOM store provided by the Cloud Healthcare API.

So, we could in principle establish a live connection to Google Storage to fetch pyramidal image data. At cloud scale, downloading even a few gigabytes is quite fast so it would probably be fine to just fetch it upfront then defer to regular file-system loaders.
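The "fetch it upfront, then defer to the regular loaders" idea might look something like this. It is a sketch under assumptions: `localize` is a hypothetical helper, and the actual download is passed in as a `download` callable so that no particular cloud client API is implied.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def localize(uri: str, cache_dir: Path,
             download: Callable[[str, Path], None]) -> Path:
    """Return a local path for uri, downloading into cache_dir if needed.

    `download(uri, destination)` is a caller-supplied function (an assumption
    of this sketch), e.g. a thin wrapper around a cloud storage client.
    """
    parsed = urlparse(uri)
    # Plain paths and file:// URIs are already local (percent-encoding
    # is not handled in this sketch).
    if parsed.scheme in ("", "file"):
        return Path(parsed.path)
    # Mirror bucket/object structure under the cache directory.
    local = cache_dir / parsed.netloc / parsed.path.lstrip("/")
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        download(uri, local)
    return local


if __name__ == "__main__":
    import tempfile

    def fake_download(uri: str, dest: Path) -> None:
        # Stand-in for a real transfer: just writes placeholder bytes.
        dest.write_bytes(b"placeholder")

    with tempfile.TemporaryDirectory() as tmp:
        print(localize("gs://some-bucket/OMETIFF/img.ome.tiff", Path(tmp), fake_download))
```

Once the file is local, the existing file-system image servers can open it as usual, and a second request for the same URI hits the cache instead of downloading again.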

This approach requires shipping a QuPath extension which, while not the end of the world, isn't something I think we need to do because of:

3. Project bundles

Thanks to Brenna's pointers, we saw how QuPath deals with paths moving around. It's apparently a "well known issue" in the community as projects move between systems. Example issue: https://github.com/qupath/qupath/issues/266 (it links to a blog post by QuPath's creator).

Here's a video demonstrating the user experience: https://youtu.be/B_qpNkPzHzk

This suggests we could simply bundle up the project structure we're looking at:

📁 root
↳ 📁 OMETIFF
  ↳ 📄 SomeSampleImage.ome.tiff
↳ 📁 SEGMASKS
  ↳ 📄 SomeSampleImage_WholeCellMask.ome.tiff
↳ 📁 QUPATH
  ↳ 📄 project.qpproj

or really … any structure.
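Bundling a tree like the one above could be as simple as zipping it. A rough sketch, assuming a zip archive is an acceptable bundle format; `make_bundle` is a hypothetical name, and storing files uncompressed is a guess that the image files would not shrink much further.

```python
import zipfile
from pathlib import Path
from typing import List


def make_bundle(root: Path, bundle_path: Path) -> List[str]:
    """Zip every file under root into bundle_path; return the archive names.

    Entries are stored relative to root's parent, so the archive unpacks
    into a single top-level folder named after root.
    """
    names = []
    # ZIP_STORED = no compression (assumption: images are already compressed).
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_STORED) as zf:
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            # Don't include the bundle itself if it lives inside root.
            if path.resolve() == bundle_path.resolve():
                continue
            arcname = path.relative_to(root.parent).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "root"
        (root / "QUPATH").mkdir(parents=True)
        (root / "QUPATH" / "project.qpproj").write_text("{}")
        print(make_bundle(root, Path(tmp) / "bundle.zip"))
```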

Then when users open the qpproj project definition, they'll be prompted to find the missing images and, following the Search process above, would just select the project root and press "OK". Well, hopefully.
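For intuition, here is roughly what such a search amounts to, assuming images are matched purely by file name under the chosen root. That matching rule is an assumption of this sketch, not a verified description of QuPath's behavior, and `relink_by_name` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional


def relink_by_name(missing_paths: Iterable[str],
                   search_root: Path) -> Dict[str, Optional[str]]:
    """Map each stale path to a file:// URI of a same-named file under search_root.

    Paths with no match map to None. If several files share a name,
    the first one in sorted order is used.
    """
    found: Dict[str, str] = {}
    for path in sorted(search_root.rglob("*")):
        if path.is_file():
            found.setdefault(path.name, path.resolve().as_uri())

    result: Dict[str, Optional[str]] = {}
    for old in missing_paths:
        # Stale paths may come from Windows or POSIX systems; split on both separators.
        name = re.split(r"[\\/]", old)[-1]
        result[old] = found.get(name)
    return result


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "OMETIFF").mkdir()
        (root / "OMETIFF" / "SomeSampleImage.ome.tiff").write_bytes(b"x")
        stale = ["/somewhere/else/OMETIFF/SomeSampleImage.ome.tiff"]
        print(relink_by_name(stale, root))
```

Name-based matching is also why "any structure" works for the bundle, as long as image file names are unique within it.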

Based on what I know–feedback please–I think option №3 (bundles) is best for our zero-to-one priorities.

The process output would be a QuPath bundle that users download and use either on-premises or on a VM. It would not be an authoritative live copy; it would be more like a "starter project".

One consequence is that copies of the potentially large image files would proliferate. Not sure if this is a problem. Without doing №2 (live connection to storage) we're kinda stuck with downloading the files anyhow. (Could maaayyybbbeee have a persistent disk with the image files on it……?)

I propose to keep the issue open until a decision is made, then close it out and create follow-up issues based on the approach chosen.