dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Support tiff visualization output #253

Open dchaley opened 1 week ago

dchaley commented 1 week ago

If the output visualization is a tiff instead of a png, use tiff writing.

tifffile.imwrite(outfile, output)

Note that we need to support this in benchmark.py and the individual steps version.

It's probably worth refactoring these, because they're really quite similar.
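That dispatch could look roughly like the sketch below. The function name and the `png_writer` callback are illustrative, not the actual benchmark.py code:

```python
import numpy as np
import tifffile

def write_visualization(outfile, output, png_writer=None):
    """Write `output` as a TIFF if the path ends in .tif/.tiff,
    otherwise defer to the existing PNG path."""
    if outfile.lower().endswith((".tif", ".tiff")):
        tifffile.imwrite(outfile, output)
    elif png_writer is not None:
        # e.g. matplotlib.pyplot.imsave, or whatever the PNG path uses today
        png_writer(outfile, output)
    else:
        raise ValueError(f"no writer for {outfile}")
```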

dchaley commented 5 days ago

@bnovotny does it matter how the TIFF is structured? For example, if it's RGB 0..255, is that ok, or does it need to be 0..1 in two channels, or …

We currently output RGB, and it's easy to just output the same image as a tiff as well.

bnovotny commented 4 days ago

Hi David, the output TIFF is just for the segmentation mask, so the format is basically a matrix representing the pixels, where pixels within the same segment all have the same number. It's a different output than the predictions.png, which is the predictions overlaid on the original image. If I am understanding the current outputs correctly, the tiff should be the same matrix as predictions.npz, just written out in tiff form. Let me know if that makes sense!
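A minimal sketch of that mapping, i.e. writing the label matrix from the npz out as a TIFF. The npz key name `"image"` is an assumption for illustration, not the actual key:

```python
import numpy as np
import tifffile

def npz_mask_to_tiff(npz_path, tiff_path, key="image"):
    """Write the segmentation mask stored in an npz out as a TIFF.
    The mask is an integer matrix: pixels in the same segment share a label."""
    mask = np.load(npz_path)[key]  # hypothetical key name
    tifffile.imwrite(tiff_path, mask)
```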

dchaley commented 4 days ago

Thanks @bnovotny ! OK, I have a PR #263 that adds this. However, it doesn't do it exactly the way DeepCell currently does…

AFAICT, the DeepCell run_app script outputs a tiff with 4 axes: batch, row, col, channels (1 or 2).

These scripts drop the batch axis, as they operate on one image at a time (and rely on Cloud Batch for, uh, batches; pun not intended). So it's just row, col, channels.

I noticed DeepCell has a further parameter, squeeze, which drops length-1 axes from the array.

The three formats look like so:

In [1]: import tifffile

In [2]: import numpy as np

In [3]: tiff_batched = tifffile.imread('/Users/davidhaley/tmp/predictions-batch.tiff')

In [4]: tiff_notsqueezed = tifffile.imread('/Users/davidhaley/tmp/predictions-notsqueezed.tiff')

In [5]: tiff_squeezed = tifffile.imread('/Users/davidhaley/tmp/predictions-squeezed.tiff')

In [6]: tiff_batched.shape
Out[6]: (1, 512, 512, 1)

In [7]: tiff_notsqueezed.shape
Out[7]: (512, 512, 1)

In [8]: tiff_squeezed.shape
Out[8]: (512, 512)

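The three shapes are related by a dropped batch axis and NumPy's squeeze; a quick illustration with a dummy array:

```python
import numpy as np

# dummy 512x512 single-channel mask in the "batched" layout
batched = np.zeros((1, 512, 512, 1), dtype=np.int32)  # batch, row, col, channel
notsqueezed = batched[0]                              # drop the batch axis
squeezed = np.squeeze(notsqueezed)                    # drop all length-1 axes

print(batched.shape, notsqueezed.shape, squeezed.shape)
# (1, 512, 512, 1) (512, 512, 1) (512, 512)
```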

With my implementer's hat on (not my user's), my preference is to only ever output the not-squeezed version: row, col, channel, where the channel axis has length 1 or 2. (Selfishly, I'd rather not support multiple formats, as that introduces more options across several parts; see also the 2nd commit of PR #263.)

But I have no idea how these files integrate up/downstream 😁

As a user of the system what is your preference? πŸ™

bnovotny commented 3 days ago

Thanks so much for looking into this! I did some digging, as I am not the foremost expert... it looks like the squeeze was added to DeepCell for compatibility with a package that we do not currently use (deepcell-applications #16).

I have checked some masks from my existing projects, and I see some tiffs with and some without the channel dimension, so I'm fairly confident that not squeezing will be just fine.

If it helps for context of the downstream integration, we read the TIFFs into QuPath via ImageJ as seen here.

Much appreciated!

dchaley commented 3 days ago

Awesome, thanks @bnovotny ! I'll take a look through that code. I actually tried loading my file in QuPath but it rejected it. Maybe I need to install ImageJ?

Also, if I just send the tiffs can you tell if they work in the pipeline? I'll attach them shortly when I'm on my computer.

bnovotny commented 3 days ago

Hmm, I do think you'll need the ImageJ plugin to load the segmentations. It may also need the original fluorescence images to be loaded into QuPath already? Not 100% sure of the fix without seeing it.

Either way, if you send me the tiffs, I'm happy to take a look! Also if you have the original images from your test dataset, I can try making a QuPath project with them. Or if you update the docker container I can try it out on our data if that's easier. Thanks a bunch!

dchaley commented 3 days ago

Ah, yeah, I am probably not using QuPath right. This is me: (meme image)

Let's try it in your env: first of all, here's the data I'm working with, and how the pipeline processes it into results. https://drive.google.com/drive/folders/1PwjRZJUCVdqR9dAoiK8pP9D9GpjyKRGs

I have the 3 tiff file structures in there. The one that I think fits the pipeline structure is -batch. The one I selfishly prefer is -notsqueezed.

I've also merged the PR, so you can try it directly which is probably the fastest test!

It will output the tiff WITHOUT the initial batch dimension. So the shape will be rows, cols, channels (with just 1 channel).

To enable it, add --output_tiff to the run-job.py json (alongside --visualize_input, --visualize_predictions, and the rest). It'll write predictions.tiff to your output_path.

The container is currently building & pushing, it should be done in <15min. You can follow along here: https://github.com/dchaley/deepcell-imaging/actions/runs/9783497133/job/27012240021 (I think it's a public link)

Thanks @bnovotny 😁

bnovotny commented 3 days ago

Ahaha, that is one of my favorite memes, I strongly identify with that dog πŸ˜… No worries! Thanks for sending the files; I will see if I can get them to load into QuPath.

In the meantime, I pushed the new container to GCP and tried running with the --output_tiff option, but I am getting an error "io.UnsupportedOperation: seek":

(screenshot of the error traceback)

I've never seen this before... let me know if it seems like something I need to fix on my end. For the record, the same container runs fine if I don't add --output_tiff. Thanks so much @dchaley !

dchaley commented 3 days ago

Oh, that's fun. It's because tifffile uses file-seeking, whereas the smart_open library seems to only support writing linearly forward to GCP storage. Fixed in #264; the container is building. Sorry for not testing on cloud before, @bnovotny πŸ˜…
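One way to sidestep the seek requirement (a sketch of the general workaround, not necessarily what #264 does) is to render the TIFF into an in-memory buffer, where tifffile can seek freely, and then stream the bytes out in a single forward-only write:

```python
import io
import tifffile

def write_tiff_seekless(fileobj, array):
    """Write `array` as a TIFF to a file object that cannot seek
    (e.g. a smart_open stream to GCS)."""
    buf = io.BytesIO()
    tifffile.imwrite(buf, array)   # tifffile seeks freely within the buffer
    fileobj.write(buf.getvalue())  # one linear write to the real target
```

The tradeoff is holding the whole TIFF in memory, which is fine for single 512x512 masks.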

dchaley commented 3 days ago

Output squeezing removed in #265. Thanks for confirming we don't need it πŸ‘πŸ» As the PR says, we still need to answer whether or not we need the batch dimension (which we should find out with the now-hopefully-working --output_tiff version)