PathologyDataScience / HistomicsML2

A tool for training machine-learning models with whole-slide imaging datasets
Superpixel boundaries not showing in created+imported dataset #68

Closed Reasat closed 3 years ago

Reasat commented 3 years ago

I have some whole-slide images scanned with a Leica scanner, i.e., .scn files. I am trying to create a database from these images and annotate them.

I converted the .scn files to .tif using the create_tiff.sh script (https://histomicsml2.readthedocs.io/en/latest/data-create.html). The only change I made was replacing the .svs file extension with .scn to make it work. I then followed the usual dataset creation and import steps. My directory contains all the boundary and centroid information files.

But in the UI only the image loads. When I click "show segmentation", the superpixel boundaries do not show up. In the superpixel selection step, double-clicking an area does not produce a patch. No error is thrown either.

[image]

Any suggestions on what might be the issue?

slee172 commented 3 years ago

@Reasat It seems like the boundary table isn't properly set up in the database. Can you check the docker database and see whether the boundary table is populated correctly? The table to look at is "sregionboundaries".

Reasat commented 3 years ago

Thanks for the hint. I am not sure how to check the table in the docker database. I thought the boundaries were saved as a .txt file in the boundaries folder. Where is the table, and how do I check the "sregionboundaries" part?

Reasat commented 3 years ago

Actually, the problem seems to be a spatial offset between the superpixel boundaries and the actual image during plotting. The boundaries are plotted above the image.

[image]

Since the shape of the superpixels and the overall boundary follow the actual structure of the tissue, I think the superpixels have been calculated accurately. But something odd is happening in the plotting phase.

For some reason, there are large white areas around the .tif image, which I think is throwing off the superpixel coordinates.

[image]

I am not sure whether libvips is introducing the white areas or whether it has something to do with the built-in display functions. If I open the same image in QuPath, I don't see these white areas.

slee172 commented 3 years ago

@Reasat Very interesting. HistomicsML2 uses OpenSlide to open image files such as .svs. I think it supports the .scn format as well, but there might be an issue either when converting the .scn image to .tiff or when generating the boundary files. I'm not sure where the issue comes from. Can you share a sample .scn file? I think I need to look into the .scn format to figure it out.

Reasat commented 3 years ago

Sure, I am sending a link to a sample file to your Gmail.

slee172 commented 3 years ago

@Reasat I found the cause of the problem but haven't been able to fix it yet. Your image (.scn format) was analyzed by a Python package (large_image) in the docker dataset image. It reads exactly the slide area, without the white space, extracts the superpixel boundaries from it, and stores the boundary information. So the boundary extraction is correct. The problem is that "vips" reads the entire area when it converts the .scn file to .tiff, so the stored boundary points no longer line up with the slide image.

There are two possible solutions. The first is to generate the boundary information on your local machine rather than using the docker image. That way you can switch to a different version of "large_image" (the Python package has since been updated) and extract boundary information that fits the .tiff image. The second is to find a way of generating the .tiff image without the white space.

We are currently updating the data generation part to speed it up, but it might take some time. Just let me know if you have further questions.
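The mismatch described above can be pictured as a constant shift between two coordinate frames. Below is a minimal sketch, assuming the boundary points are available as (x, y) pixel pairs relative to the tissue area and that the position of that area inside the full converted image is known. The function name `shift_boundary` and the offset values are hypothetical, and the actual HistomicsML2 boundary file layout is not shown here. For formats where OpenSlide reports them, the `openslide.bounds-x` and `openslide.bounds-y` slide properties might supply such an offset, but whether they match the white border in this case is an assumption.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def shift_boundary(points: List[Point], x_offset: int, y_offset: int) -> List[Point]:
    """Translate boundary points from the cropped tissue frame
    into the frame of the full image (white border included)."""
    return [(x + x_offset, y + y_offset) for x, y in points]


# Hypothetical example: the tissue area starts 100 px right and 200 px down
# from the top-left corner of the converted TIFF.
tissue_frame = [(0, 0), (10, 5), (10, 20)]
full_frame = shift_boundary(tissue_frame, x_offset=100, y_offset=200)
print(full_frame)
```

Cropping the TIFF to the tissue area is the same correction applied in the other direction: the image is moved into the boundary frame instead of the boundaries being moved into the image frame.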

Reasat commented 3 years ago

Thanks for the suggestions! Excluding the white space while converting to TIFF seems the logical option. VIPS has a function, im_extract_areabands, which allows extraction of a rectangular portion of an image. https://linux.die.net/man/3/im_extract_bands I think I can pass in the ROI and convert only that area to TIFF.

But first I need to understand the parameters in the line below from the create_tiff.sh file.

`vips im_extract_bands --vips-cache-trace --vips-progress $tile $outfilepath.svs.dzi.tif:jpeg:99,tile:256x256,pyramid,,,,8 0 3`

Regarding 'jpeg:99,tile:256x256,pyramid,,,,8 0 3': I understand that the image goes through JPEG compression before it is saved, but what is the significance of the other parameters, i.e., 256, pyramid, the run of commas, 8, 0 and 3?
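One way to read that option string is sketched below. It assumes the VIPS 7 convention in which the text after `.tif:` is a comma-separated list of positional TIFF-save fields (compression, layout, multi-resolution, format, resolution, ICC profile, BigTIFF flag), with empty fields meaning "use the default", and that the trailing `0 3` are the `band` and `nbands` arguments of `im_extract_bands` (start at band 0, keep 3 bands). The field names in `FIELD_NAMES` and the helper `parse_tiff_target` are illustrative labels, not identifiers from VIPS or from create_tiff.sh.

```python
from typing import Dict, Tuple

# Assumed positional meaning of the comma-separated fields (VIPS 7 style).
FIELD_NAMES = [
    "compression",  # e.g. "jpeg:99" -> JPEG at quality 99
    "layout",       # e.g. "tile:256x256" -> 256x256 pixel tiles
    "multi_res",    # e.g. "pyramid" -> write a multi-resolution pyramid
    "format",       # left empty in the command -> default
    "resolution",   # left empty in the command -> default
    "icc_profile",  # left empty in the command -> default
    "bigtiff",      # "8" is understood here to request BigTIFF output
]


def parse_tiff_target(target: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name.tif:opt1,opt2,...' into the file path and named options."""
    stem, _, options = target.partition(".tif:")
    fields = options.split(",")
    return stem + ".tif", dict(zip(FIELD_NAMES, fields))


path, opts = parse_tiff_target("out.svs.dzi.tif:jpeg:99,tile:256x256,pyramid,,,,8")
print(path)
for name, value in opts.items():
    print(f"{name}: {value or '(default)'}")
```

Under this reading, the run of commas simply skips three fields so that the `8` lands in the seventh position.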

slee172 commented 3 years ago

This issue is no longer relevant. Will close and move to the new issue.