DIAGNijmegen / pathology-whole-slide-data

A package for working with whole-slide data including a fast batch iterator that can be used to train deep learning models.
https://diagnijmegen.github.io/pathology-whole-slide-data/
Apache License 2.0

Implementation of qupath annotation parser ('.geojson') #42

Closed tsikup closed 8 months ago

tsikup commented 8 months ago

Fixes #41

The changes and implementation are straightforward. QuPath exports its annotations as '.geojson' files with a QuPath-specific structure that the WholeSlideAnnotationParser class could not read.

In addition, an annotation may be of type 'multipolygon', and this needs to be handled in the parse() method of the parent AnnotationParser class (by treating it as multiple polygon annotations in a for loop).
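To illustrate the multipolygon case, here is a minimal sketch, not the code from this PR. It relies only on the GeoJSON convention that a MultiPolygon's coordinates are a list of Polygon coordinate arrays, each holding an exterior ring followed by optional hole rings. The function name `split_into_polygons`, the output record layout, and the `properties.classification.name` label location are assumptions made for the example.

```python
def split_into_polygons(feature):
    """Expand one GeoJSON feature into a list of single-polygon records."""
    geometry = feature["geometry"]
    # Assumed label location; None if the feature carries no classification.
    label = feature.get("properties", {}).get("classification", {}).get("name")

    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        # Each entry is itself a full Polygon coordinate array.
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")

    records = []
    for rings in polygons:
        # First ring is the exterior, any remaining rings are holes.
        records.append({"label": label, "exterior": rings[0], "holes": rings[1:]})
    return records


feature = {
    "type": "Feature",
    "properties": {"classification": {"name": "tumor"}},
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
            [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
            [
                [[20, 20], [40, 20], [40, 40], [20, 40], [20, 20]],
                [[25, 25], [30, 25], [30, 30], [25, 30], [25, 25]],
            ],
        ],
    },
}

records = split_into_polygons(feature)
print(len(records), [len(r["holes"]) for r in records])
```

Each record can then be turned into its own polygon annotation, so downstream code only ever sees single polygons.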

I would be happy to discuss the pull request with you.

tsikup commented 8 months ago

If the multipolygon handling should be done within the _parse method of the QuPath parser (in case other parsers always return polygons), please let me know and I will fix it.

Nikos

martvanrijthoven commented 8 months ago

@tsikup the PR already looks very nice! Thanks!

The only comment that I have is indeed about the changes you made to the base parser. I understand why these changes are needed, but as you already suggest, other annotation files don't necessarily have multipolygons. If you could implement that in the QuPath parser (for me it is also fine if you completely override the "parse" method), then I can accept the PR.

tsikup commented 8 months ago

I implemented the multipolygon handling within the QuPath parser. However, I removed the following from the base parser: annotation["coordinates"] = np.array(annotation["coordinates"]), as it would conflict with the case where annotation["coordinates"] is a dict, which would include the coordinates of holes.

I believe this is not specific to the QuPath parser, and it is also in line with the _get_geometry() method, which accepts coordinates as a dict. In any case, I don't think that coordinates should have to be a numpy array; even a list is acceptable to create a PolygonAnnotation or a shapely geometry polygon.
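As a rough sketch of why a blanket array conversion does not fit both cases, the snippet below accepts coordinates either as a plain list of points or as a dict that also carries holes. The dict keys `"exterior"` and `"holes"` and the helper names are hypothetical and chosen only for this example; the layout actually used by the package may differ. Plain Python lists are enough here, with no numpy involved.

```python
def split_coordinates(coordinates):
    """Return (exterior, holes) from list-style or dict-style coordinates."""
    if isinstance(coordinates, dict):
        # Hypothetical dict layout: one exterior ring plus a list of hole rings.
        exterior = list(coordinates["exterior"])
        holes = [list(hole) for hole in coordinates.get("holes", [])]
        return exterior, holes
    # A plain sequence of points is treated as an exterior without holes.
    return list(coordinates), []


def ring_area(ring):
    """Unsigned area of a ring of (x, y) points via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(coordinates):
    """Area of the exterior minus the area of all holes."""
    exterior, holes = split_coordinates(coordinates)
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


square = [[0, 0], [10, 0], [10, 10], [0, 10]]
with_hole = {"exterior": square, "holes": [[[4, 4], [6, 4], [6, 6], [4, 6]]]}

area_plain = polygon_area(square)
area_with_hole = polygon_area(with_hole)
print(area_plain, area_with_hole)
```

Both inputs go through the same code path, which is the point of leaving the coordinates as they arrive instead of forcing them into an array.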

martvanrijthoven commented 8 months ago

Yes, I agree: there is no reason that the coordinates should be a numpy array.

Thank you so much for this addition to the package, very valuable!