choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License

Add support for bioformats? #142

Closed: choosehappy closed this issue 1 day ago

choosehappy commented 4 years ago

We should look into extending support to Bio-Formats, given the end-of-life of OpenSlide.

https://pythonhosted.org/python-bioformats/

The main issue is that the dependencies will skyrocket: Bio-Formats is written in Java and will therefore require a JVM.

Computation time will likely go up as well, limiting throughput.
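To make the dependency concern concrete, here is a minimal, hypothetical availability check. The function name `backend_requirements_met` is invented for illustration and is not part of HistoQC. It assumes Bio-Formats is used through the `bioformats` (python-bioformats) and `javabridge` packages, and it approximates the JVM requirement by looking for a `java` executable on the PATH, which is a simplification.

```python
import importlib.util
import shutil


def backend_requirements_met() -> dict:
    """Report which slide-reading backends appear to be installed.

    Only checks that the packages are present, not that they import cleanly:
    e.g. openslide-python can be installed while the native OpenSlide
    C library is missing.
    """
    def has_module(name: str) -> bool:
        return importlib.util.find_spec(name) is not None

    return {
        # OpenSlide: one Python package wrapping a native C library.
        "openslide": has_module("openslide"),
        # Bio-Formats: the Python wrapper, the javabridge package, and a
        # Java runtime must all be present (hypothetical PATH-based check).
        "bioformats": has_module("bioformats")
        and has_module("javabridge")
        and shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for backend, ok in backend_requirements_met().items():
        print(f"{backend}: {'available' if ok else 'missing dependencies'}")
```

The Bio-Formats entry needs three independent pieces (two Python packages plus a JVM) where OpenSlide needs one, which is the setup burden described above.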

jjhbw commented 4 years ago

Out of interest, I have indeed noticed that OpenSlide is not really maintained anymore. Have you encountered any issues with that, though? Do the slide formats that it covers change often enough that the lack of continued maintenance of OpenSlide is an issue?

Bio-Formats also looks like an amazing lib, but its dependence on the JVM is a worry. Most of the slide-related tooling is hard enough to set up as it is. I must admit I'm not familiar with the latest generation of JVM tooling that may make it more portable to set up, so things may have improved since I last touched the ecosystem.

choosehappy commented 4 years ago

Thanks for your question.

OpenSlide does have its limitations, but the older file formats (SVS, NDPI, BigTIFF) have remained the same, and we haven't encountered any issues with them.

The problem is really the newer file formats, e.g., Philips and some of the newer 3DHistech versions. And of course, the less popular scanners are not supported, and unfortunately won't be supported by OpenSlide unless someone picks up the reins.

Ultimately, updating OpenSlide is extremely challenging, since many of the file formats are proprietary and previously had to be reverse-engineered for OpenSlide, which is not an easy or forgiving task.

We've also considered Bio-Formats, and your concerns are warranted: it is quite "heavy" dependency-wise, its integration with Python is complex, and reading slides with it is slower overall than with OpenSlide. We're certainly interested in offering it as an option when OpenSlide is not able to read a slide, but supporting both simultaneously would involve quite a bit of effort.
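A "Bio-Formats only when OpenSlide fails" option could look roughly like the sketch below. This is not HistoQC code: `open_with_fallback`, `openslide_reader`, `bioformats_reader` and `DEFAULT_READERS` are hypothetical names. It uses `openslide.OpenSlide(path)` and `bioformats.ImageReader(path)`, and assumes the caller has already started the JVM (e.g. `javabridge.start_vm(class_path=bioformats.JARS)`) and stops it with `javabridge.kill_vm()` when done.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A reader takes a file path and returns an open slide handle, or raises.
Reader = Callable[[str], Any]


def openslide_reader(path: str) -> Any:
    # Imported lazily so this module loads even when OpenSlide is absent.
    import openslide
    return openslide.OpenSlide(path)


def bioformats_reader(path: str) -> Any:
    # Assumption: the JVM is already running (javabridge.start_vm(...)).
    import bioformats
    return bioformats.ImageReader(path)


def open_with_fallback(path: str, readers: Sequence[Tuple[str, Reader]]) -> Tuple[str, Any]:
    """Try each (name, reader) in order; return the first successful (name, handle)."""
    errors: List[str] = []
    for name, reader in readers:
        try:
            return name, reader(path)
        except Exception as exc:  # the backends raise unrelated error types
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError(f"no reader could open {path!r}: " + "; ".join(errors))


# Faster OpenSlide first; the heavier Bio-Formats only as a fallback.
DEFAULT_READERS = [("openslide", openslide_reader), ("bioformats", bioformats_reader)]
```

Usage would be `name, slide = open_with_fallback(path, DEFAULT_READERS)`. The returned backend name matters because the two handles expose different APIs, so downstream code would still need a per-backend branch, which is where most of the "supporting both" effort lies. Catching `Exception` broadly keeps the sketch short but can also mask genuine bugs.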

Keep in mind as well that most researchers like ourselves have A LOT of legacy code built up over the last decades that uses older libraries (e.g., OpenSlide), so without strong motivation to reinvent the wheel with a new library, implementations will have some blind spots.

Hopefully, as digital pathology becomes more popular, standards will be introduced, common file formats will be created, and a single high-speed library will then be built to read them, but I fear that time is still quite far off.

(Sent by email in reply to Jurriaan BW's comment of Wed, Jun 10, 2020, 11:08 AM.)

jjhbw commented 4 years ago

Interesting. I'm new to digital pathology and am trying to get an idea of the state of the ecosystem, so your comment is very useful. Thanks!