Jetraw / bioformats_jetraw

Jetraw plug-in for Fiji Bio-Formats

Enable the Jetraw decoder to transparently work with Bioformats without "patching" of bioformats libraries. #4

Open sanguinettib opened 2 years ago

sanguinettib commented 2 years ago

Currently, to "register" the Jetraw codec with Bio-Formats, one must replace the original formats-bsd-6.x.y.jar with a patched version. This is problematic for the following reasons:

  1. When updating Bio-Formats, to retain jetraw compatibility one must replace the formats-bsd-6.x.y.jar file.
  2. If other software on the same machine also uses Bio-Formats, the .jar has to be replaced in each installation.
  3. As the filename is not distinguishable from the original, a user can't tell if the installed file is the original from Bio-Formats or the jetraw-patched version.

Through an interaction of the Bio-Formats and Jetraw authors, this could probably be resolved. These are the solutions that I can think of:

  a. The Jetraw decoder is made open source and included in Bio-Formats: currently not possible as it is the same code-base as the encoder (which is commercial). However a compiled version is available for free for all platforms.
  b. Bio-Formats re-distributes the Jetraw decoder in binary form: I believe this is currently not possible due to the open-source policy of Bio-Formats. I may be wrong.
  c. Write open-source code that runtime-links Bio-Formats to external binary codecs. Send a pull-request to Bio-Formats. A user may have to install these codecs manually, or there could be a popup suggesting automatic installation. This however would solve all the three problems mentioned above, and the code could allow integration with other binary codecs.
  d. Open to any other suggestions!

sbesson commented 2 years ago

Thanks for starting this discussion @br1000 , we have been discussing this issue internally and here is some feedback.

  1. As the filename is not distinguishable from the original, a user can't tell if the installed file is the original from Bio-Formats or the jetraw-patched version.

In addition to the filename, another issue is that the JAR manifest does not make it possible to distinguish between versions. Taking for instance the artifacts available from https://github.com/Jetraw/Bio-Formats/releases/tag/22.05.01.1, the manifest of the modified JARs is identical to that of the original JAR:

% unzip -qc formats-bsd-6.9.1.jar META-INF/MANIFEST.MF  
Manifest-Version: 1.0
Implementation-Title: BSD Bio-Formats readers and writers
Implementation-Version: 6.9.1
Built-By: arveq
Specification-Vendor: Open Microscopy Environment
Specification-Title: BSD Bio-Formats readers and writers
Implementation-Vendor-Id: ome
Class-Path: ome-common-6.0.7.jar minio-5.0.2.jar google-http-client-xm
 l-1.20.0.jar google-http-client-1.20.0.jar httpclient-4.0.1.jar httpc
 ore-4.0.1.jar xpp3-1.1.4c.jar okhttp-3.7.0.jar okio-1.12.0.jar jackso
 n-annotations-2.9.6.jar jackson-core-2.9.6.jar jackson-databind-2.9.6
 .jar joda-time-2.2.jar guava-29.0-jre.jar failureaccess-1.0.1.jar lis
 tenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar jsr305-3.
 0.2.jar checker-qual-2.11.1.jar error_prone_annotations-2.3.4.jar j2o
 bjc-annotations-1.3.jar logback-core-1.2.0.jar logback-classic-1.2.0.
 jar ome-xml-6.2.3.jar specification-6.2.3.jar formats-api-6.9.1.jar o
 me-codecs-0.3.1.jar ome-jai-0.1.0.jar turbojpeg-6.9.1.jar native-lib-
 loader-2.1.4.jar jgoodies-forms-1.7.2.jar jgoodies-common-1.7.0.jar k
 ryo-4.0.2.jar reflectasm-1.11.3.jar asm-5.0.4.jar minlog-1.3.0.jar ob
 jenesis-2.5.1.jar commons-lang-2.6.jar perf4j-0.9.16.jar slf4j-api-1.
 7.6.jar jhdf5-19.04.0.jar base-18.09.0.jar commons-io-2.7.jar commons
 -lang3-3.10.jar metadata-extractor-2.11.0.jar xmpcore-5.1.3.jar jxrli
 b-all-0.2.4.jar xercesImpl-2.8.1.jar xml-apis-1.3.03.jar serializer-2
 .7.2.jar xalan-2.7.2.jar
Implementation-Date: 28 abril 2022
Implementation-Vendor: Open Microscopy Environment
Implementation-Build: fe3331875499d5aa540416c89b2925559b0ccb7a
Created-By: Apache Maven 3.6.3
Build-Jdk: 1.8.0_281
Specification-Version: 6.9
Implementation-URL: https://www.openmicroscopy.org/bio-formats

This metadata does not reflect the fact that these binaries have been generated by a separate entity using a different revision of the source code. For end-users, the only reliable way to know whether their distribution uses a modified JAR is to compare its checksum with the original.
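The checksum comparison described above can be sketched in a few lines of Java. This is a minimal illustration, not a tool from Bio-Formats or Jetraw: the class name `JarChecksum` and its methods are hypothetical, SHA-256 is an assumed choice of digest, and the expected value must come from a source the user trusts (for example, a digest published alongside the official release).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares a JAR's SHA-256 digest with an expected value. */
public class JarChecksum {

    /** Returns the SHA-256 digest of everything read from the stream, as lowercase hex. */
    public static String sha256Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            // Mask to an unsigned value so every byte prints as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Returns true if the file's digest equals the expected hex string (case-insensitive). */
    public static boolean matches(Path jar, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(jar)) {
            return sha256Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java JarChecksum <path-to-jar> <expected-sha256>");
            return;
        }
        boolean same = matches(Paths.get(args[0]), args[1]);
        System.out.println(same ? "digest matches the expected value"
                                : "digest differs: this JAR is not the expected build");
    }
}
```

A mismatch only shows that the file differs from the reference build; it does not say who produced it or what changed.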

Looking at the different proposed solutions, a few comments:

b. Bio-Formats re-distributes the Jetraw decoder in binary form: I believe this is currently not possible due to the open-source policy of Bio-Formats. I may be wrong.

Indeed, the GPL license allows distributing binaries but has requirements regarding the availability of the source code - see for instance https://www.gnu.org/licenses/gpl-faq.en.html#ModifiedJustBinary

c. Write open-source code that runtime-links Bio-Formats to external binary codecs. Send a pull-request to Bio-Formats. A user may have to install these codecs manually, or there could be a popup suggesting automatic installation. This however would solve all the three problems mentioned above, and the code could allow integration with other binary codecs.

For this option, you might want to take a look at the Bio-Formats service infrastructure and the LuraWaveCodec, which is an example of an implementation that makes use of this mechanism. From our side, this feels like the best path to try first in order to address these outstanding issues.
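To make the idea behind option (c) concrete, here is a minimal sketch of loading an optional codec at runtime using only plain Java reflection. It deliberately does not use the Bio-Formats service API or reproduce how LuraWaveCodec is written; the `Decoder` interface, the `PassThroughDecoder` stand-in and the class name `OptionalCodecLoader` are all hypothetical names introduced for illustration.

```java
import java.util.Optional;

/** Hypothetical sketch: look up an optional codec by class name at runtime. */
public class OptionalCodecLoader {

    /** Hypothetical minimal codec contract; this is NOT a Bio-Formats interface. */
    public interface Decoder {
        byte[] decompress(byte[] data);
    }

    /** Stand-in implementation so the sketch runs without any external JAR. */
    public static class PassThroughDecoder implements Decoder {
        @Override
        public byte[] decompress(byte[] data) {
            return data.clone();
        }
    }

    /**
     * Tries to instantiate the named class. Returns an empty Optional when the
     * class is absent, cannot be linked (for example a missing native library),
     * cannot be constructed, or does not implement Decoder.
     */
    public static Optional<Decoder> load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return instance instanceof Decoder
                    ? Optional.of((Decoder) instance)
                    : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            // The codec is simply not installed or not usable on this system.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name of a codec that is not on the classpath.
        Optional<Decoder> missing = load("com.example.hypothetical.MissingDecoder");
        System.out.println("missing codec available: " + missing.isPresent());

        Optional<Decoder> present = load(PassThroughDecoder.class.getName());
        System.out.println("stand-in codec available: " + present.isPresent());
    }
}
```

With this kind of lookup the host library keeps its original JAR, and a missing codec becomes a condition the caller can report to the user (for instance by suggesting installation) rather than a reason to ship a patched build.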

Adding a final thought which is possibly out of scope for this technical issue. The TIFF format is still widely used in the bioimaging community and remains compatible with many tools and applications; for instance, OME-TIFF still has strong adoption in many domains. With constantly evolving technologies and ever-growing data volumes, we fully understand the use case for developing new, efficient compression schemes. However, we are not involved in the maintenance and development of the TIFF specification, whose governance has been a concern over the last few years. In the context of this work, this raises a very serious question of whether new TIFF codecs can be validated and registered officially by the format owner. For users, the primary danger is generating data using unofficially supported variants and eventually putting such data at risk in the long term.

sanguinettib commented 2 years ago

Thank you @sbesson for having looked at this in detail, and for pointing us in the right direction. I propose splitting the issue into three separate ones. On this one, we will start by trying solution (c), following your recommendation to look at the Bio-Formats service infrastructure.

In the meantime, we will correct the manifest (#5) and try to register Jetraw's TIFF compression tag value officially (#6). Jetraw is very often used for long-term data archival, and enabling this is one of its primary design goals. I just realised that a clear strategy for long-term data preservation is missing from the documentation, and have opened a new issue for it: #7

arvequina commented 2 years ago

@sanguinettib the two commits that will be part of the official pull request to the Bio-Formats repository can be found here: https://github.com/Jetraw/bioformats/commits/develop_jetraw

Let me know what you think, thanks.