IDR / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment (particularly UW-Madison LOCI and Glencoe Software). Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
http://www.openmicroscopy.org/site/products/bio-formats
GNU General Public License v2.0

Operetta improvements #28

Closed: sbesson closed this 2 years ago

sbesson commented 2 years ago

Discovered during the validation of idr0136: the OperettaReader currently shipped in the IDR fork of Bio-Formats adds unexpected files, and more critically unexpected folders, outside the plate directory to the getUsedFiles() output. This causes downstream issues, e.g. when importing into OMERO.
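As a rough illustration of the property the fix restores (every used file lives under the plate directory), the sketch below checks a list of paths against a plate directory. It is a minimal sketch under stated assumptions: in practice the array would come from getUsedFiles() on an initialized Bio-Formats reader, but the class UsedFilesCheck, the helper outsidePlate and the example paths are hypothetical, and the check is plain java.nio path comparison, not the OperettaReader logic itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class UsedFilesCheck {
    // Hypothetical helper: returns every entry of a used-files list that does not
    // resolve to a location inside plateDir (or plateDir itself).
    static List<String> outsidePlate(String plateDir, String[] usedFiles) {
        Path root = Paths.get(plateDir).toAbsolutePath().normalize();
        List<String> offending = new ArrayList<>();
        for (String f : usedFiles) {
            // normalize() collapses ".." so "plate1/../other" is caught too;
            // Path.startsWith compares whole name elements, so "plate10" does not match "plate1".
            Path p = Paths.get(f).toAbsolutePath().normalize();
            if (!p.startsWith(root)) {
                offending.add(f);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        // Hypothetical paths standing in for a getUsedFiles() result.
        String plate = "/data/idr0136/plate1";
        String[] used = {
            "/data/idr0136/plate1/Images/Index.idx.xml",
            "/data/idr0136/plate1/Images/r01c01f01p01-ch1sk1fk1fl1.tiff",
            "/data/idr0136/other-plate",
        };
        System.out.println(outsidePlate(plate, used));
    }
}
```

An empty result means the reader only reports files inside the plate folder, which is the behaviour checked manually with omero import -f later in this thread.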

Fortunately, this issue had already been identified and recently fixed upstream in https://github.com/ome/bioformats/pull/3784. This PR backports the relevant change from that PR, as well as the parsing improvements from https://github.com/ome/bioformats/pull/3770 and https://github.com/ome/bioformats/pull/3775.

The upstream Bio-Formats reader includes other improvements, notably better support for sparse plates, but porting these changes requires updating more components and eventually regenerating the Bio-Formats cache files in IDR. This is possibly a valuable discussion to have at the IDR meeting, but it will require proper resourcing, notably in terms of testing, and some thought about the timeline. For now, this PR proposes the minimal set of changes needed to import the plates and make progress with the validation of the submission.

sbesson commented 2 years ago

@dominikl the JARs from this branch have now been deployed under /opt/omero/server/OMERO.server/lib/server and /opt/omero/server/OMERO.server/lib/client on pilot-idr0136. Running omero import -f on a sample plate shows that only files contained within the plate folder are listed.

Leaving you to follow up with the validation of the screen; we can then decide whether we are happy with the patch as it is or whether we need to consider a larger Bio-Formats upgrade and/or a dataset conversion.

dominikl commented 2 years ago

Thanks Seb. Looks good: the import of the plates that caused the issue now works. There are some gray wells that look a bit strange, but that might be an issue with the plate itself.

[Screenshot 2022-06-16 at 08 06 19]

sbesson commented 2 years ago

Thanks @dominikl. Here is the import of the same plate into our nightly CI servers, i.e. with all the Operetta improvements available in Bio-Formats 6.10.0:

[Screenshot 2022-06-16 at 16 07 18]

The layout matches the metadata in the Index.idx.xml file:

    <Plate>
      <PlateID>EO_20190325_ColocBMKd3mC3XLRep2</PlateID>
      <MeasurementID>843f309f-a8cb-4f6f-930e-8fb851340720</MeasurementID>
      <MeasurementStartTime>2019-03-25T22:51:47.4247688-04:00</MeasurementStartTime>
      <Name>EO_20190325_ColocBMKd3mC3XLRep2</Name>
      <PlateTypeName>384 PerkinElmer CellCarrier Ultra</PlateTypeName>
      <PlateRows>16</PlateRows>
      <PlateColumns>24</PlateColumns>
      <Well id="0812" />
      <Well id="0813" />
      <Well id="0814" />
      <Well id="0815" />
      <Well id="0816" />
      <Well id="0817" />
      <Well id="0818" />
      <Well id="0819" />
      <Well id="0820" />
      <Well id="0821" />
      <Well id="0822" />
      <Well id="0823" />
      <Well id="0912" />
      <Well id="0913" />
      <Well id="0914" />
      <Well id="0915" />
      <Well id="0916" />
      <Well id="0917" />
      <Well id="0918" />
      <Well id="0920" />
      <Well id="0921" />
      <Well id="1013" />
      <Well id="1014" />
      <Well id="1015" />
      <Well id="1016" />
      <Well id="1017" />
      <Well id="1018" />
      <Well id="1019" />
      <Well id="1020" />
      <Well id="1113" />
      <Well id="1114" />
      <Well id="1115" />
      <Well id="1116" />
      <Well id="1117" />
      <Well id="1118" />
      <Well id="1119" />
      <Well id="1120" />
      <Well id="1213" />
      <Well id="1214" />
      <Well id="1215" />
      <Well id="1216" />
      <Well id="1217" />
      <Well id="1218" />
      <Well id="1219" />
      <Well id="1313" />
      <Well id="1314" />
      <Well id="1315" />
      <Well id="1316" />
      <Well id="1317" />
      <Well id="1318" />
    </Plate>
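To compare a screenshot with the metadata, the well IDs in a `<Plate>` excerpt like the one above can be decoded into a grid. The sketch below assumes each `id` is a two-digit, 1-based row followed by a two-digit, 1-based column (so `0812` would be row H, column 12) and parses a standalone `<Plate>` fragment rather than a full Index file; the class PlateLayout and its methods are hypothetical and not part of Bio-Formats.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PlateLayout {
    // Assumption: a Well id is "RRCC", 1-based row then 1-based column.
    static int[] decodeWellId(String id) {
        int row = Integer.parseInt(id.substring(0, 2));
        int col = Integer.parseInt(id.substring(2, 4));
        return new int[] {row, col};
    }

    // Builds a PlateRows x PlateColumns occupancy grid from the <Well id="..."/> elements
    // of a standalone <Plate> fragment (not a full Index file).
    static boolean[][] occupancy(String plateXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(plateXml.getBytes(StandardCharsets.UTF_8)));
        Element plate = doc.getDocumentElement();
        int rows = Integer.parseInt(
            plate.getElementsByTagName("PlateRows").item(0).getTextContent().trim());
        int cols = Integer.parseInt(
            plate.getElementsByTagName("PlateColumns").item(0).getTextContent().trim());
        boolean[][] grid = new boolean[rows][cols];
        NodeList wells = plate.getElementsByTagName("Well");
        for (int i = 0; i < wells.getLength(); i++) {
            int[] rc = decodeWellId(((Element) wells.item(i)).getAttribute("id"));
            grid[rc[0] - 1][rc[1] - 1] = true;
        }
        return grid;
    }

    // One line per row, prefixed with its letter: '#' = well listed, '.' = no well.
    static String render(boolean[][] grid) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < grid.length; r++) {
            sb.append((char) ('A' + r)).append(' ');
            for (boolean present : grid[r]) {
                sb.append(present ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the excerpt above; paste the full <Plate> element to see all wells.
        String xml = "<Plate><PlateRows>16</PlateRows><PlateColumns>24</PlateColumns>"
            + "<Well id=\"0812\" /><Well id=\"0813\" /><Well id=\"0912\" /></Plate>";
        System.out.print(render(occupancy(xml)));
    }
}
```

Under that assumption, the full excerpt describes a sparse block spanning rows H to M and columns 12 to 23, with gaps such as well 0919, which is the layout the reader is expected to reproduce.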

This comparison shows that cherry-picking individual commits is insufficient: we need to look into a bigger Bio-Formats upgrade, a conversion of the dataset, or both.