shntnu opened this issue 3 years ago
Sorry about that. The ENTRYPOINT does work when I'm testing just your docker but we don't use it directly in our distributed workflow because we make another docker that FROMs your docker.
We've now updated the point where we call Bioformats2Raw in our docker to reflect your new ENTRYPOINT.
(I was accidentally looking at the glencoesoftware/bioformats2raw Dockerfile not the ome/bioformats2raw-docker Dockerfile, but I see the change when I look at the right place :)
@ErinWeisbart
When you get to it, please create the NGFFs at s3://cellpainting-gallery/cpg0004-lincs instead of s3://cellpainting-gallery/lincs.
We will eventually move all of s3://cellpainting-gallery/lincs to s3://cellpainting-gallery/cpg0004-lincs.
We will eventually move all of s3://cellpainting-gallery/lincs to s3://cellpainting-gallery/cpg0004-lincs
I have done this
I am now deleting lincs
aws s3 rm --profile jump-cp-role --recursive --quiet s3://cellpainting-gallery/lincs/
cc @ErinWeisbart
😱
I think Distributed-BioFormats2Raw is working!
My test plate uses --target-min-size 2160 as we discussed above.
Before I pull the trigger on 100 plates, I wanted to confirm:
Distributed-BioFormats2Raw is FROM'ing openmicroscopy/bioformats2raw:0.5.0rc1. There is no reason not to proceed with that particular release candidate, correct?
These outputs are all as expected, yes?
du -sh SQ00014813__2016-05-23T19_03_28-Measurement1
(original folder)
151G
du -sh SQ00014813.ome.zarr
151G
ome_zarr info SQ00014813.ome.zarr
D:\SQ00014813.ome.zarr [zgroup]
- metadata
- Plate
- data
- (1, 5, 1, 34560, 51840)
ome_zarr info SQ00014813.ome.zarr\A\1\0
D:\SQ00014813.ome.zarr\A\1\0 [zgroup]
- metadata
- Multiscales
- data
- (1, 5, 1, 2160, 2160)
The .ome.zarr will open with napari-ome-zarr, though it takes a long time and prints this warning many times before loading:
c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\napari\_vispy\layers\image.py:231: UserWarning: data shape (34560, 51840) exceeds GL_MAX_TEXTURE_SIZE 8192 in at least one axis and will be downsampled. Rendering is currently in 2D mode.!
@joshmoore does this seem like we are good to go from your perspective?
The .ome.zarr will open with napari-ome-zarr though it takes a long time and prints this warning many times before loading:
c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\napari\_vispy\layers\image.py:231: UserWarning: data shape (34560, 51840) exceeds GL_MAX_TEXTURE_SIZE 8192 in at least one axis and will be downsampled. Rendering is currently in 2D mode.!
Coming back quickly to this warning, do you remember which napari command was used? napari will fetch the data for the first image across all wells and stitch them into a grid using the plate layout. I think the message is expected: per the decision to keep only the full resolution of each well, it is not possible to retrieve a low-resolution version of each well, so the stitched plate representation needs to be downsampled for visualization purposes.
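(For reference, a minimal sketch of opening such a plate programmatically, assuming napari and the napari-ome-zarr plugin are installed; the plate path is a placeholder:)
```python
# Minimal sketch: open a plate with the napari-ome-zarr reader.
# The path is a placeholder; opening the plate root triggers the stitched
# grid view, which is where the GL_MAX_TEXTURE_SIZE downsampling warning appears.
import napari

viewer = napari.Viewer()
viewer.open("SQ00014813.ome.zarr", plugin="napari-ome-zarr")
napari.run()
```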
@sbesson Thanks for the explanation. That makes sense.
I get the same warning using napari {PLATE}.ome.zarr, or just opening napari with napari and dragging and dropping the .ome.zarr folder into the viewer window.
Conversion is almost done. 130/136 plates present in cellpainting-gallery converted without error and are available at s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/
Note that I had these four plates on a batch list (I don't remember where I pulled the list from), but they didn't have images in the gallery so they were not converted to .ome.zarr's:
SQ00015225__2016-10-29T16_09_17-Measurement1
SQ00015226__2016-10-29T17_50_20-Measurement1
SQ00015227__2016-10-29T19_31_37-Measurement1
SQ00015228__2016-10-29T21_13_50-Measurement1
These six plates had errors in their logs. They all created .ome.zarr files of the appropriate size (all were 150.5 to 150.7 GB, 166675 total objects):
SQ00015168__2016-04-21T05_12_27-Measurement1
SQ00015216__2016-05-04T17_08_56-Measurement1
SQ00015162__2016-04-15T08_49_32-Measurement1
SQ00015116__2016-04-13T16_31_49-Measurement1
SQ00015121__2016-04-14T00_54_07-Measurement1
SQ00015098__2016-06-08T18_43_42-Measurement1
Details/Debug:
Note that if a task fails our script is set to retry, but it all logs to the same place, so I am only copying down the first Exception(s) I can find.
SQ00015168__2016-04-21T05_12_27-Measurement1
Does not open in napari.
ome_zarr info 15168.ome.zarr doesn't return anything.
ome_zarr info 15168.ome.zarr\A\1\0 returns output as expected.
nano 15168.ome.zarr\.zattrs returns only
{
"bioformats2raw.layout" : 3
}
CloudWatch log:
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@54c5a2ff): java.io.FileNotFoundException: /home/ubuntu/local/SQ00015168__2016-04-21T05_12_27-Measurement1/Images/Index.idx.xml (Unknown error 4094)
I can also see an out of space error shortly thereafter. The behavior of this plate is consistent with previous tests where I didn't have a large enough hard drive. However, it's not clear to me why that would happen as it started at 151.5GB and 17283 objects (whereas the plate next to it SQ00015167 started at 150.7GB and 17283 objects) and the EBS volume I used was 500 GB.
Caused by: java.io.IOException: No space left on device
The other five plates have similar behavior in CloudWatch logs. The two I dug into seemed to be fully functional:
SQ00015216__2016-05-04T17_08_56-Measurement1
Opens in napari. ome_zarr info 15216.ome.zarr, ome_zarr info 15216.ome.zarr\A\1\0, and nano 15216.ome.zarr\.zattrs are all as expected.
Cloudwatch:
java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015216__2016-05-04T17_08_56-Measurement1.ome.zarr/J/22/6/0/0/4/0/1/2: Structure needs cleaning
SQ00015162__2016-04-15T08_49_32-Measurement1 Cloudwatch:
java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015162__2016-04-15T08_49_32-Measurement1.ome.zarr/C/8/4: Structure needs cleaning
SQ00015116__2016-04-13T16_31_49-Measurement1 Cloudwatch:
Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015116__2016-04-13T16_31_49-Measurement1.ome.zarr/E/5/4: Structure needs cleaning
SQ00015121__2016-04-14T00_54_07-Measurement1 Cloudwatch:
java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015121__2016-04-14T00_54_07-Measurement1.ome.zarr/B/23/8/0/0/1/0/0/2: Structure needs cleaning
SQ00015098__2016-06-08T18_43_42-Measurement1
Opens in napari. ome_zarr info 15216.ome.zarr, ome_zarr info 15216.ome.zarr\A\1\0, and nano 15216.ome.zarr\.zattrs are all as expected.
Cloudwatch:
java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/C/3/7/0/0/0/0/2: Structure needs cleaning
So exciting!
Note that I had these four plates on a batch list (I don't remember where I pulled the list from), but they didn't have images in the gallery so they were not converted to .ome.zarr's.
I confirm that you can ignore those plates - they still appear in some lists, but those plates don't actually exist (see my notes in https://github.com/broadinstitute/lincs-cell-painting/issues/54#issuecomment-942670917 if you are curious)
SQ00015168__2016-04-21T05_12_27-Measurement1
I double-checked if the Index.idx.xml existed, just in case that was the problem; it does. So it's something else 🤷
The other five plates have similar behavior in CloudWatch logs. The two I dug into seemed to be fully functional:
This one seems to be a system-level error
I am hoping others might be able to comment on this.
An easy tack is to just run it again and see if that does it (no need to delete the existing files IIUC, unless the converter creates temp files)
I missed this bit: "The two I dug into seemed to be fully functional". Hmm in that case, I wonder if there's some way of checking the consistency of NGFF files because that would be the simplest way of guaranteeing that everything actually went well despite the warnings? Of course, no warnings is ideal, so perhaps just repeating those 5 is in fact the simplest solution here.
I re-converted SQ00015168__2016-04-21T05_12_27-Measurement1 using a 600GB drive instead of a 500GB drive and it converted successfully. (I don't actually know if it's the re-run itself that fixed it or the fact that the re-run used a larger drive.)
Regardless, I believe all the plates have been made into .ome.zarrs and are ready for IDR.
I would imagine part of the IDR workflow is to make sure that the plates all load in IDR, but we should pay extra attention to confirming that the 5 plates that were throwing the Structure needs cleaning error do in fact play nicely with IDR.
Super!
Over to y'all @sbesson @joshmoore
Thanks, pulling the data now from s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/. Gonna try to import it into one of our test systems, will keep you posted.
Super!
By “import” you don’t actually mean copying it over to your system, correct? That is, the files will live on S3 as they are and will simply be accessed by IDR in some manner, right?
Of course it’s perfectly ok to copy it over, it’s just that I wanted to make sure that we’re able to test the key aspect of this setup, which is to have the files stored on and accessed from S3.
Hi @shntnu. For the remote access we will need to deploy new infrastructure, but in the meantime we can test a local copy.
Apologies for the delay here - @dominikl and I have been off recently - Now I'm back and trying to catch up... Dom said he downloaded the 6 plates mentioned above and tried to import them:
2022-07-27 14:35:44,947 3432581 [ main] ERROR ome.formats.importer.cli.ErrorHandler - FILE_EXCEPTION: /nfs/bioimage/drop/idr0125-way-cellpainting/s3-20220725/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/A/1/0/0/0/0/0/1/2
java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
at loci.formats.in.ZarrReader.parsePlate(ZarrReader.java:539)
at loci.formats.in.ZarrReader.initFile(ZarrReader.java:287)
In response to @shntnu "I wonder if there's some way of checking the consistency of NGFF files", there is a tool for validating the JSON against a schema: E.g. https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0056B/7361.zarr
As you can see, the URL to the plate is in the source=... query parameter, which is also the URL you can use to open the same plate in napari, e.g. $ napari --plugin napari-ome-zarr -vvv https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0056B/7361.zarr
Is your data publicly available in the same way? Thanks, Will.
Thanks for the update @will-moore
Our data is indeed publicly available.
Regarding the error: The validator seems to approve the wells, but it errors in the same way as you've reported.
Unfortunately, I have zero knowledge on what to do next, and I suspect the same for @ErinWeisbart, so hopefully, you / folks at your end know what to do next :D?
For a different plate, it didn't even like the JSON
Thanks @will-moore. Did you and/or @dominikl try any of the rest of the batch that converted without any errors?
Did you see the same behavior for all 6 plates? i.e. since the initial error I re-converted SQ00015168__2016-04-21T05_12_27-Measurement1 and it didn't error on re-conversion, so I would hope/expect it works now.
Thanks all for the progress. Coming back on the issue of the invalid plates
Regarding the error: The validator seems to approve the wells, but it errors in the same way as you've reported. https://ome.github.io/ome-ngff-validator/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/
The root of the issue is that the type of the plate.acquisitions.id field was ambiguous in the original OME-NGFF 0.4 specification. We worked to clarify the attribute type expectation in the specification and add a JSON schema used by the validator (https://github.com/glencoesoftware/bioformats2raw/pull/152) and updated the converter to comply with this requirement (https://github.com/glencoesoftware/bioformats2raw/pull/152).
Looking at the timestamps, I realise this change was merged and released a few days after our conversation above and only made it into bioformats2raw 0.5.0rc2, while you used bioformats2raw 0.5.0rc1.
This is very unfortunate timing but, on the flip side, the only files/objects that need to be fixed are the top-level .zattrs defining the plate metadata, where the id needs to be converted into an integer with a diff similar to:
"acquisitions" : [ {
- "id" : "0"
+ "id" : 0
} ]
A full reconversion would fix this issue in a brute-force way but feels completely unnecessary. I would instead propose to try and patch these top-level .zattrs files individually so that they pass the online validation and we can continue. Is that doable from a permissions perspective on your side? Assuming yes, do you have a preferred workflow and/or constraints for e.g. executing an update script against a series of S3 locations?
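(A minimal sketch of the kind of patch being proposed, assuming a top-level .zattrs file has already been copied locally; the path is a placeholder and the field names follow the diff above:)
```python
# Sketch: coerce plate.acquisitions[*].id from string to int in a local .zattrs copy.
import json

path = "SQ00015098.ome.zarr.zattrs"  # placeholder local copy of a top-level .zattrs
with open(path) as f:
    attrs = json.load(f)

for acq in attrs.get("plate", {}).get("acquisitions", []):
    if isinstance(acq.get("id"), str):
        acq["id"] = int(acq["id"])  # "0" -> 0, per the OME-NGFF 0.4 clarification

with open(path, "w") as f:
    json.dump(attrs, f, indent=2)
```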
I checked the .zattrs for all 136 plates and only 63 of them contained the "plate":{... metadata; it seems that this metadata didn't get exported for the other plates.
But even for plates without the plate metadata, the Wells and Images seem to have exported OK, e.g. https://ome.github.io/ome-ngff-validator/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00014812__2016-05-23T20_44_31-Measurement1.ome.zarr/P/24/ and they look good in vizarr: https://hms-dbmi.github.io/vizarr/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00014812__2016-05-23T20_44_31-Measurement1.ome.zarr/P/24/
It seems that the .zattrs for every plate is identical apart from the name, so as above, it would be nice to fix this issue without repeating a full export, simply by duplicating the .zattrs for plates where it's missing, updating the name accordingly?
@sbesson
Thanks for diagnosing the issue!
@will-moore our messages crossed; I paste mine in here for completeness
I just noticed that the .zattrs are different across different plates (n=63 are 31518 bytes, and n=75 are just 33 bytes)
S3LOC=s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr
parallel "echo -n {}:; aws s3 ls ${S3LOC}/{}/.zattr" ::: `aws s3 ls ${S3LOC}/|tr -s " "|cut -d" " -f3|tr -d "/"` > ~/Desktop/lincs.txt
The ones with 33 bytes look like this
{
"bioformats2raw.layout" : 3
}
The ones with 31518 bytes look like this
SQ00014813__2016-05-23T19_03_28-Measurement1.ome.zarr.zattrs.txt
It seems that the .zattrs for every plate is identical apart from the name, so as above, it would be nice to fix this issue without repeating a full export, simply by duplicating the .zattrs for plates where it's missing, updating the name accordingly?
This sounds like a reasonable fix
I compared two at random, and indeed name is the only key that differs
diff SQ00015214__2016-05-04T23_48_48-Measurement1.ome.zarr.zattrs SQ00015217__2016-05-04T22_08_57-Measurement1.ome.zarr.zattrs
53c53
< "name" : "SQ00015214",
---
> "name" : "SQ00015217",
But I wonder if this is an indication of some bigger problem with the conversion process? If you're not worried about that, then ~I~ @ErinWeisbart can proceed to make this fix; LMK @sbesson @will-moore
Thanks for the deep dive @will-moore and @sbesson. I have full write permissions in the bucket so I can easily make the necessary changes.
Can you confirm that I'm understanding correctly that I should take any one of the complete .ome.zarr.zattrs files, confirm that "acquisitions" : [ {"id" : 0}] (int not string), and then use it to replace all of the .ome.zarr.zattrs files in the batch, changing "plate":{"name" : "SQ00014813"} to match?
But I wonder if this is an indication of some bigger problem with the conversion process? If you're not worried about that, then ~I~ @ErinWeisbart can proceed to make this fix; LMK @sbesson @will-moore
Specifically it will be great if you can confirm that bioformats2raw 0.5.0rc2 would have indeed created the metadata for the 75 files for which bioformats2raw 0.5.0rc1 did not @sbesson
But I wonder if this is an indication of some bigger problem with the conversion process?
I have exactly the same concern @shntnu. The writing of the plate metadata is effectively one of the final steps in the conversion process. Similarly to Will's comment above, my minimal tests seem to indicate the metadata is present in randomly chosen well samples on a plate. So in the less disruptive scenario, only the top-level plate metadata failed to write.
Did you happen to store logs of your distributed conversion for each worker? If yes, that might give us hints on where the conversion stopped/failed. If not, I assume our best way forward is to use the IDR validation process to determine whether fixing the top-level metadata is sufficient to address the problem.
Specifically it will be great if you can confirm that bioformats2raw 0.5.0rc2 would have indeed created the metadata for the 75 files for which bioformats2raw 0.5.0rc1 did not @sbesson
I would expect the plate metadata to be present in the top-level keys of all converted datasets independently of the version of bioformats2raw.
Can you confirm that I'm understanding correctly that I should take any one of the complete .ome.zarr.zattrs files, confirm that "acquisitions" : [ {"id" : 0}] (int not string), and then use it to replace all of the .ome.zarr.zattrs files in the batch, changing "plate":{"name" : "SQ00014813"} to match?
If all your plates have the same HCS layout, i.e. the same number of rows/columns and the same number of populated wells, the plate.name value might be the only metadata field that would vary between plates. If this assumption holds, that reads like a sensible workflow that would allow us to test whether we can simply fix the plate metadata. Should we try with one of the 75 faulty plates?
Logs coming from the Bioformats2Raw docker for passing plate (SQ00014813):
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp12767929091814070502/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2022-07-05 20:22:13,158 [main] WARN o.x.m.e.h.AcquisitionModeEnumHandler - Unknown AcquisitionMode value 'NonConfocal' will be stored as "Other"
(Lots of copies of this warning, then)
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/opt/conda/share/bioformats2raw-0.5.0rc1-0/lib/kryo-4.0.2.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Logs coming from the Bioformats2Raw docker for failing plate (SQ00014812): Logs look the same as above; however, after successful completion and upload it looks like the job was accidentally pulled again, and this time it had the following errors and then overwrote the first successful copy:
2022-07-06 03:23:22,017 [pool-1-thread-4] ERROR c.g.bioformats2raw.Converter - Failure processing chunk; resolution=0 plane=1 xx=0 yy=1024 zz=0 width=1024 height=1024 depth=1
java.nio.file.FileSystemException: /home/ubuntu/local/SQ00014812__2016-05-23T20_44_31-Measurement1.ome.zarr/C/21/4/0/0/1/0/1: Structure needs cleaning
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389)
at java.base/java.nio.file.Files.createDirectory(Files.java:689)
at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:796)
at java.base/java.nio.file.Files.createDirectories(Files.java:782)
at com.bc.zarr.storage.FileSystemStore$1.close(FileSystemStore.java:85)
at com.bc.zarr.chunk.ChunkReaderWriterImpl_Short.write(ChunkReaderWriterImpl_Short.java:84)
at com.bc.zarr.ZarrArray.write(ZarrArray.java:235)
at com.glencoesoftware.bioformats2raw.Converter.writeBytes(Converter.java:907)
at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1138)
at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:1334)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
(repeats a number of times with different pool-n-thread-n and file)
Logs coming from the Bioformats2Raw docker for failing plate (SQ00015140): Similar logs to the passing plate, but the upload logs a lot of warnings:
warning: Skipping file /home/ubuntu/local/SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr/G/18/8. File does not exist
And then
2022-07-06T04:22:42.201-07:00 2022-07-06 11:22:42,191 [main] ERROR c.g.bioformats2raw.Converter - Error while writing series 196
2022-07-06T04:22:42.211-07:00 java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr/A/22/7: Structure needs cleaning
2022-07-06T04:22:42.211-07:00 at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
2022-07-06T04:22:42.211-07:00 at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
2022-07-06T04:22:42.211-07:00 at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
2022-07-06T04:22:42.211-07:00 at java.base/sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:389)
2022-07-06T04:22:42.211-07:00 at java.base/java.nio.file.Files.createDirectory(Files.java:689)
2022-07-06T04:22:42.211-07:00 at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:796)
2022-07-06T04:22:42.211-07:00 at java.base/java.nio.file.Files.createDirectories(Files.java:782)
2022-07-06T04:22:42.211-07:00 at com.bc.zarr.storage.FileSystemStore$1.close(FileSystemStore.java:85)
2022-07-06T04:22:42.211-07:00 at java.base/sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:341)
2022-07-06T04:22:42.211-07:00 at java.base/sun.nio.cs.StreamEncoder.close(StreamEncoder.java:161)
2022-07-06T04:22:42.211-07:00 at java.base/java.io.OutputStreamWriter.close(OutputStreamWriter.java:258)
2022-07-06T04:22:42.211-07:00 at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator.close(WriterBasedJsonGenerator.java:997)
2022-07-06T04:22:42.211-07:00 at com.fasterxml.jackson.databind.ObjectWriter._writeValueAndClose(ObjectWriter.java:1222)
2022-07-06T04:22:42.211-07:00 at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:1059)
2022-07-06T04:22:42.211-07:00 at com.bc.zarr.ZarrUtils.toJson(ZarrUtils.java:68)
2022-07-06T04:22:42.212-07:00 at com.bc.zarr.ZarrGroup.createHeader(ZarrGroup.java:207)
2022-07-06T04:22:42.212-07:00 at com.bc.zarr.ZarrGroup.create(ZarrGroup.java:78)
2022-07-06T04:22:42.212-07:00 at com.bc.zarr.ZarrGroup.create(ZarrGroup.java:69)
2022-07-06T04:22:42.212-07:00 at com.bc.zarr.ZarrGroup.create(ZarrGroup.java:65)
2022-07-06T04:22:42.212-07:00 at com.glencoesoftware.bioformats2raw.Converter.setSeriesLevelMetadata(Converter.java:1682)
2022-07-06T04:22:42.212-07:00 at com.glencoesoftware.bioformats2raw.Converter.saveResolutions(Converter.java:1217)
2022-07-06T04:22:42.212-07:00 at com.glencoesoftware.bioformats2raw.Converter.write(Converter.java:713)
2022-07-06T04:22:42.212-07:00 at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:660)
2022-07-06T04:22:42.212-07:00 at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:489)
2022-07-06T04:22:42.212-07:00 at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:97)
2022-07-06T04:22:42.212-07:00 at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine.access$1300(CommandLine.java:145)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine.call(CommandLine.java:2761)
2022-07-06T04:22:42.213-07:00 at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2002)
2022-07-06T04:22:42.213-07:00 Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@54c5a2ff): java.lang.RuntimeException: java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr/A/22/7: Structure needs cleaning
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
2022-07-06T04:22:42.213-07:00 at picocli.CommandLine.access$1300(CommandLine.java:145)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
2022-07-06T04:22:42.214-07:00 at picocli.CommandLine.call(CommandLine.java:2761)
2022-07-06T04:22:42.214-07:00 at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2002)
2022-07-06T04:22:42.215-07:00 Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /home/ubuntu/local/SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr/A/22/7: Structure needs cleaning
2022-07-06T04:22:42.215-07:00 at com.glencoesoftware.bioformats2raw.Converter.unwrapException(Converter.java:1745)
2022-07-06T04:22:42.215-07:00 at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:664)
2022-07-06T04:22:42.215-07:00 at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:489)
2022-07-06T04:22:42.215-07:00 at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:97)
2022-07-06T04:22:42.215-07:00 at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
2022-07-06T04:22:42.215-07:00 ... 9 more
I can dig through more logs, but I'm now wondering if things are getting messy from jobs accidentally being run multiple times, with the upload occurring while the job is in progress a second time.
If I'm remembering correctly, bioformats2raw doesn't create any sort of done file or return an exit message upon completion, right? If not, what would be (one of) the very last files to be made, so I can use the creation of that as my done condition to help keep this from happening?
I'm wondering if this explains the almost-empty .ome.zarr.zattrs. Do you know offhand if the .ome.zarr.zattrs is created in the beginning but then fully populated at the end?
Thanks @ErinWeisbart for the logs, we'll take a closer look tomorrow, but off-hand the Structure needs cleaning exception is not particularly reassuring as it usually points at errors at the file-system level, corruption in the worst scenario.
Answering two of the later questions:
If I'm remembering correctly bioformats2raw doesn't create any sort of done file or return an exit message upon completion, right?
No output file but I believe the command should return a zero exit code on success and non-zero on exception:
(conversion) [sbesson@pilot-zarr1-dev ~]$ bioformats2raw test.fake test.ome.zarr && echo success
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp18240546533731183048/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
success
(conversion) [sbesson@pilot-zarr1-dev ~]$ bioformats2raw test.fake test.ome.zarr && echo success
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp14302411850609288330/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@53fe15ff): java.lang.IllegalArgumentException: Output path test.ome.zarr already exists
at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
at picocli.CommandLine.call(CommandLine.java:2761)
at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2002)
Caused by: java.lang.IllegalArgumentException: Output path test.ome.zarr already exists
at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:474)
at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:97)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
... 9 more
If not, what would be (one of) the very last files to be made so I can use the creation of that as my done condition to help keep this from happening? Do you know offhand if the .ome.zarr.zattrs is created in the beginning but then fully populated at the end?
In your case, the top-level .zattrs will be created early in the process (with the bioformats2raw.layout version) but it will only be updated at the end with the plate metadata. So the presence of a plate key would likely be a good indicator that the conversion process ran to completion.
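(A minimal sketch of such a done condition, assuming the converter is invoked via subprocess and the output is still on local disk; the paths are placeholders and the --target-min-size value is taken from earlier in this thread:)
```python
# Sketch: treat a zero exit code plus a "plate" key in the top-level .zattrs
# as the signal that bioformats2raw ran to completion before uploading.
import json
import subprocess
from pathlib import Path

src = "SQ00014813__2016-05-23T19_03_28-Measurement1"  # placeholder input plate
dst = "SQ00014813.ome.zarr"                           # placeholder output

result = subprocess.run(["bioformats2raw", src, dst, "--target-min-size", "2160"])
attrs_path = Path(dst) / ".zattrs"

done = (
    result.returncode == 0
    and attrs_path.exists()
    and "plate" in json.loads(attrs_path.read_text())
)
print("conversion complete" if done else "conversion incomplete; do not upload")
```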
Thanks @sbesson. I will update our script to look for a plate key in the .zattrs to signify completion.
I have now used SQ00014813__2016-05-23T19_03_28-Measurement1.ome.zarr as a template to replace the .zattrs for all the plates.
A spot check that was failing for Shantanu before is now passing: https://ome.github.io/ome-ngff-validator/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00015098__2016-06-08T18_43_42-Measurement1.ome.zarr/
```python
import os
import boto3
from botocore import UNSIGNED
from botocore.config import Config
import json
import subprocess
import re

root = 's3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/'
localroot = '/Users/eweisbar/Desktop/testzarr'
if not os.path.exists(localroot):
    os.makedirs(localroot, exist_ok=True)

# Make master zattrs
omezarr = 'SQ00014813__2016-05-23T19_03_28-Measurement1.ome.zarr'
cmd = f'aws s3 cp --no-sign-request {os.path.join(root,omezarr,".zattrs")} {os.path.join(localroot,omezarr,".zattrs")}'
subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
f = open(os.path.join(localroot,omezarr,".zattrs"))
master_zattrs = json.load(f)
master_zattrs['plate']['acquisitions'][0]['id'] = 0

filelist = []
cmd = f'aws s3 ls --no-sign-request --human-readable {root}'
with subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as p:
    for line in p.stdout:
        line = line.decode()
        filelist.append(line.rsplit(' ', 1)[1].split('\n')[0])

for omezarr in filelist:
    plate = omezarr.split('__')[0]
    newzattrs = master_zattrs.copy()
    newzattrs['plate']['name'] = plate
    json_object = json.dumps(newzattrs, indent=4)
    with open(os.path.join(localroot,'newzattrs.json'), "w") as outfile:
        outfile.write(json_object)
    cmd = f'aws s3 cp --profile jump-cp-role-jump-cellpainting --acl bucket-owner-full-control --metadata-directive REPLACE {os.path.join(localroot,"newzattrs.json")} {os.path.join(root,omezarr,".zattrs")}'
    with subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as p:
        for line in p.stdout:
            print(line, end='\n')
```
Thanks @ErinWeisbart for the logs, we'll take a closer look tomorrow, but off-hand the Structure needs cleaning exception is not particularly reassuring as it usually points at errors at the file-system level, corruption in the worst scenario.
@sbesson did you have any further thoughts on this? Specifically, should anything be re-run here, given these concerns?
Erin's fix in https://github.com/broadinstitute/lincs-cell-painting/issues/54#issuecomment-1211146063 is (IIUC) guaranteed to fix the metadata issue, but it's possible that the underlying data itself is corrupted (for the reasons you mentioned), and I don't know how to check for that.
Important note: I expect our paper, which cites this dataset on IDR, to go live ~Sep 15. We're eager to do what's needed to get it up on IDR by then; do let us know how we can help
Hi @shntnu, we have re-sync'd the fixes to a couple of sample plates and I tried to re-import to IDR. I thought it had failed but apparently I was using a test-server that hadn't been updated with the ZarrReader support. Apologies for the delay. I'm repeating the import now...
@DavidStirling this is what we were talking about at SBI2 - FYI
For our notes, the IDR team will proceed with ingesting just Batch1 for now such that we can have this up on IDR soon (vs. waiting for Batch2).
This will allow us to meet Cell Systems' requirement:
At this stage, please ensure that if you have data or code in a repository, its embargo/release date is set to coincide with the publication of your paper.
(We've asked Cell Systems for their ETA and will report back here)
From @will-moore:
While some of the technical issues we've mentioned are being worked on, I thought I'd start to look at the NGFF plates on the cellpainting s3, downloading the metadata only (everything apart from the chunks).
These metadata-only plates can be imported into a test-IDR server and allow us to work on the annotation of the data, without having to worry about the storage issues.
Josh ran a python script that went through all the plates and checked the first Well and the first Image in that Well.
This identified the following plates as missing at least their first Well:
SQ00015096__2016-06-08T17_05_23-Measurement1.ome.zarr/
SQ00015097__2016-06-08T15_26_27-Measurement1.ome.zarr/
SQ00015151__2016-06-09T05_07_35-Measurement1.ome.zarr/
SQ00015204__2016-04-24T20_02_01-Measurement2.ome.zarr/
SQ00015205__2016-04-24T02_21_50-Measurement1.ome.zarr/
SQ00015212__2016-04-23T19_01_00-Measurement1.ome.zarr/
SQ00015229__2016-05-13T08_10_01-Measurement1.ome.zarr/
SQ00015232__2016-05-12T07_24_32-Measurement1.ome.zarr/
This can also be seen in ome-ngff-validator by following the links in https://idr.github.io/idr0125-way-cellpainting/ E.g. https://ome.github.io/ome-ngff-validator/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00015096__2016-06-08T17_05_23-Measurement1.ome.zarr
Jean-Marie also happened to find other plates with some Wells missing. E.g. https://ome.github.io/ome-ngff-validator/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr/
Once I've downloaded the metadata for all the plates, I'll be able to get a more complete listing of missing Wells, but this is just a heads-up that some plates will need re-generation or at least re-upload.
From https://github.com/broadinstitute/lincs-cell-painting/issues/54#issuecomment-1225860686
Thanks @ErinWeisbart for the logs, we'll take a closer look tomorrow, but off-hand the Structure needs cleaning exception is not particularly reassuring as it usually points at errors at the file-system level, corruption in the worst scenario.
@sbesson did you have any further thoughts on this? Specifically, should anything be re-run here, given these concerns?
Erin's fix in #54 (comment) is (IIUC) guaranteed to fix the metadata issue, but it's possible that the underlying data itself is corrupted (for the reasons you mentioned), and I don't know how to check for that.
What you're observing might not be surprising in light of ^^^
heads-up that some plates will need re-generation or at least re-upload.
That sounds right, and it is most likely the plates that had the Structure needs cleaning error.
Once I've downloaded the metadata for all the plates, I'll be able to get a more complete listing of missing Wells
We will wait for this to be certain what needs to be rerun
I used a script at https://github.com/IDR/idr0125-way-cellpainting/blob/main/scripts/s3_sync.py to generate CLI aws commands for downloading metadata-only plates (all files except the chunks) and then did a simple count of the number of .zarray files in each. I expected to see 3456 (9 * 384) in each plate (a minimal counting sketch is included after the list below). The following plates had a lower count:
SQ00015096__2016-06-08T17_05_23-Measurement1.ome.zarr 239
SQ00015097__2016-06-08T15_26_27-Measurement1.ome.zarr 365
SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr 2620
SQ00015151__2016-06-09T05_07_35-Measurement1.ome.zarr 489
SQ00015160__2016-04-15T03_50_42-Measurement1.ome.zarr 456
SQ00015204__2016-04-24T20_02_01-Measurement2.ome.zarr 864
SQ00015205__2016-04-24T02_21_50-Measurement1.ome.zarr 817
SQ00015212__2016-04-23T19_01_00-Measurement1.ome.zarr 2160
SQ00015229__2016-05-13T08_10_01-Measurement1.ome.zarr 1103
SQ00015232__2016-05-12T07_24_32-Measurement1.ome.zarr 55
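(A minimal counting sketch, as mentioned above, assuming a metadata-only plate has been downloaded locally; the plate path is a placeholder:)
```python
# Sketch: count .zarray files under a metadata-only plate download and compare
# against the expected 9 fields x 384 wells = 3456.
from pathlib import Path

plate = Path("SQ00015096__2016-06-08T17_05_23-Measurement1.ome.zarr")  # placeholder
expected = 9 * 384

found = sum(1 for _ in plate.rglob(".zarray"))
print(f"{plate.name}: {found} .zarray files (expected {expected})")
if found < expected:
    print("plate appears incomplete")
```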
The 5 plates that threw Structure needs cleaning errors do not overlap with the 10 plates that failed Will's check detailed in the comment above.
I can certainly re-run this list of 10, which can help us diagnose whether the failure was stochastic or represents an underlying fault in the data. @will-moore I'm a bit unclear whether this list is comprehensive (meaning all other plates are passing your quality checks) or whether there will be more checks and therefore potentially more faulty .ome.zarrs to re-create. Can you let me know explicitly when we have what you would consider a final list of failures? Thanks!
All the other plates contained the expected 3456 .zarray files, so I expect they should be complete (although that's the only quality check I've done so far).
I'll start to import plates into a test server soon and that should provide better validation, but it will take some time.
OK, so the import into an IDR test server of all the 136 metadata-only plates (no chunks) failed for 48 plates:
SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr
SQ00014816__2016-05-23T14_07_55-Measurement1.ome.zarr
SQ00014817__2016-05-23T12_08_18-Measurement1.ome.zarr
SQ00014820__2016-05-25T21_16_25-Measurement1.ome.zarr
SQ00015041__2016-05-25T19_39_25-Measurement1.ome.zarr
SQ00015042__2016-05-25T18_02_08-Measurement1.ome.zarr
SQ00015050__2016-05-24T06_38_27-Measurement1.ome.zarr
SQ00015056__2016-05-20T16_27_19-Measurement1.ome.zarr
SQ00015096__2016-06-08T17_05_23-Measurement1.ome.zarr
SQ00015097__2016-06-08T15_26_27-Measurement1.ome.zarr
SQ00015099__2016-06-08T13_48_06-Measurement1.ome.zarr
SQ00015100__2016-05-16T22_01_19-Measurement1.ome.zarr
SQ00015102__2016-05-16T18_45_00-Measurement1.ome.zarr
SQ00015116__2016-04-13T16_31_49-Measurement1.ome.zarr
SQ00015117__2016-04-13T18_12_45-Measurement1.ome.zarr
SQ00015118__2016-04-13T19_52_28-Measurement1.ome.zarr
SQ00015119__2016-04-13T21_33_42-Measurement1.ome.zarr
SQ00015126__2016-03-25T14_15_31-Measurement1.ome.zarr
SQ00015128__2016-04-14T21_08_47-Measurement1.ome.zarr
SQ00015129__2016-04-15T05_30_25-Measurement1.ome.zarr
SQ00015130__2016-04-01T13_10_04-Measurement1.ome.zarr
SQ00015136__2016-06-11T21_14_31-Measurement1.ome.zarr
SQ00015139__2016-06-11T16_21_04-Measurement1.ome.zarr
SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr
SQ00015141__2016-06-09T08_43_58-Measurement1.ome.zarr
SQ00015143__2016-05-19T21_25_23-Measurement1.ome.zarr
SQ00015144__2016-05-19T19_47_38-Measurement1.ome.zarr
SQ00015145__2016-05-19T18_10_43-Measurement1.ome.zarr
SQ00015151__2016-06-09T05_07_35-Measurement1.ome.zarr
SQ00015153__2016-06-09T01_50_39-Measurement1.ome.zarr
SQ00015154__2016-04-14T16_07_53-Measurement1.ome.zarr
SQ00015155__2016-04-14T17_48_56-Measurement1.ome.zarr
SQ00015158__2016-04-15T00_30_51-Measurement1.ome.zarr
SQ00015159__2016-04-15T02_10_36-Measurement1.ome.zarr
SQ00015160__2016-04-15T03_50_42-Measurement1.ome.zarr
SQ00015162__2016-04-15T08_49_32-Measurement1.ome.zarr
SQ00015163__2016-04-20T21_00_28-Measurement1.ome.zarr
SQ00015196__2016-04-24T23_20_24-Measurement2.ome.zarr
SQ00015198__2016-04-24T15_10_23-Measurement1.ome.zarr
SQ00015200__2016-04-01T08_46_01-Measurement1.ome.zarr
SQ00015204__2016-04-24T20_02_01-Measurement2.ome.zarr
SQ00015205__2016-04-24T02_21_50-Measurement1.ome.zarr
SQ00015212__2016-04-23T19_01_00-Measurement1.ome.zarr
SQ00015218__2016-05-04T08_36_42-Measurement1.ome.zarr
SQ00015222__2016-05-06T05_09_24-Measurement1.ome.zarr
SQ00015223__2016-05-06T03_29_13-Measurement1.ome.zarr
SQ00015229__2016-05-13T08_10_01-Measurement1.ome.zarr
SQ00015232__2016-05-12T07_24_32-Measurement1.ome.zarr
The import error doesn't say which file from the plate was responsible for the failure.
I've checked a few of these plates in the ome-ngff-validator, but that currently only checks the first Image in each Well.
So I have opened a PR to allow checking of ALL wells in each plate.
This isn't merged yet, but you can use a preview build to test.
E.g. for the first plate in the list... using &well=all gives https://deploy-preview-14--ome-ngff-validator.netlify.app/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr/&well=all
This shows 1 well has invalid Images. Clicking on the Well shows the Invalid images. Clicking them gives you the problematic files. E.g. https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr/D/20/1/.zattrs
This approach can be used to find the missing files for each plate, but it's a bit laborious to do for all 48 plates, and I don't know a better way.
It also occurs to me that since we're importing without chunks (and then adding the chunks via symlinking them in later) we don't have any way to validate missing chunks. That would require a fair bit more checking: 45 chunks per image (9 per XY plane x 5 channels).
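(One possible way to spot-check chunks, sketched here under the assumption that anonymous S3 listing works against the bucket; the image prefix is a placeholder, and the expected chunk count of 45 comes from the comment above:)
```python
# Sketch: count the objects under one image prefix on S3; each image should have
# 45 chunks (9 per XY plane x 5 channels) plus its .zgroup/.zattrs/.zarray files.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
bucket = "cellpainting-gallery"
prefix = ("cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/"
          "SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr/A/1/0/")  # placeholder image

count = 0
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
    count += len(page.get("Contents", []))

print(f"{prefix}: {count} objects")
```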
I'm ready to re-create the 48 plates that @will-moore identified above. I'm planning on uploading them to a new folder instead of overwriting so that we can directly compare old and new files (and then we can replace the old files if the new ones pass QC). I'd like to use the official bioformats2raw 0.5.0 release. @joshmoore (or someone else?) are you able to push it to dockerhub? The latest docker there is 0.5.0rc1.
(I have updated the DOZC done condition, increased the job time limit to prevent partial re-processing of plates, and will use the official 0.5.0 release. Hopefully these combined will produce error-free .ome.zarr's.)
The conda package has been deployed and the docker image updated:
$ docker run --rm openmicroscopy/bioformats2raw:0.5.0 --version
Version = 0.5.0
Bio-Formats version = 6.10.1
NGFF specification version = 0.4
0.5.0: digest: sha256:1ed9b56f133bdb89142d01f5c1e1d430d41aab63b29f6274b6b0c207c674ec77 size: 2819
Thanks @joshmoore !
I've regenerated .ome.zarr's for the 48 plates that @will-moore flagged using the 0.5.0 docker. They are in a separate folder at s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr_050/
I used the preview of the all-wells check in the ome-ngff-validator on the remake of a plate that was failing before and it passes now! (I also checked SQ00015232__2016-05-12T07_24_32-Measurement1.ome.zarr, the last plate in the list, and it also passes.)
https://deploy-preview-14--ome-ngff-validator.netlify.app/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr_050/SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr/&well=all
Thanks @ErinWeisbart - I've downloaded the metadata for those plates (everything but the chunks) and the import into an IDR test server is running... I'll let you know how it goes...
Hi @ErinWeisbart - The import of all those 48 plates was successful! So we just need to get all the plates in a public bucket (88 valid original plates and the 48 '050' plates) - I'll refer back to @sbesson who is discussing this elsewhere...
@sbesson said:
To move forward with the next steps of the validation, we will need to access the binary data associated with these plates. How do you want to proceed there e.g. will you copy the remaining plates to the public cellpainting-gallery-backend AWS S3 bucket?
@ErinWeisbart and I will ponder and loop back here
An issue we will need to overcome is that this (private) https://github.com/jump-cellpainting/cellpainting-gallery-config/pull/42#issuecomment-1313622111 is no longer working. More in a day or so
@sbesson @will-moore
We have changed permissions so that s3://cellpainting-gallery/ behaves the same as s3://cellpainting-gallery-backend/! 🎉
i.e., this works:
aws s3 ls --no-sign-request s3://cellpainting-gallery/|head -2
PRE cpg0000-jump-pilot/
PRE cpg0001-cellpainting-protocol/
@ErinWeisbart – over to you to decide what happens next
aws s3 ls s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr_050/|wc -l
# 48
aws s3 ls s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr/|wc -l
# 136
Er – sorry – still some issues with this, so I am reverting the permissions. We will continue debugging and report back here @sbesson
I was missing a flag :D All set!
We have changed permissions so that s3://cellpainting-gallery/ behaves the same as s3://cellpainting-gallery-backend/! 🎉
Thanks for the heads up @shntnu, and very exciting. We have been able to mount the cellpainting-gallery bucket without issue with the new permissions:
[sbesson@pilot-idr0125-omeroreadwrite ~]$ ls /cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr_050/
SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr
SQ00014816__2016-05-23T14_07_55-Measurement1.ome.zarr
SQ00014817__2016-05-23T12_08_18-Measurement1.ome.zarr
SQ00014820__2016-05-25T21_16_25-Measurement1.ome.zarr
SQ00015041__2016-05-25T19_39_25-Measurement1.ome.zarr
SQ00015042__2016-05-25T18_02_08-Measurement1.ome.zarr
SQ00015050__2016-05-24T06_38_27-Measurement1.ome.zarr
SQ00015056__2016-05-20T16_27_19-Measurement1.ome.zarr
SQ00015096__2016-06-08T17_05_23-Measurement1.ome.zarr
SQ00015097__2016-06-08T15_26_27-Measurement1.ome.zarr
SQ00015099__2016-06-08T13_48_06-Measurement1.ome.zarr
SQ00015100__2016-05-16T22_01_19-Measurement1.ome.zarr
SQ00015102__2016-05-16T18_45_00-Measurement1.ome.zarr
SQ00015116__2016-04-13T16_31_49-Measurement1.ome.zarr
SQ00015117__2016-04-13T18_12_45-Measurement1.ome.zarr
SQ00015118__2016-04-13T19_52_28-Measurement1.ome.zarr
SQ00015119__2016-04-13T21_33_42-Measurement1.ome.zarr
SQ00015126__2016-03-25T14_15_31-Measurement1.ome.zarr
SQ00015128__2016-04-14T21_08_47-Measurement1.ome.zarr
SQ00015129__2016-04-15T05_30_25-Measurement1.ome.zarr
SQ00015130__2016-04-01T13_10_04-Measurement1.ome.zarr
SQ00015136__2016-06-11T21_14_31-Measurement1.ome.zarr
SQ00015139__2016-06-11T16_21_04-Measurement1.ome.zarr
SQ00015140__2016-06-11T14_43_11-Measurement1.ome.zarr
SQ00015141__2016-06-09T08_43_58-Measurement1.ome.zarr
SQ00015143__2016-05-19T21_25_23-Measurement1.ome.zarr
SQ00015144__2016-05-19T19_47_38-Measurement1.ome.zarr
SQ00015145__2016-05-19T18_10_43-Measurement1.ome.zarr
SQ00015151__2016-06-09T05_07_35-Measurement1.ome.zarr
SQ00015153__2016-06-09T01_50_39-Measurement1.ome.zarr
SQ00015154__2016-04-14T16_07_53-Measurement1.ome.zarr
SQ00015155__2016-04-14T17_48_56-Measurement1.ome.zarr
SQ00015158__2016-04-15T00_30_51-Measurement1.ome.zarr
SQ00015159__2016-04-15T02_10_36-Measurement1.ome.zarr
SQ00015160__2016-04-15T03_50_42-Measurement1.ome.zarr
SQ00015162__2016-04-15T08_49_32-Measurement1.ome.zarr
SQ00015163__2016-04-20T21_00_28-Measurement1.ome.zarr
SQ00015196__2016-04-24T23_20_24-Measurement2.ome.zarr
SQ00015198__2016-04-24T15_10_23-Measurement1.ome.zarr
SQ00015200__2016-04-01T08_46_01-Measurement1.ome.zarr
SQ00015204__2016-04-24T20_02_01-Measurement2.ome.zarr
SQ00015205__2016-04-24T02_21_50-Measurement1.ome.zarr
SQ00015212__2016-04-23T19_01_00-Measurement1.ome.zarr
SQ00015218__2016-05-04T08_36_42-Measurement1.ome.zarr
SQ00015222__2016-05-06T05_09_24-Measurement1.ome.zarr
SQ00015223__2016-05-06T03_29_13-Measurement1.ome.zarr
SQ00015229__2016-05-13T08_10_01-Measurement1.ome.zarr
SQ00015232__2016-05-12T07_24_32-Measurement1.ome.zarr
This means we can start accessing the data remotely and move forward with the validation of the raw imaging data.
@ErinWeisbart – over to you to decide what happens next
i.e. are you considering consolidating the converted plates under a single prefix on the bucket? And/or re-converting the first plates to use the same bioformats2raw version? Either way, it should not be a problem on our side; just let us know if the layout is modified so that we can adjust.
@sbesson Eventually, I will consolidate into a single prefix (in the original location). Am I correct in understanding that there is a possibility that the original conversions may still fail QC once you move from metadata to chunks? If this is true, it would be easiest for me to wait to consolidate until we are sure both the original and new conversions fully pass QC, if this is okay with you.
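(For illustration only, a sketch of what that consolidation might look like once both sets pass QC, assuming the AWS CLI with suitable write credentials; the plate list is a placeholder:)
```python
# Sketch: copy each re-converted plate from images_zarr_050 over the original
# images_zarr prefix. --delete removes objects not present in the new conversion.
import subprocess

root = "s3://cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1"
plates = ["SQ00014814__2016-05-23T17_24_56-Measurement1.ome.zarr"]  # placeholder list

for plate in plates:
    src = f"{root}/images_zarr_050/{plate}"
    dst = f"{root}/images_zarr/{plate}"
    subprocess.run(["aws", "s3", "sync", "--delete", src, dst], check=True)
```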
We will upload image files to the Image Data Resource and add URL and metadata information to the Broad Bioimage Benchmark Collection.
We will use this issue to outline the required steps.
From IDR:
All files should be in tab-delimited text format. Templates are provided but can be modified to suit your experiment. Add or remove columns from the templates as necessary.
@gwaygenomics Did you have a processed data file for cell health?