dominikl opened 1 year ago
Looks like I had made some progress on this conversion in the same way as https://github.com/IDR/idr-metadata/issues/649 using the following script
import os

sourcedir = "/uod/idr/filesets/idr0035-caie-drugresponse/images"
targetdir = '/data/idr0035/sources/'
zarrdir = "/data/idr0035/zarr/"
bf2raw_exec = "/opt/bioformats2raw/bioformats2raw-0.6.1/bin/bioformats2raw"
aws_exec = "~/venv/bin/aws"

with open('idr0035.HTD', 'r') as f:
    htd_template = f.read()

plates = os.listdir(sourcedir)
commands = []
for plate in plates:
    source_plate = os.path.join(sourcedir, plate)
    target_plate = os.path.join(targetdir, plate)
    os.makedirs(target_plate)
    tiffs = os.listdir(source_plate)
    for tiff in tiffs:
        os.symlink(os.path.join(source_plate, tiff), os.path.join(target_plate, tiff))
    htd_file = os.path.join(target_plate, "_".join(tiffs[0].split('_')[:2]) + ".HTD")
    with open(htd_file, 'w') as f:
        f.write(htd_template.format(plate=plate))
    zarr_plate = os.path.join(zarrdir, plate + ".zarr")
    s3_zarr_plate = f"s3://idr0035/zarr/{plate}.zarr"
    bf2raw_cmd = " ".join([bf2raw_exec, htd_file, zarr_plate, "-p"])
    aws_cmd = " ".join(
        [aws_exec, "--profile ebi", "--endpoint-url https://uk1s3.embassy.ebi.ac.uk", "s3 cp --recursive",
         zarr_plate, s3_zarr_plate])
    rm_cmd = " ".join(["rm", "-r", zarr_plate])
    commands.append(" && ".join([bf2raw_cmd, aws_cmd, rm_cmd]))

with open("idr0035_commands", 'w') as f:
    for command in commands:
        f.write(command)
        f.write('\n')
and the following MetaXpress HTD template file:
"HTSInfoFile", Version 1.0
"Description", "BBBC021 {plate}"
"PlateType", 1
"TimePoints", 1
"ZSeries", FALSE
"ZSteps", 1
"ZProjection", FALSE
"XWells", 12
"YWells", 8
"WellsSelection1", FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
"WellsSelection2", FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
"WellsSelection3", FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
"WellsSelection4", FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
"WellsSelection5", FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
"WellsSelection6", FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
"WellsSelection7", FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE
"WellsSelection8", FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
"Sites", TRUE
"XSites", 2
"YSites", 2
"SiteSelection1", TRUE, TRUE
"SiteSelection2", TRUE, TRUE
"Waves", TRUE
"NWavelengths", 3
"WaveName1", "DAPI"
"WaveName2", "Tubulin"
"WaveName3", "Actin"
"WaveCollect1", 1
"WaveCollect2", 1
"WaveCollect3", 1
"WaveCollect4", 1
"WaveCollect5", 1
"UniquePlateIdentifier", "abc123"
"EndFile"
A first plate has been uploaded to the idr0035 bucket and is ready for validation:
(conversion) [sbesson@pilot-zarr2-dev idr0035]$ ~/venv/bin/aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk --profile ebi s3 ls s3://idr0035/zarr/
PRE Week1_22123.zarr/
Looks good in vizarr: https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0035/zarr/Week1_22123.zarr and in validator: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0035/zarr/Week1_22123.zarr
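As an extra sanity check, the plate metadata can also be read straight from the bucket; a rough sketch (the 60-well expectation comes from the HTD well selection above):

import json
import urllib.request

# Fetch the top-level plate attributes of the uploaded OME-NGFF plate (a sketch)
url = "https://uk1s3.embassy.ebi.ac.uk/idr0035/zarr/Week1_22123.zarr/.zattrs"
with urllib.request.urlopen(url) as resp:
    attrs = json.load(resp)

plate = attrs["plate"]
print(len(plate["wells"]), "wells")   # 60 expected from the HTD well selection above
print(len(plate["rows"]), "rows x", len(plate["columns"]), "columns")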
Import test plate on idr0125-pilot...
$ sudo mkdir /idr0035 && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0035 /idr0035
$ ls /idr0035/zarr/
Week1_22123.zarr
# copy 1 metadata-only plate
$ mkdir Week1_22123.zarr && cd Week1_22123.zarr
$ aws s3 sync --no-sign-request --exclude '*' --include "*/.z*" --include "*.xml" --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3://idr0035/zarr/Week1_22123.zarr .
# oops - that missed the top-level .zattrs and .zgroup (the "*/.z*" include only matches files below a subdirectory); copied them from the mounted bucket instead...
$ sudo cp /idr0035/zarr/Week1_22123.zarr/.zattrs ./
$ sudo cp /idr0035/zarr/Week1_22123.zarr/.zgroup ./
omero import --transfer=ln_s --depth=100 --name=Week1_22123.zarr --skip=all Week1_22123.zarr --file /tmp/idr0035_Week1_22123.zarr.log --errs /tmp/idr0035_Week1_22123.zarr.err
2023-06-16 12:09:45,326 825879 [l.Client-0] INFO ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /ngff/idr0035/Week1_22123.zarr/OME/METADATA.ome.xml
Other imported objects:
Fileset:5287124
==> Summary
1571 files uploaded, 1 fileset, 1 plate created, 240 images imported, 0 errors in 0:13:33.065
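(For reference, 240 images matches the HTD above: 60 selected wells x 4 fields from the 2x2 site layout.)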
Update symlinks to s3 plate with chunks...
$ python idr-utils/scripts/managed_repo_symlinks.py Plate:10520 /idr0035/zarr/ --report
Fileset: 5287124 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-4/2023-06/16/11-56-13.628/
Render Image 14834501
fileset_dirs {}
fs_contents ['Week1_22123.zarr']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-4/2023-06/16/11-56-13.628/Week1_22123.zarr to /idr0035/zarr/Week1_22123.zarr
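Conceptually, the symlink update replaces the plate directory in the managed repository with a link to the goofys-mounted bucket copy that has the chunks. A minimal sketch of that idea (not the actual idr-utils script; paths taken from the report above):

import os

# Paths from the managed_repo_symlinks.py report above
fileset_dir = "/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-4/2023-06/16/11-56-13.628"
bucket_plate = "/idr0035/zarr/Week1_22123.zarr"
link = os.path.join(fileset_dir, "Week1_22123.zarr")

# Assumes the metadata-only copy imported earlier is moved out of the way first
if os.path.lexists(link):
    os.rename(link, link + ".bak")
os.symlink(bucket_plate, link)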
Looks good.
Any reason why this got moved to "Upload to BioStudies"? As per https://github.com/IDR/idr-metadata/issues/639#issuecomment-1587110746, I only remember converting a single plate and I thought the next step was to convert the rest of the study?
Note the conversion is primarily waiting on the decision of how to zip the NGFF datasets (with or without the top-level directory) in preparation for upload to the BioImage Archive.
@sbesson Sorry, yep - I don't know why I did that. Does the conversion need to wait on the zip decision? The zip step can happen later?
For the full study conversion, I used the following script to generate the symlinks, the HTD files and the commands to execute:
import os

sourcedir = "/uod/idr/filesets/idr0035-caie-drugresponse/images"
targetdir = '/data/idr0035/sources/'
zarrdir = "/data/idr0035/zarr/"
bf2raw_exec = "/opt/bioformats2raw/bioformats2raw-0.6.1/bin/bioformats2raw"

with open('idr0035.HTD', 'r') as f:
    htd_template = f.read()

plates = os.listdir(sourcedir)
plates.sort()
commands = []
for plate in plates:
    source_plate = os.path.join(sourcedir, plate)
    target_plate = os.path.join(targetdir, plate)
    os.makedirs(target_plate)
    tiffs = os.listdir(source_plate)
    tiffs.sort()
    firsttiff = tiffs[0]
    if firsttiff.startswith("B02"):
        platename = os.path.basename(source_plate)
        prefix = platename + "_"
    else:
        platename = firsttiff[0:firsttiff.index("_B02")]
        prefix = ""
    for tiff in tiffs:
        os.symlink(os.path.join(source_plate, tiff), os.path.join(target_plate, prefix + tiff))
    htd_file = os.path.join(target_plate, platename + ".HTD")
    with open(htd_file, 'w') as f:
        f.write(htd_template.format(plate=plate))
    zarr_plate = os.path.join(zarrdir, plate + ".zarr")
    s3_zarr_plate = f"s3://idr0035/zarr/{plate}.zarr"
    bf2raw_cmd = " ".join([bf2raw_exec, htd_file, zarr_plate, "-p"])
    commands.append(bf2raw_cmd)

with open("idr0035_commands", 'w') as f:
    for command in commands:
        f.write(command)
        f.write('\n')
The prefix handling is required for the Week4_ plates, where it looks like something happened during generation: the reader expects the MetaXpress TIFF files to be prefixed with <plate>_ to match <plate>.HTD, and those files lack that prefix (see the sketch below).
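To illustrate the two cases (file names here are hypothetical examples):

import os

def plate_and_prefix(source_plate, firsttiff):
    # Week4_* case: the TIFFs lack the plate prefix, so derive it from the directory name
    if firsttiff.startswith("B02"):
        platename = os.path.basename(source_plate)
        return platename, platename + "_"
    # normal case: the plate name can be read off the TIFF name itself
    return firsttiff[0:firsttiff.index("_B02")], ""

print(plate_and_prefix("/uod/idr/filesets/idr0035-caie-drugresponse/images/Week1_22123", "Week1_22123_B02_s1_w1.tif"))
# ('Week1_22123', '')
print(plate_and_prefix("/uod/idr/filesets/idr0035-caie-drugresponse/images/Week4_27481", "B02_s1_w1.tif"))
# ('Week4_27481', 'Week4_27481_')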
All 55 plates have been successfully converted into OME-NGFF (105G in total). @will-moore is the next step to zip these Zarr folders in preparation for the upload, or do we want them uploaded as-is to the idr0035 temporary bucket for testing?
Thanks @sbesson - let's upload 1 or 2 representative plates to the idr0035 temporary bucket for validation, import etc.
But we can also start zipping them all, ready for BioStudies upload, zipping the outer ome.zarr dir for each fileset.
(base) [sbesson@pilot-zarr1-dev zarr]$ ls
Week10_40111.zarr Week1_22161.zarr Week2_24141.zarr Week3_25421.zarr Week3_25721.zarr Week4_27821.zarr Week5_29301.zarr Week6_31681.zarr Week7_34381.zarr Week8_38221.zarr Week9_39221.zarr
Week10_40115.zarr Week1_22361.zarr Week2_24161.zarr Week3_25441.zarr Week4_27481.zarr Week4_27861.zarr Week5_29321.zarr Week6_32061.zarr Week7_34641.zarr Week8_38241.zarr Week9_39222.zarr
Week10_40119.zarr Week1_22381.zarr Week2_24361.zarr Week3_25461.zarr Week4_27521.zarr Week5_28901.zarr Week5_29341.zarr Week6_32121.zarr Week7_34661.zarr Week8_38341.zarr Week9_39282.zarr
Week1_22123.zarr Week1_22401.zarr Week2_24381.zarr Week3_25681.zarr Week4_27542.zarr Week5_28921.zarr Week6_31641.zarr Week6_32161.zarr Week7_34681.zarr Week8_38342.zarr Week9_39283.zarr
Week1_22141.zarr Week2_24121.zarr Week2_24401.zarr Week3_25701.zarr Week4_27801.zarr Week5_28961.zarr Week6_31661.zarr Week7_34341.zarr Week8_38203.zarr Week9_39206.zarr Week9_39301.zarr
Note the generated datasets only have .zarr as the extension. Is this a concern?
It's possible that some steps of the workflow might need tweaking to accommodate that naming but I don't expect that to be a problem.
Since it was only 100G in total, all 55 plates have been uploaded to the S3 bucket for validation. I also zipped them in place using:
$ for i in $(ls .); do zip -rm $i.zip $i; done
Ready for the next phase
Data look good in vizarr and validator.
$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0035/idr0035 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136e8d-xxxxxxxx
...
Week9_39301.zarr.zip 100% 1166MB 188Mb/s 36:18
Completed: 72295277K bytes transferred in 2179 seconds
(271770K bits/sec), in 55 files, 1 directory.
Delete zips dir...
$ sudo rm -rf idr0035
We currently have 46 out of 55 images viewable at https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD847.html
But going to try and replace filesets for ALL, as described in https://github.com/joshmoore/omero-mkngff/issues/2 (without chunks - testing https://github.com/joshmoore/omero-mkngff/pull/5).
NB: needed to edit .zarr -> .ome.zarr for all filenames from the page above when creating the csv...
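A minimal sketch of that rename when building the csv rows (the input below is a hypothetical example of a name as listed on the page):

# The page lists plates with a plain ".zarr" suffix, while the bia-integrator-data
# bucket uses ".ome.zarr", so adjust the suffix when writing the csv.
name = "idr0035/Week9_39282.zarr"
if name.endswith(".zarr") and not name.endswith(".ome.zarr"):
    name = name[:-len(".zarr")] + ".ome.zarr"
print(name)  # idr0035/Week9_39282.ome.zarr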
With Fileset IDs added...
idr0035/Week9_39282.ome.zarr,S-BIAD847/072015f4-a5f8-4c13-be31-e31c47a868e9,23737
idr0035/Week9_39221.ome.zarr,S-BIAD847/0762bf96-4f01-454d-9b13-5c8438ea384f,23721
idr0035/Week1_22361.ome.zarr,S-BIAD847/096c540e-b457-41a3-af71-115de4ae1d81,23723
idr0035/Week7_34681.ome.zarr,S-BIAD847/0cc0807d-fb05-47bd-8b20-9fbcfa1b3f10,23745
idr0035/Week10_40115.ome.zarr,S-BIAD847/0efb3051-1297-4a6d-8934-69b739e9509a,23702
idr0035/Week4_27481.ome.zarr,S-BIAD847/12e8a772-7741-4702-bedd-14147c63aa27,23740
idr0035/Week3_25701.ome.zarr,S-BIAD847/23c4b6ba-2b9e-4c25-acb5-6436876d7232,23706
idr0035/Week7_34641.ome.zarr,S-BIAD847/24bffbcd-7127-4275-85a8-d66952df61bd,23750
idr0035/Week3_25461.ome.zarr,S-BIAD847/29447991-c26b-4375-81e1-d5ef44e10c9d,23711
idr0035/Week8_38341.ome.zarr,S-BIAD847/2b30d331-d4a7-488f-9956-8f7959568270,23707
idr0035/Week1_22161.ome.zarr,S-BIAD847/39e23438-5b51-4c6b-9a73-2dc29d92b7b8,23715
idr0035/Week6_31641.ome.zarr,S-BIAD847/3f476d50-34b8-4b61-929e-65a46dd12745,23710
idr0035/Week6_32161.ome.zarr,S-BIAD847/40dc14ac-bef0-4f42-95ef-70e62bf73a8d,23730
idr0035/Week8_38241.ome.zarr,S-BIAD847/46764312-5903-4c55-bebb-d819c440c5c1,23736
idr0035/Week4_27542.ome.zarr,S-BIAD847/4a8e2de4-3a89-4b16-9daa-02ed15c19438,23724
idr0035/Week2_24401.ome.zarr,S-BIAD847/526722a3-0526-40b3-bf8b-bbb3640453f6,23719
idr0035/Week9_39222.ome.zarr,S-BIAD847/5534853c-6e05-499d-8fdb-4495dbcca5a2,23741
idr0035/Week1_22381.ome.zarr,S-BIAD847/58b77c5a-b00b-44d2-9ce9-abc9c57577f0,23739
idr0035/Week6_32061.ome.zarr,S-BIAD847/6495b09c-611b-4a59-9811-0767eb18787d,23722
idr0035/Week4_27821.ome.zarr,S-BIAD847/7246d9aa-e7d5-4986-9916-6fc4998849df,23733
idr0035/Week10_40119.ome.zarr,S-BIAD847/7bd8dac4-2a28-40fd-818a-b6feb7b4aa5b,23703
idr0035/Week5_29301.ome.zarr,S-BIAD847/7dedf881-eb4a-4420-9006-533d285980cb,23746
idr0035/Week3_25441.ome.zarr,S-BIAD847/83a0eee7-b85a-4ef4-a5e5-4745010edec0,23708
idr0035/Week7_34381.ome.zarr,S-BIAD847/84bf3c0b-e013-4e47-a64e-54d6032a7978,23742
idr0035/Week9_39206.ome.zarr,S-BIAD847/85795edb-f1cd-46c7-873c-33dbe9fa86a0,23732
idr0035/Week4_27521.ome.zarr,S-BIAD847/8859e4f3-46be-4b54-9b0f-1cee14ad3db4,23731
idr0035/Week10_40111.ome.zarr,S-BIAD847/8c168760-7bc0-4692-90b1-c774711e7dd8,23751
idr0035/Week8_38221.ome.zarr,S-BIAD847/8cf76a11-6962-4977-9a37-312535353efd,23701
idr0035/Week3_25681.ome.zarr,S-BIAD847/8ee3053e-4eed-4e2a-8000-c45f677d9bb8,23718
idr0035/Week2_24161.ome.zarr,S-BIAD847/9849e317-4d15-486d-91c0-52c0df4a8946,23744
idr0035/Week6_32121.ome.zarr,S-BIAD847/a3f550b2-bc8e-4ab5-a00d-b8f5ba0b0ee2,23717
idr0035/Week9_39283.ome.zarr,S-BIAD847/aad0350b-35aa-4eac-987b-f024dd7e3c23,23754
idr0035/Week4_27801.ome.zarr,S-BIAD847/acb05e43-1461-47ba-b407-e6694f74dd40,23749
idr0035/Week2_24381.ome.zarr,S-BIAD847/ae7492e4-45d1-43ad-b162-f4118ba2c301,23714
idr0035/Week4_27861.ome.zarr,S-BIAD847/c2e106b1-91d6-451e-8c1e-17b2ef63f987,23709
idr0035/Week2_24141.ome.zarr,S-BIAD847/c333fb92-764b-4189-84f3-0001cd7795d7,23729
idr0035/Week2_24121.ome.zarr,S-BIAD847/cce8530a-f386-4173-9ed1-0a9dbf63da1a,23712
idr0035/Week3_25721.ome.zarr,S-BIAD847/ceca342a-f340-41c1-bc56-7e745d116712,23728
idr0035/Week1_22123.ome.zarr,S-BIAD847/d21e3a6e-e82d-4d39-8a60-1a19e0049230,23704
idr0035/Week8_38342.ome.zarr,S-BIAD847/d885dc91-3820-43bd-a1f5-5d4644e8dba3,23747
idr0035/Week6_31661.ome.zarr,S-BIAD847/dd8b022a-d964-420b-a22b-e58a99c52e09,23720
idr0035/Week1_22141.ome.zarr,S-BIAD847/df9f44f7-5bc6-4c6e-977c-c1a8e57e9428,23752
idr0035/Week2_24361.ome.zarr,S-BIAD847/e7fa17f6-326d-4810-9149-aed788e74345,23743
idr0035/Week5_29341.ome.zarr,S-BIAD847/efaee486-12ad-4160-8405-2edc0478f364,23727
idr0035/Week5_28961.ome.zarr,S-BIAD847/fbbc73e3-1723-47c0-8a18-0ee6d3ad4b52,23753
idr0035/Week1_22401.ome.zarr,S-BIAD847/fe85ca73-e390-4fa8-9f6d-5c21d91c2bc1,23725
for r in $(cat idr0035.csv); do
biapath=$(echo $r | cut -d',' -f2)
uuid=$(echo $biapath | cut -d'/' -f2)
fsid=$(echo $r | cut -d',' -f3)
omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$fsid.sql"
done
This took about 1hr 15 mins.
$ for r in $(cat idr0035.csv); do
fsid=$(echo $r | cut -d',' -f3)
psql -U omero -d idr -h 192.168.10.102 -f "$fsid.sql"
done
Takes 1 or 2 secs for each sql.
Success!! After saving new rendering settings to regenerate thumbnails...
Test on idr-testing:omeroreadwrite, with idr0035.csv from https://github.com/IDR/idr-utils/pull/56/commits/7ce43a3976edd7840987f3c0b625b0839d0a8794
Started mkngff 16:32...
...completed 22:56 (6.5 hours).
Load image from first Plate: http://localhost:1080/webclient/?show=image-3426101 (memo file generation in progress...)
Memo file generation appears twice in logs for that fileset...
[wmoore@test120-omeroreadwrite ~]$ grep -A 2 "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log | grep -A 2 "14-20-58.331_mkngff"
2023-09-13 23:00:16,310 DEBUG [ loci.formats.Memoizer] (.Server-13) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-01/24/14-20-58.331_mkngff/8c168760-7bc0-4692-90b1-c774711e7dd8.zarr/OME/.METADATA.ome.xml.bfmemo (666953 bytes)
2023-09-13 23:00:16,310 DEBUG [ loci.formats.Memoizer] (.Server-13) start[1694642380659] time[3635650] tag[loci.formats.Memoizer.setId]
2023-09-13 23:00:16,310 INFO [ ome.io.nio.PixelsService] (.Server-13) Creating BfPixelBuffer: /data/OMERO/ManagedRepository/demo_2/2018-01/24/14-20-58.331_mkngff/8c168760-7bc0-4692-90b1-c774711e7dd8.zarr/OME/METADATA.ome.xml Series: 0
--
2023-09-13 23:02:50,379 DEBUG [ loci.formats.Memoizer] (.Server-19) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2018-01/24/14-20-58.331_mkngff/8c168760-7bc0-4692-90b1-c774711e7dd8.zarr/OME/.METADATA.ome.xml.bfmemo (667010 bytes)
2023-09-13 23:02:50,379 DEBUG [ loci.formats.Memoizer] (.Server-19) start[1694642459802] time[3710577] tag[loci.formats.Memoizer.setId]
2023-09-13 23:02:50,379 INFO [ ome.io.nio.PixelsService] (.Server-19) Creating BfPixelBuffer: /data/OMERO/ManagedRepository/demo_2/2018-01/24/14-20-58.331_mkngff/8c168760-7bc0-4692-90b1-c774711e7dd8.zarr/OME/METADATA.ome.xml Series: 0
3710577 ms is just over 1 hour (about 62 minutes).
Sample plate conversion failed with:
That looks unrelated to https://github.com/IDR/bioformats/issues/29, doesn't it @sbesson?