IDR / idr-metadata

Curated metadata for all studies published in the Image Data Resource
https://idr.openmicroscopy.org

idr0013-neumann-mitocheck S-BIAD865 #644

Open will-moore opened 1 year ago

will-moore commented 1 year ago

idr0013-neumann-mitocheck

pwalczysko commented 1 year ago

Reimport still in progress. It was cancelled once because of a long wait in FILESET_UPLOAD_PREP. The new import has been in progress since 8 March and is also in FILESET_UPLOAD_PREP (with --parallel-upload=10).

will-moore commented 1 year ago

As discussed today, it is probably worth trying to import without chunks, then adding the chunks back by symlinking from the ManagedRepository to the full plate.

This workflow has allowed me to import big plates from idr0125. In that case, I created a "metadata only" plate (no chunks) by downloading from s3 using a sync command that ignored chunks.

In a single-image case, I recently achieved the same thing by making a copy of the NGFF Image, then deleting the chunks by file name, e.g. all files named "0": https://github.com/IDR/idr-metadata/issues/652#issuecomment-1491814772 If you have files named "0", "1" and "2", you will have to delete each name in turn, although there is probably a way to do it in one command?
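On the "one command?" question: a minimal sketch that removes every digit-named file in a single find call. It assumes chunk files are named with digits only ("0", "1", "12", ...) while metadata files (.zattrs, .zgroup, .zarray, METADATA.ome.xml) are not; the helper names strip_chunks and count_chunks are hypothetical. Field and resolution-level directories are also named 0, 1, ..., which is why -type f matters.

```shell
# Hypothetical helpers: count / delete every chunk file in a COPY of an NGFF plate.
# Assumption: chunk files are named with digits only ("0", "1", "12", ...).
# -type f keeps the field and resolution-level directories, which are also
# named "0", "1", ...

count_chunks() {
  # -name '[0-9]*'     : name starts with a digit
  # ! -name '*[!0-9]*' : name contains no non-digit character
  find "$1" -type f -name '[0-9]*' ! -name '*[!0-9]*' | wc -l
}

strip_chunks() {
  local plate_copy="$1"
  find "$plate_copy" -type f -name '[0-9]*' ! -name '*[!0-9]*' -delete
}
```

For example, run count_chunks on the copy first to check the number, then strip_chunks on the same copy; never point it at the original plate.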

Then import the metadata-only Plate. For example, for idr0125 (384-well plate, 9 fields per Well) this took ~2 hours.

Then try to view images in the Plate: they should appear black.

Then you can delete the metadata-only plate in the ManagedRepository and replace it with a symlink to the full plate. For idr0125 I was able to do this by running https://github.com/IDR/idr0125-way-cellpainting/blob/main/scripts/symlinks.bash as the omero-server user:

sudo -u omero-server -s
symlinks.bash

But on pilot-idrtesting I needed to use a different user to do the delete and symlinking: https://github.com/IDR/idr-metadata/issues/652#issuecomment-1491814772
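The delete-and-symlink step can be sketched as a small shell function. This is not the linked symlinks.bash (its contents aren't reproduced here); the function name and the arguments are hypothetical, and it has to run as whichever user is allowed to delete inside the ManagedRepository.

```shell
# Hypothetical helper: swap a metadata-only plate in the ManagedRepository
# for a symlink pointing at the full plate (with chunks).
replace_with_symlink() {
  # Strip a trailing slash: "rm -rf link/" would delete the link TARGET's contents.
  local managed_plate="${1%/}"  # metadata-only plate imported into OMERO
  local full_plate="${2%/}"     # original plate containing all chunks
  # Refuse to delete anything unless the full plate really exists.
  if [ ! -d "$full_plate" ]; then
    echo "full plate not found: $full_plate" >&2
    return 1
  fi
  rm -rf -- "$managed_plate"
  ln -s -- "$full_plate" "$managed_plate"
}
```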

pwalczysko commented 1 year ago

Thanks @will-moore

ls -lah LT0008_31.ome.zarr/
total 36K
drwxrwxr-x. 19 dlindner dlindner 191 Feb 16 12:26 .
drwxrwxr-x.  3 dlindner dlindner  94 Feb 16 12:02 ..
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 A
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 B
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 C
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 D
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 E
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 F
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 G
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 H
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 I
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 J
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 K
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 L
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 M
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 N
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 O
drwxrwxr-x.  2 dlindner dlindner  60 Feb 16 12:02 OME
drwxrwxr-x. 26 dlindner dlindner 252 Feb 16 12:26 P
-rw-rw-r--.  1 dlindner dlindner 31K Feb 16 12:26 .zattrs
-rw-rw-r--.  1 dlindner dlindner  23 Feb 16 12:02 .zgroup

So would you recommend deleting all the A-P files?

will-moore commented 1 year ago

No, A-P are directories that contain important files. You only want to delete the chunks, which are files named 0, 1, etc.

You can list them with e.g.

find -type f -name '0'

count them:

find -type f -name '0' | wc -l
will-moore commented 1 year ago

And only delete the chunks from a copy of the Plate; don't delete the originals.

Delete chunks with, e.g.:

sudo find -type f -name '0' -delete
pwalczysko commented 1 year ago

After following the workflow suggested by @will-moore, I get a "no imports found" response. I deleted the chunks with:

sudo find -type f -name '0' -delete
sudo find -type f -name '1' -delete

Then I tried:

  1. pointing the importer at the .../OME/METADATA... file:
     omero import --parallel-upload=10 --transfer=ln_s --skip=all --depth 10 --name "idr0013-nochunks" /data/ngff/idr0013/LT0008_31.ome.zarr-copy/OME/METADATA.ome.xml --file /tmp/idr0013-nochun.log --errs /tmp/idr0013-nochun.err
  2. pointing the importer at the whole copied and trimmed folder:
     omero import --parallel-upload=10 --transfer=ln_s --skip=all --depth 10 --name "idr0013-nochunks" /data/ngff/idr0013/LT0008_31.ome.zarr-copy

Both attempts ended with "no imports found".

will-moore commented 1 year ago

@pwalczysko it might be that the plate name has to end with .zarr extension? Also, I presume that --depth 10 and --depth=10 are the same?

pwalczysko commented 1 year ago

it might be that the plate name has to end with .zarr extension?

Indeed, thank you @will-moore, this did the trick. The data are now imported as http://localhost:1080/webclient/?show=plate-253 (idr0013-nochunks). I have also replaced the plate in the ManagedRepo with a symlink to the original data (with chunks), as instructed, and the images in the plate http://localhost:1080/webclient/?show=plate-253 display correctly in iviewer; the timelapse plays fine too.
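Since the import only worked once the plate directory name ended in .zarr (the copy above was LT0008_31.ome.zarr-copy), one way to avoid the problem is to name the metadata-only copy so that the suffix survives. A minimal sketch; the function names and the -nochunks naming are illustrative assumptions, not what was used here.

```shell
# Hypothetical helper: derive a name for the metadata-only copy that keeps
# the .ome.zarr suffix, e.g. LT0008_31.ome.zarr -> LT0008_31-nochunks.ome.zarr
nochunks_name() {
  local plate="${1%/}"
  local base="${plate%.ome.zarr}"
  echo "${base}-nochunks.ome.zarr"
}

# Copy the plate under that name (preserving attributes) and print the new path.
make_nochunks_copy() {
  local dst
  dst=$(nochunks_name "$1")
  cp -a "${1%/}" "$dst" && echo "$dst"
}
```

The printed path can then be given to omero import with the same flags as above.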

will-moore commented 1 year ago

Looks great! I adjusted rendering settings and "Saved to all" so the thumbnails are clearer - they all regenerated fine 👍

will-moore commented 1 year ago

Trying to estimate how much space is needed for conversion. Raw data is 8-bit (1 byte per pixel), single Z & C, timelapse.

ScreenA: 1344 x 1024 x 93 x 384 x 510 plates = 25 TB
ScreenB: 25 plates (slightly sparse) ~ 1.2 TB
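A quick check of the ScreenA figure, assuming the dimensions are X x Y x timepoints x wells per plate, times 510 plates, at 1 byte per pixel:

```shell
# Raw size of ScreenA at 1 byte per pixel (8-bit), single Z and C.
bytes_per_plate=$((1344 * 1024 * 93 * 384))
total_bytes=$((bytes_per_plate * 510))
echo "per plate: $bytes_per_plate bytes"   # 49148854272 (~49.1 GB)
echo "ScreenA:   $total_bytes bytes"       # 25065915678720 (~25.1 TB)
```

So the 25 TB estimate is in decimal terabytes (about 22.8 TiB).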

dominikl commented 1 year ago

On pilot-zarr2-dev:

Converting one plate takes ~30 min, zipping ~50 min (without compression 7 min!). Converted plate size 36 GB, zipped 28 GB.

7zip (p7zip): 5 min (also 28 GB); without compression 4 min.

There are 538 plates in total.

dominikl commented 1 year ago

Created batch directories of 10 plates each under /data/ngff/idr0013. Trying to do 10 conversions and 10 zip/upload/delete steps at a time, due to the disk space limitation.
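Splitting the plate list into batches of 10 can be sketched with GNU split. Assumed here: one .screen name per line in a list file; the batch_NN naming mirrors the batch_XX names used below, but the exact numbering scheme and the make_batches name are hypothetical.

```shell
# Hypothetical helper: split a plate list into batch_NN.txt files of 10 lines
# each (GNU coreutils split) and create a matching batch_NN/ directory with
# the empty idr0013/ upload subdirectory.
make_batches() {
  local list="$1" f
  split -l 10 -d -a 2 --additional-suffix=.txt "$list" batch_
  for f in batch_*.txt; do
    [ -e "$f" ] || continue          # no batches created
    mkdir -p "${f%.txt}/idr0013"
  done
}
```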

dominikl commented 1 year ago

For conversion:

cd /data/ngff/idr0013/batch_XX
for i in `cat ../batch_XX.txt`; do ~/bioformats2raw/bin/bioformats2raw --memo-directory ../../memo /uod/idr/metadata/idr0013-neumann-mitocheck/screens/$i ${i%.*}.ome.zarr; done
# Note: The input file batch_XX.txt is one directory up in /data/ngff/idr0013 !

For zipping: Each batch directory contains a zip.sh which zips and deletes the original if successful

cd /data/ngff/idr0013/batch_XX
for i in `ls | grep zarr`; do ./zip.sh $i; done

For upload:

mv *.zip idr0013/   # each batch dir already has an empty idr0013 subdir
ascp -P33001 -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh -d idr0013 bsaspera_w@hx-fasp-1.ebi.ac.uk:<SECRET_DIR>

Then add the file names to idr0013_files.tsv and delete the local zips:

ls idr0013 >> ../idr0013_files.tsv
rm idr0013/*.zip
dominikl commented 1 year ago

Failing plate:

(base) [dlindner@pilot-zarr2-dev batch_3]$ ~/bioformats2raw/bin/bioformats2raw --memo-directory ../../memo  /uod/idr/metadata/idr0013-neumann-mitocheck/screens/LT0012_29--ex2005_06_10--sp2005_04_08--tt16--c3.screen LT0012_29--ex2005_06_10--sp2005_04_08--tt16--c3.ome.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp3633973597553018286/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@63a65a25): java.lang.NullPointerException
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at picocli.CommandLine.call(CommandLine.java:2761)
        at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.NullPointerException
        at ome.xml.meta.OMEXMLMetadataImpl.getWellSampleImageRef(OMEXMLMetadataImpl.java:5205)
        at com.glencoesoftware.bioformats2raw.Converter.hasValidPlate(Converter.java:2055)
        at com.glencoesoftware.bioformats2raw.Converter.convert(Converter.java:604)
        at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:516)
        at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:107)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        ... 9 more

I guess there will be more; I'll append them to this list to keep track:

dominikl commented 1 year ago

Wrapped it all into one script:

#!/bin/bash

# Usage: ./run.sh screens.txt log.txt
screens="$1"
log="$2"

# Disable all output (stdout and stderr); progress is written to $log instead
exec >/dev/null 2>&1

for i in $(cat "$screens");
do
    date >> "$log"
    echo "Converting $i" >> "$log"
    zarr_file="${i%.*}.ome.zarr"
    if ~/bioformats2raw/bin/bioformats2raw --memo-directory /data/ngff/memo "/uod/idr/metadata/idr0013-neumann-mitocheck/screens/$i" "$zarr_file"
    then
        echo "Zipping ${zarr_file}" >> "$log"
        if 7za -mmt8 a "${zarr_file}.zip" "${zarr_file}"
        then
            # Only delete the converted plate once the zip has been created
            rm -rf "${zarr_file}"
            mv "${zarr_file}.zip" idr0013/
            echo "Uploading ${zarr_file}.zip" >> "$log"
            if ascp -P33001 -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh -d idr0013 bsaspera_w@hx-fasp-1.ebi.ac.uk:/<SECRET_DIR>
            then
                echo "${zarr_file}.zip" >> files.tsv
                rm "idr0013/${zarr_file}.zip"
            else
                echo "ERR Upload failed." >> "$log"
            fi
        else
            echo "ERR Zipping failed." >> "$log"
        fi
    else
        echo "ERR Converting failed." >> "$log"
    fi
done

It's now running in three screen sessions in /data/ngff/idr0013_new/run_1, run_2 and run_3 (there is a run_4 as well, but that might be a bit too much).

dominikl commented 1 year ago

This is currently doing 3 conversions in a bit more than an hour, so it should all be done in ~8 days.
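A back-of-the-envelope check of the ~8 days, assuming "a bit more than an hour" means about 65 minutes per round of 3 parallel conversions over all 538 plates:

```shell
plates=538
parallel=3
minutes_per_round=65   # assumption: "a bit more than an hour"
rounds=$(( (plates + parallel - 1) / parallel ))   # ceiling division
total_minutes=$(( rounds * minutes_per_round ))
echo "$rounds rounds, ~$(( total_minutes / 60 / 24 )) days $(( total_minutes / 60 % 24 )) hours"
# prints: 180 rounds, ~8 days 3 hours
```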

dominikl commented 1 year ago

Finished. Only LT0012_29--ex2005_06_10--sp2005_04_08--tt16--c3.screen failed conversion (see above).

dominikl commented 1 year ago

Really finished now: exported the LT0012_29 plate with omero-cli-zarr (LT0012_29.ome.zarr.zip).

will-moore commented 1 year ago

Looking into submission error with file names in idr0013_files.tsv.

It looks like the problem is that each row doesn't include the directory, i.e. idr0013/...
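If that is the cause, a minimal fix is to prefix every row with the directory the zips were uploaded into. A sketch, assuming one file name per row, no header row (a header would need to be skipped) and that every zip lives under idr0013/; add_dir_prefix is a hypothetical name.

```shell
# Hypothetical helper: print the file list with "idr0013/" prefixed to every
# row that doesn't already start with it (GNU sed; "!" negates the address).
add_dir_prefix() {
  sed -e '/^idr0013\//!s|^|idr0013/|' "$1"
}
# Example: add_dir_prefix idr0013_files.tsv > idr0013_files.fixed.tsv
```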

But I also noticed a zip called LT0012_29.ome.zarr.zip which looks wrong (different from the others). Now I see above that this was generated via omero-cli-zarr so that it matches the Plate name in IDR, whereas all the others have much longer names.

To try and make this consistent with the others, I downloaded it (via the web page), renamed it and uploaded it via Aspera...

$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d ~/Downloads/LT0012_29--ex2005_06_10--sp2005_04_08--tt16--c3.ome.zarr.zip bsaspera_w@hx-fasp-1.ebi.ac.uk:/5f/136e8d-e575-4755-9ac2-aa7fc10cae67-a26596/idr0013/

Checked on https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0013 that the file sizes of renamed file matched the old file, then deleted LT0012_29.ome.zarr.zip.

Upload new idr0013_files.tsv

will-moore commented 1 year ago

All 538 Plates now available at https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD865.html

idr0013.csv at https://github.com/IDR/idr-utils/pull/56/commits/cac35aa0d1731afb5db0ab6b60e10bdf03c591fd

$ for r in $(cat $IDRID.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3)
>   omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
> done
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/2016-05/09/05-00-41.632 for fileset 18761
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632_mkngff/011c38fb-c3d0-4d1d-82d8-9147a5060d88.zarr -> /bia-integrator-data/S-BIAD865/011c38fb-c3d0-4d1d-82d8-9147a5060d88/011c38fb-c3d0-4d1d-82d8-9147a5060d88.zarr
...
will-moore commented 1 year ago

Even a day later, the first sql fileset hadn't completed!

After installing a potential fix https://github.com/IDR/omero-mkngff/pull/11#issuecomment-1727189948 re-ran again...

After 50 minutes, we had done 17 filesets: ~3 minutes per Fileset!

will-moore commented 1 year ago

The mkngff loop failed because of the goofys mount: see https://github.com/IDR/idr-metadata/issues/671#issuecomment-1727328137. This needed a server restart, so the existing sql files are invalid. Deleted them and restarted, using the latest mkngff commit, which ignores existing symlinks: https://github.com/IDR/omero-mkngff/pull/11/commits/0e4dca393d6821c1b78d4fd0bac35e7d99abe078

will-moore commented 1 year ago

Moved the first 3 .sql files to test dir to run sql while mkngff is still running...

cd idr0013_test
for r in ./*.sql; do
  psql -U omero -d idr -h "$DBHOST" -f "$r"
done

UPDATE 380
BEGIN
 mkngff_fileset 
----------------
        6311999
(1 row)
COMMIT
UPDATE 380
BEGIN
 mkngff_fileset 
----------------
        6312000
(1 row)
COMMIT
UPDATE 380
BEGIN
 mkngff_fileset 
----------------
        6312001
(1 row)
COMMIT

Viewing image from first plate: http://localhost:1080/webclient/?show=image-1613300

will-moore commented 1 year ago

Failed with ResourceError. Checked the Blitz logs:

2023-09-20 11:23:14,170 DEBUG [                   loci.formats.Memoizer] (l.Server-6) start[1695208695185] time[298985] tag[loci.formats.Memoizer.setId]
2023-09-20 11:23:14,171 ERROR [         ome.io.bioformats.BfPixelBuffer] (l.Server-6) Failed to instantiate BfPixelsWrapper with /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632_mkngff/011c38fb-c3d0-4d1d-82d8-9147a5060d88.zarr/OME/METADATA.ome.xml
2023-09-20 11:23:14,172 ERROR [                ome.io.nio.PixelsService] (l.Server-6) Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632_mkngff/011c38fb-c3d0-4d1d-82d8-9147a5060d88.zarr/OME/METADATA.ome.xml
java.lang.RuntimeException: java.io.IOException: Path '/bia-integrator-data/S-BIAD865/011c38fb-c3d0-4d1d-82d8-9147a5060d88/011c38fb-c3d0-4d1d-82d8-9147a5060d88.zarr/M/23' is not a valid path or not a directory.
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
        at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
        at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)

/M/23 is a missing Well for this plate, so we shouldn't be trying to read from that dir.

will-moore commented 1 year ago

Viewing a different Plate from idr0004 with missing Wells gives the same error:

        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Path '/bia-integrator-data/S-BIAD867/103d9428-b86b-4f4e-84d8-966b5d89aae1/103d9428-b86b-4f4e-84d8-966b5d89aae1.zarr/A/1' is not a valid path or not a directory.
        at com.bc.zarr.ZarrUtils.ensureDirectory(ZarrUtils.java:158)
        at com.bc.zarr.ZarrGroup.open(ZarrGroup.java:95)
        at com.bc.zarr.ZarrGroup.open(ZarrGroup.java:88)
will-moore commented 1 year ago

To see whether a non-sparse Plate would work, applied the sql for fileset 18460:

$ psql -U omero -d idr -h $DBHOST -f 18460.sql 
UPDATE 384
BEGIN
 mkngff_fileset 
----------------
        6312002
(1 row)
COMMIT

http://localhost:1080/webclient/?show=well-802140

... but this failed due to goofys: https://github.com/IDR/idr-metadata/issues/671#issuecomment-1727715356

will-moore commented 1 year ago

Goofys failed again when re-running mkngff sql...

  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_mkngff/__init__.py", line 185, in sql
    if not symlink_path.exists():
  File "/usr/lib64/python3.6/pathlib.py", line 1336, in exists
    self.stat()
  File "/usr/lib64/python3.6/pathlib.py", line 1158, in stat
    return self._accessor.stat(self)
  File "/usr/lib64/python3.6/pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)
OSError: [Errno 107] Transport endpoint is not connected: '/bia-integrator-data/S-BIAD865/ffe4bcd6-a5dd-4c7f-ace2-751f67921207/ffe4bcd6-a5dd-4c7f-ace2-751f67921207.zarr'
will-moore commented 1 year ago

A big problem with goofys failing (twice above) is that we need to restart the server to re-mount it, which means that previously generated sql files become invalid because a different $SECRET is generated.

Need to move to a workflow of creating and executing the sql immediately...

for row in csv:
  omero mkngff sql > fileset.sql
  psql -f fileset.sql
will-moore commented 1 year ago
for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
done
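One possible hardening of this loop, given that goofys failures later left many sql files empty: only run psql when mkngff exits successfully and produces a non-empty file, and stop at the first failure so the remaining rows can be resumed. The wrapper names gen_sql, apply_sql and process_row are hypothetical; the omero and psql arguments are copied from the loop above, and the \r stripping assumes a CRLF csv.

```shell
# Wrappers around the real commands (arguments copied from the loop above).
gen_sql()   { omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret="$SECRET" "$@"; }
apply_sql() { psql -U omero -d idr -h "$DBHOST" -f "$1"; }

# Generate the sql for one csv row, then apply it only if generation worked.
process_row() {
  local row biapath uuid fsid out
  row=$(printf '%s' "$1" | tr -d '\r')      # drop CR from CRLF csv files
  biapath=$(echo "$row" | cut -d',' -f2)
  uuid=$(echo "$biapath" | cut -d'/' -f2)
  fsid=$(echo "$row" | cut -d',' -f3)
  out="$IDRID/$fsid.sql"
  if ! gen_sql "$fsid" "/bia-integrator-data/$biapath/$uuid.zarr" > "$out"; then
    echo "mkngff failed for $fsid" >&2
    return 1
  fi
  if [ ! -s "$out" ]; then
    echo "empty sql for $fsid, nothing to apply" >&2
    return 0
  fi
  apply_sql "$out"
}

# Stop at the first failure so the remaining rows can be resumed later:
# for r in $(cat "$IDRID.csv"); do process_row "$r" || break; done
```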
will-moore commented 1 year ago

http://localhost:1080/webclient/?show=well-802140 eventually became viewable...

$ grep -A 2 "22.251_mkngff/04c70c80" /opt/omero/server/OMERO.server/var/log/Blitz-0.log | grep -A 2 "saved memo"
2023-09-20 15:27:16,224 DEBUG [                   loci.formats.Memoizer] (l.Server-9) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2016-04/30/15-54-22.251_mkngff/04c70c80-bc2e-4210-a21f-d2f02108b829.zarr/OME/.METADATA.ome.xml.bfmemo (529578 bytes)
2023-09-20 15:27:16,224 DEBUG [                   loci.formats.Memoizer] (l.Server-9) start[1695222972274] time[663949] tag[loci.formats.Memoizer.setId]
2023-09-20 15:27:16,224 INFO  [                ome.io.nio.PixelsService] (l.Server-9) Creating BfPixelBuffer: /data/OMERO/ManagedRepository/demo_2/2016-04/30/15-54-22.251_mkngff/04c70c80-bc2e-4210-a21f-d2f02108b829.zarr/OME/METADATA.ome.xml Series: 0

663949 ms is 11 minutes

will-moore commented 1 year ago

mkngff sql failed again because of the goofys mount.

Got about 40 complete; most of the others are 0 bytes.

$ ls -alh idr0013 | grep "r 4" 
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 14:41 18376\r.sql
-rw-r--r--.  1 omero-server omero-server 484K Sep 20 14:29 18379\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 14:19 18392\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 14:38 18421\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 14:51 18456\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 14:16 18460\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 15:34 18476\r.sql
-rw-r--r--.  1 omero-server omero-server 463K Sep 20 15:37 18478\r.sql
-rw-r--r--.  1 omero-server omero-server 482K Sep 20 14:25 18532\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 15:59 18533\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 15:49 18538\r.sql
-rw-r--r--.  1 omero-server omero-server 484K Sep 20 15:11 18543\r.sql
-rw-r--r--.  1 omero-server omero-server 484K Sep 20 15:21 18545\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:32 18561\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:07 18562\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:35 18567\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:31 18598\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:13 18654\r.sql
-rw-r--r--.  1 omero-server omero-server 478K Sep 20 15:46 18660\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:43 18667\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:22 18704\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:54 18705\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:25 18717\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 13:58 18727\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:44 18729\r.sql
-rw-r--r--.  1 omero-server omero-server 486K Sep 20 15:18 18735\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:06 18741\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:57 18749\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 13:55 18761\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:53 18813\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:56 18822\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:10 18838\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:28 18840\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:47 18841\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:00 18852\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:15 18911\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:04 18914\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 14:01 18933\r.sql
-rw-r--r--.  1 omero-server omero-server 481K Sep 20 15:40 18935\r.sql
-rw-r--r--.  1 omero-server omero-server 433K Sep 20 15:03 22203\r.sql

Kinda painful to pick up where we left off with mkngff sql, since we don't have a good way to skip all the filesets that have been successfully processed.

Updated omero-mkngff with https://github.com/IDR/omero-mkngff/pull/11/commits/a2d0aeeb5195e7374c7cb48e5d989d813a05f982 So now nothing is output if sql was previously generated successfully (detected by the existence of the symlink_dir in the ManagedRepository, which is now created after the sql output).

Now we just need to update the command to append to the sql file (>>) instead of overwriting it (>), so the existing files are kept.

We also want to use the old SECRET from those existing sql files, so that the new ones are the same and we can do a global replace when needed.

export SECRET=b76bb9c5-92b7-42c7-809e-97c808b4598a
for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql"
done

Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/2016-05/09/05-00-41.632 for fileset 18761
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632
Symlink dir exists at /data/OMERO/ManagedRepository/demo_2/2016-05/09/05-00-41.632_mkngff - skipping sql output
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/2016-05/08/16-44-06.910 for fileset 18727
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/08/16-44-06.910
Symlink dir exists at /data/OMERO/ManagedRepository/demo_2/2016-05/08/16-44-06.910_mkngff - skipping sql output
...
# last fileset where symlink found - NB: this probably didn't output sql before!
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/2016-04/30/22-03-36.052 for fileset 18469
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-04/30/22-03-36.052
Symlink dir exists at /data/OMERO/ManagedRepository/demo_2/2016-04/30/22-03-36.052_mkngff - skipping sql output

# first fileset to generate sql in this round...
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/2016-05/01/05-35-47.122 for fileset 18479
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/01/05-35-47.122
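For the "global replace when needed" mentioned above, a sketch, assuming the secret appears verbatim in the generated sql (it is passed via --secret) and that GNU sed's -i is available. replace_secret and NEW_SECRET are hypothetical names; NEW_SECRET would hold the value from the new server session.

```shell
# Hypothetical helper: swap the old secret for a new one in every sql file
# of a directory, editing the files in place (GNU sed -i).
replace_secret() {
  local dir="$1" old="$2" new="$3"
  # UUID-style secrets contain only [0-9a-f-], so no regex escaping is needed.
  sed -i "s/$old/$new/g" "$dir"/*.sql
}
# Example: replace_secret "$IDRID" b76bb9c5-92b7-42c7-809e-97c808b4598a "$NEW_SECRET"
```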
will-moore commented 1 year ago

Needed another server restart to re-mount goofys... Re-ran again as above... First fileset of this round 18800...

will-moore commented 1 year ago

Needed another server restart to re-mount goofys... Re-ran again as above... First fileset of this round 18386...

will-moore commented 1 year ago

Since running mkngff for this and idr0016 at the same time on idr-testing is causing goofys issues, I'm going to pause this one until idr0016 is done...

will-moore commented 1 year ago

Picking up where we left off... Work out where to start....

for r in $(cat $IDRID.csv); do
  fsid=$(echo $r | cut -d',' -f3)
  ls -alh "$IDRID/$fsid.sql"
done

Kept these rows (no sql exported) and deleted the other completed rows from idr0013.csv on idr-testing: 18761?.sql 18727?.sql 18933?.sql 18469?.sql 18458?.sql

for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
done
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/09/05-00-41.632 for fileset: 18761
will-moore commented 1 year ago

Repeated several times, each time processing 20 - 40 Filesets...

will-moore commented 1 year ago

Restarted again... seems to be 39 or 40 each time.

(venv3) bash-4.2$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3);   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-09/20/20-36-59.899 for fileset: 22207
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/11/09-33-21.804 for fileset: 18867
...
will-moore commented 1 year ago

Restarted again after another 39...

(venv3) bash-4.2$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3);   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/07/19-34-24.204 for fileset: 18687
...
will-moore commented 1 year ago

Need to fix the naming of the sql files. With fsid=$(echo $r | cut -d',' -f3), fsid includes a trailing carriage-return character (\r) if the csv was downloaded with wget https://raw.githubusercontent.com/IDR/idr-utils/cac35aa0d1731afb5db0ab6b60e10bdf03c591fd/scripts/ngff_filesets/idr0013.csv (the file has CRLF line endings). We can use | tr -d '[:space:]' to strip this off.

for r in $(cat $IDRID.csv); do
  fsid=$(echo $r | cut -d',' -f3)
  newid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  mv "$IDRID/$fsid.sql" "$IDRID/$newid.sql"
done
mv: cannot stat ‘idr0013/18351\r.sql’: No such file or directory
mv: cannot stat ‘idr0013/18353.sql’: No such file or directory
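Rather than stripping the \r from fsid in every loop, the csv itself could be normalised once before looping. A sketch, assuming GNU sed and the three-column layout of idr0013.csv (zarr path, BIA path, fileset id); strip_cr and parse_rows are hypothetical helpers, the latter replacing the three cut calls with a single read.

```shell
# Normalise a CRLF csv to LF in place (GNU sed), so later cut/read calls
# never see a trailing \r.
strip_cr() {
  sed -i 's/\r$//' "$1"
}

# Print "fsid uuid biapath" for each row, splitting on commas with read.
# The "|| [ -n ... ]" also handles a last line without a trailing newline.
parse_rows() {
  local _zarr biapath fsid uuid
  while IFS=, read -r _zarr biapath fsid || [ -n "$fsid" ]; do
    uuid=${biapath#*/}               # S-BIAD865/<uuid> -> <uuid>
    echo "$fsid $uuid $biapath"
  done < "$1"
}
```

If the loop consuming these rows runs a command that reads stdin, give that command </dev/null so it can't swallow the remaining rows.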
will-moore commented 1 year ago

Check for .zarray files...

for r in $(cat $IDRID.csv); do
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  echo "$IDRID/$fsid.sql $(grep -c 'zarray' $IDRID/$fsid.sql)"
done
idr0013/18761.sql 1520
idr0013/18727.sql 1520
idr0013/18933.sql 1520
idr0013/18914.sql 0
idr0013/18562.sql 0
idr0013/18838.sql 0
idr0013/18654.sql 0
idr0013/18460.sql 0
idr0013/18392.sql 0
idr0013/18704.sql 0
idr0013/18532.sql 0
idr0013/18379.sql 0
idr0013/18561.sql 0
idr0013/18567.sql 0
idr0013/18421.sql 0
idr0013/18376.sql 0
idr0013/18729.sql 0
idr0013/18841.sql 0
idr0013/18456.sql 0
idr0013/18705.sql 0
idr0013/18749.sql 0
idr0013/18852.sql 0
idr0013/22203.sql 0
idr0013/18741.sql 0
idr0013/18823.sql 0
idr0013/18543.sql 0
idr0013/18911.sql 0
idr0013/18735.sql 0
idr0013/18545.sql 0
idr0013/18717.sql 0
idr0013/18840.sql 0
idr0013/18598.sql 0
idr0013/18476.sql 0
idr0013/18478.sql 0
idr0013/18935.sql 0
idr0013/18667.sql 0
idr0013/18660.sql 0
idr0013/18538.sql 0
idr0013/18813.sql 0
idr0013/18822.sql 0
idr0013/18533.sql 0
idr0013/18469.sql 1536
idr0013/18479.sql 0
idr0013/22216.sql 0
idr0013/18906.sql 0
idr0013/22223.sql 0
idr0013/18797.sql 0
idr0013/18352.sql 0
idr0013/18355.sql 0
idr0013/22206.sql 0
idr0013/18578.sql 0
idr0013/18707.sql 0
idr0013/18766.sql 0
idr0013/18855.sql 0
idr0013/18802.sql 0
idr0013/18462.sql 0
idr0013/18601.sql 0
idr0013/18775.sql 0
idr0013/18381.sql 0
idr0013/18800.sql 0
idr0013/18763.sql 0
idr0013/18767.sql 0
idr0013/18915.sql 0
idr0013/18520.sql 0
idr0013/18725.sql 0
idr0013/18777.sql 0
idr0013/18869.sql 0
idr0013/18411.sql 0
idr0013/18512.sql 0
idr0013/18383.sql 0
idr0013/18737.sql 0
idr0013/18839.sql 0
idr0013/18701.sql 0
idr0013/18662.sql 0
idr0013/18833.sql 0
idr0013/18836.sql 0
idr0013/18784.sql 0
idr0013/18472.sql 0
idr0013/18923.sql 0
idr0013/18594.sql 0
idr0013/18529.sql 0
idr0013/18361.sql 0
idr0013/18528.sql 0
idr0013/18747.sql 0
idr0013/18464.sql 0
idr0013/18848.sql 0
idr0013/18765.sql 0
idr0013/18826.sql 0
idr0013/18799.sql 0
idr0013/18661.sql 0
idr0013/18470.sql 0
idr0013/18948.sql 0
idr0013/18864.sql 0
idr0013/18732.sql 0
idr0013/18790.sql 0
idr0013/18953.sql 0
idr0013/18386.sql 0
idr0013/18716.sql 0
idr0013/18787.sql 0
idr0013/18461.sql 0
idr0013/18384.sql 0
idr0013/22227.sql 0
idr0013/18947.sql 0
idr0013/18566.sql 0
idr0013/22222.sql 0
idr0013/18774.sql 0
idr0013/18924.sql 0
idr0013/18391.sql 0
idr0013/18401.sql 0
idr0013/18858.sql 0
idr0013/22204.sql 0
idr0013/18580.sql 0
idr0013/18862.sql 0
idr0013/18490.sql 0
idr0013/18936.sql 0
idr0013/18870.sql 0
idr0013/22211.sql 0
idr0013/18828.sql 0
idr0013/22209.sql 0
idr0013/18754.sql 0
idr0013/18465.sql 0
idr0013/18523.sql 0
idr0013/18670.sql 0
idr0013/18579.sql 0
idr0013/18473.sql 0
idr0013/18958.sql 0
idr0013/18577.sql 0
idr0013/18957.sql 0
idr0013/18463.sql 0
idr0013/18589.sql 0
idr0013/18748.sql 0
idr0013/18359.sql 0
idr0013/18354.sql 0
idr0013/18752.sql 0
idr0013/18454.sql 0
idr0013/18824.sql 0
idr0013/18909.sql 0
idr0013/18542.sql 0
idr0013/18403.sql 0
idr0013/18931.sql 0
idr0013/18695.sql 0
idr0013/18489.sql 0
idr0013/18853.sql 0
idr0013/18718.sql 0
idr0013/18358.sql 0
idr0013/18902.sql 0
idr0013/18771.sql 0
idr0013/18604.sql 0
idr0013/18788.sql 0
idr0013/18491.sql 0
idr0013/18700.sql 0
idr0013/18943.sql 0
idr0013/18683.sql 0
idr0013/18846.sql 0
idr0013/22210.sql 0
idr0013/18803.sql 0
idr0013/18918.sql 0
idr0013/18455.sql 0
idr0013/18521.sql 0
idr0013/18844.sql 0
idr0013/18926.sql 0
idr0013/18863.sql 0
idr0013/18843.sql 0
idr0013/18730.sql 0
idr0013/18920.sql 0
idr0013/18585.sql 0
idr0013/18366.sql 0
idr0013/18458.sql 1536
idr0013/18760.sql 1520
idr0013/18804.sql 1520
idr0013/18574.sql 1520

Edited idr0013.csv to contain just the 163 rows with 0 above. Re-ran...

for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
done

Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/12/13-09-25.587 for fileset: 18914
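The manual edit of idr0013.csv (keeping only the rows whose sql has a .zarray count of 0) could also be scripted. A sketch, assuming the csv's carriage returns have already been stripped and that a missing sql file should count as 0 too; rows_needing_sql and the output file name are hypothetical.

```shell
# Hypothetical helper: print the csv rows whose sql file is missing or
# contains no "zarray" entry, i.e. the filesets that still need mkngff sql.
rows_needing_sql() {
  local csv="$1" sqldir="$2" row fsid
  while IFS= read -r row || [ -n "$row" ]; do
    fsid=$(echo "$row" | cut -d',' -f3)
    # grep exits non-zero both for "no match" and for "file missing".
    if ! grep -q 'zarray' "$sqldir/$fsid.sql" 2>/dev/null; then
      echo "$row"
    fi
  done < "$csv"
}
# Example: rows_needing_sql idr0013.csv idr0013 > idr0013_todo.csv
```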
will-moore commented 1 year ago

Since idr0138-pilot seems to have a much more stable goofys mount, moving the remaining generation there...

Still to do, "idr0013.csv" on idr0138-pilot, as the wmoore user:

idr0013/LT0099_16.ome.zarr,S-BIAD865/2fddf4f4-bbad-490e-9d1a-64f10a911f5f,18716
idr0013/LT0121_09.ome.zarr,S-BIAD865/30078617-8947-451e-b4fc-b5459f8d787d,18787
idr0013/LT0025_54.ome.zarr,S-BIAD865/3068778b-ca4a-409f-8a91-a436aaefd539,18461
idr0013/LT0011_30.ome.zarr,S-BIAD865/3092cb82-f48f-4918-9a2d-a159ff420623,18384
idr0013/LTValidMitosisSon384Plate01_02.ome.zarr,S-BIAD865/3215294d-e302-43e8-a96f-0a0dd44f10a6,22227
idr0013/LT0601_01.ome.zarr,S-BIAD865/32f78fc1-3cb0-4ef5-96ff-a7521a1c5d28,18947
idr0013/LT0066_02.ome.zarr,S-BIAD865/333b0032-273f-470d-be49-b944b4191327,18566
idr0013/LTValidMitosisSon384Plate02_04.ome.zarr,S-BIAD865/33bd6e90-8597-445f-a6a1-6f03216902c1,22222
idr0013/LT0116_47.ome.zarr,S-BIAD865/340f3f55-2286-4fa2-8c01-e049bbd86d5d,18774
idr0013/LT0153_06.ome.zarr,S-BIAD865/34eea383-ae3d-4c39-ad85-127571a58957,18924
idr0013/LT0014_01.ome.zarr,S-BIAD865/350edb2c-befa-4ebd-b130-4f5d88fd18b8,18391
idr0013/LT0016_18.ome.zarr,S-BIAD865/364084b9-d7af-4600-b6e3-0621bd50c563,18401
idr0013/LT0142_01.ome.zarr,S-BIAD865/364309c6-4bd0-469d-ad6b-981cb86ac9c0,18858
idr0013/LTValidMitosisSon384Plate07_01.ome.zarr,S-BIAD865/369313de-98e2-44d7-9362-d9c710ade6dd,22204
idr0013/LT0070_41.ome.zarr,S-BIAD865/36a2e3d5-72e4-4652-af7f-929161e2322d,18580
idr0013/LT0143_02.ome.zarr,S-BIAD865/3726dab9-e7a3-4df2-a594-aaeaa9f94d95,18862
idr0013/LT0033_42.ome.zarr,S-BIAD865/376cff9a-a923-4f19-957e-1c4c644b39c5,18490
idr0013/LT0157_07.ome.zarr,S-BIAD865/381f57c9-d2cc-4e33-a0da-cfff6357d9ae,18936
idr0013/LT0145_02.ome.zarr,S-BIAD865/38689649-4f4c-4983-9840-25e2d5f058a5,18870
idr0013/LTValidMitosisSon384Plate05_02.ome.zarr,S-BIAD865/386a44d6-1132-4d0d-abf7-180764320c63,22211
idr0013/LT0133_19.ome.zarr,S-BIAD865/38e77549-b1d0-4559-918b-85da280e9949,18828
idr0013/LTValidMitosisSon384Plate05_04.ome.zarr,S-BIAD865/3932254b-22f9-487d-80b1-b9c2daa7bf46,22209
idr0013/LT0110_09.ome.zarr,S-BIAD865/39885715-f764-46a1-b045-dd423db83c63,18754
idr0013/LT0026_21.ome.zarr,S-BIAD865/3a0f0b01-39aa-4745-aebd-1719c1796206,18465
idr0013/LT0042_28.ome.zarr,S-BIAD865/3a54eeb7-9e0a-438b-8993-926a9ad10689,18523
idr0013/LT0085_07.ome.zarr,S-BIAD865/3b4f9774-4a00-489d-89a3-0d2aeca87835,18670
idr0013/LT0069_52.ome.zarr,S-BIAD865/3c36a642-5b4f-4c11-8e6d-baa0f4178c9b,18579
idr0013/LT0029_01.ome.zarr,S-BIAD865/3ca89c81-1eea-49ac-b7da-fee5f5f945af,18473
idr0013/LT0603_05.ome.zarr,S-BIAD865/3caeca4e-c69c-4a1a-a98e-bb0f83ee6a0c,18958
idr0013/LT0069_51.ome.zarr,S-BIAD865/3cc6b15c-13b0-417b-a249-57932368b51e,18577
idr0013/LT0603_06.ome.zarr,S-BIAD865/3d461dd5-bec1-43fc-8fc3-2406f1d2bb72,18957
idr0013/LT0025_56.ome.zarr,S-BIAD865/3d4a9c7f-944a-40f0-b872-3da8ff3557ff,18463
idr0013/LT0073_02.ome.zarr,S-BIAD865/3d5ac001-823d-4c7a-83a4-e29c826f81e0,18589
idr0013/LT0108_47.ome.zarr,S-BIAD865/3e550b11-5e87-4587-8b8a-f7653fadab9b,18748
idr0013/LT0003_40.ome.zarr,S-BIAD865/3e7ad301-3cad-413a-9e79-571a691712bf,18359
idr0013/LT0002_02.ome.zarr,S-BIAD865/3e7aeaeb-4de8-42b9-bed3-2c4af89a0bf7,18354
idr0013/LT0110_01.ome.zarr,S-BIAD865/3edb1d3a-91da-48a9-b6a4-592328ea5f1c,18752
idr0013/LT0023_01.ome.zarr,S-BIAD865/40aadbcb-77df-4663-a5f8-29177971b58b,18454
idr0013/LT0132_04.ome.zarr,S-BIAD865/40e83a42-6bc0-4f3b-80f9-80a865ac5424,18824
idr0013/LT0148_37.ome.zarr,S-BIAD865/4251a3eb-043c-4abe-9326-3e2afb9f6e97,18909
idr0013/LT0049_02.ome.zarr,S-BIAD865/427c1e16-5bee-425f-ae65-163a4db18e54,18542
idr0013/LT0016_28.ome.zarr,S-BIAD865/42a137f5-6f48-4873-9d66-fac6367a802b,18403
idr0013/LT0156_07.ome.zarr,S-BIAD865/4399d284-8a5c-47f3-9169-007d2f0cad27,18931
idr0013/LT0093_16.ome.zarr,S-BIAD865/4421634f-208a-4d43-88c8-80b5c8caa056,18695
idr0013/LT0033_11.ome.zarr,S-BIAD865/448ecf99-dba9-4e72-8edb-f8e03453c292,18489
idr0013/LT0140_06.ome.zarr,S-BIAD865/44bff916-3cc4-4f8b-a185-fabcf82b5e01,18853
idr0013/LT0100_09.ome.zarr,S-BIAD865/44c62d0b-c9e8-42e7-97e8-37592f26ba75,18718
idr0013/LT0003_15.ome.zarr,S-BIAD865/44e8361b-2bbd-4f01-ba02-f3333a34a5c4,18358
idr0013/LT0146_06.ome.zarr,S-BIAD865/44f07347-2f3d-4d65-ad1c-6c376577862a,18902
idr0013/LT0116_43.ome.zarr,S-BIAD865/44f3c26d-65e2-43d0-9d2e-75ac4673f210,18771
idr0013/LT0077_01.ome.zarr,S-BIAD865/45eb9b6b-f72f-42cc-b0c8-19923f2c6d92,18604
idr0013/LT0121_37.ome.zarr,S-BIAD865/46b7571c-679a-4aba-ba66-c1e608eb803d,18788
idr0013/LT0034_01.ome.zarr,S-BIAD865/4706dd97-c751-447d-b8ef-6dc9ea68dea7,18491
idr0013/LT0094_44.ome.zarr,S-BIAD865/497cb9a3-4e13-4498-aae5-c4b291515352,18700
idr0013/LT0170_01.ome.zarr,S-BIAD865/4a33abd2-9f15-4ddb-9cc6-faf7cffb4960,18943
idr0013/LT0089_02.ome.zarr,S-BIAD865/4a3ace35-8cb0-459a-8609-c78f99cb79a5,18683
idr0013/LT0138_03.ome.zarr,S-BIAD865/4a96176c-6d36-4ce5-a9d6-ed5cca52cbeb,18846
idr0013/LTValidMitosisSon384Plate05_03.ome.zarr,S-BIAD865/4acc4a36-2066-43ee-9a7f-756733f1e379,22210
idr0013/LT0125_41.ome.zarr,S-BIAD865/4b271b6d-1dd3-4079-9e40-4153e13f56ae,18803
idr0013/LT0151_08.ome.zarr,S-BIAD865/4b390ccd-714f-4452-aae7-5db76302337b,18918
idr0013/LT0023_04.ome.zarr,S-BIAD865/4c512657-5553-41b4-a77c-6df1f562ff05,18455
idr0013/LT0042_10.ome.zarr,S-BIAD865/4c5e7b2b-f19d-4bdd-ae88-1d1bb0c3c869,18521
idr0013/LT0138_01.ome.zarr,S-BIAD865/4f0ab5bc-90f9-474d-8b0d-0f2303f94593,18844
idr0013/LT0154_02.ome.zarr,S-BIAD865/4f84d491-654e-4c1b-b39e-16258cbb7056,18926
idr0013/LT0143_05.ome.zarr,S-BIAD865/4fd3c599-6c09-4a63-bfe8-cc345ea99002,18863
idr0013/LT0137_44.ome.zarr,S-BIAD865/50991552-7af6-40b0-813a-e03bc6590cd1,18843
idr0013/LT0104_04.ome.zarr,S-BIAD865/50be2b3c-b163-4363-9bdd-5be0651f2b03,18730
idr0013/LT0152_04.ome.zarr,S-BIAD865/50f78452-8396-401b-9aeb-d9982ddbca0b,18920
idr0013/LT0072_02.ome.zarr,S-BIAD865/513a062e-2307-40c5-8f6b-57761e9b502f,18585
idr0013/LT0006_10.ome.zarr,S-BIAD865/5147e4d3-bec9-4166-b63a-dbe5f5008f52,18366
will-moore commented 11 months ago

The following Images/Filesets were found to be incomplete when regenerating memo files on idr-testing:

On pilot-zarr1-dev, in a screen session:

$ screen -r idr0015_ngff
$ cd /data/idr0013
$ conda activate bioformats2raw2
$ for i in LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3 LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4 LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4; do
    ~/bioformats2raw-0.6.0-24/bin/bioformats2raw --memo-directory /../memo /uod/idr/metadata/idr0013-neumann-mitocheck/screens/$i.screen $i.ome.zarr
  done

Can't seem to read the data...

$ sudo ls /uod/idr/filesets/idr0013-neumann-mitocheck/
ls: cannot open directory /uod/idr/filesets/idr0013-neumann-mitocheck/: Permission denied

EDIT: it seems to work when I'm not in that old screen. Created a new one with screen -S idr0013_bf2raw and ran again at 10:35...

will-moore commented 11 months ago

Checking that files missing from previous plates are present in newly-generated ones...

This plate was missing Well M/1 before, but it now has the same number of files as the other Wells:

(base) [wmoore@pilot-zarr1-dev idr0013]$ find LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/M/1 -type f | wc
    478     478   36242
(base) [wmoore@pilot-zarr1-dev idr0013]$ find LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/M/2 -type f | wc
    478     478   36242
(base) [wmoore@pilot-zarr1-dev idr0013]$ find LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/A/1 -type f | wc
    478     478   36242

Similar checks on the other plates (for .zattrs etc. and /A) all look good.
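The per-Well `find ... | wc` checks above can be run over a whole plate at once. This sketch assumes each Well is a `<row>/<col>` directory directly under the plate root, as in these plates. `odd_wells` is a hypothetical helper that prints only those Wells whose file count differs from the most common count, so an empty result means every Well matches.

```bash
# Sketch: print "<well> <file count>" for every Well whose file count differs
# from the most common count in the plate. Assumes Wells sit two directory
# levels below the plate root (<row>/<col>).
odd_wells() {
  local plate=${1%/} well
  for well in "$plate"/*/*/; do
    [ -d "$well" ] || continue         # glob matched nothing
    printf '%s %s\n' "$(find "$well" -type f | wc -l)" "${well#"$plate"/}"
  done | awk '
    { count[$2] = $1; freq[$1]++ }
    END {
      for (c in freq) if (freq[c] > best) { best = freq[c]; mode = c }
      for (w in count) if (count[w] != mode) print w, count[w]
    }'
}
```

For example, `odd_wells LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr` would have reported `M/1/` before the plate was regenerated.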

Renamed the plates to shorten their names:

(base) [wmoore@pilot-zarr1-dev idr0013]$ ls -lh 
total 0
drwxrwxr-x. 19 wmoore wmoore 271 Nov 14 14:37 LT0066_23.ome.zarr
drwxrwxr-x. 19 wmoore wmoore 271 Nov 14 12:18 LT0080_37.ome.zarr
drwxrwxr-x. 19 wmoore wmoore 271 Nov 14 13:32 LT0103_13.ome.zarr
$ for i in $(ls); do zip -r $i.zip $i; done
...
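The shortened names listed above are the full bioformats2raw names truncated at the first `--`. Here is a dry-run sketch of that mapping. `short_name` and `print_renames` are hypothetical helpers, and they only print `mv` commands for review.

```bash
# Sketch: keep only the part of the plate name before the first "--".
short_name() {
  local base=${1%/}                    # tolerate a trailing slash
  base=${base%.ome.zarr}
  printf '%s.ome.zarr\n' "${base%%--*}"
}

# Dry run: print the mv commands rather than renaming anything.
print_renames() {
  local d
  for d in *--*.ome.zarr/; do
    [ -d "$d" ] || continue            # glob matched nothing
    printf 'mv %q %q\n' "${d%/}" "$(short_name "$d")"
  done
}
```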

EDIT: oops, realised that the previous idr0013 plates have full names, not shortened ones as above. Renamed them back to full names and zipped them:

$ md5sum ./*
2dc74001d737bf48841ea4a186391574  LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr.zip
bc35cca08c935c765df6a3d1b1198732  LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr.zip
5ad963825e2e3c5ccc5c2a5060819e7f  LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr.zip
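To re-verify the zips after the transfer, or after a later download, the checksums above could be saved to a manifest and checked with `md5sum -c`. This is a minimal sketch that assumes GNU coreutils `md5sum`. The manifest name `checksums.md5` and both function names are hypothetical.

```bash
# Sketch: record checksums of the zips, then verify them later.
write_manifest() {
  local dir=$1
  ( cd "$dir" && md5sum ./*.zip > checksums.md5 )
}

# Exits non-zero (and lists FAILED files) if any zip has changed.
verify_manifest() {
  local dir=$1
  ( cd "$dir" && md5sum --quiet -c checksums.md5 )
}
```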

Delete these 3 from https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0013

Upload...

$ cd .aspera/cli/bin
$ ./ascp -P33001 -i ~/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/idr0013/idr0013 bsaspera_w@hx-fasp-1.ebi.ac.uk:/5f/13xxxxx

LT0066_23--ex2005_08_03--sp2005_06_07--tt17--           100%   24GB  128Mb/s    12:05    
LT0080_37--ex2005_07_20--sp2005_07_04--tt17--            100%   25GB  247Mb/s    26:43    
LT0103_13--ex2006_11_22--sp2005_08_16--tt19--               100%   24GB  377Mb/s    48:59 
will-moore commented 10 months ago

Checked https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD865.html again. The resubmitted plates above have not been updated yet:

LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3 https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD865/dab29e5a-d36f-430a-a9ff-7a1d6e4ce299/dab29e5a-d36f-430a-a9ff-7a1d6e4ce299.zarr

LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4 https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD865/df947dfe-ed8f-4dda-a20a-fb9f3a717b47/df947dfe-ed8f-4dda-a20a-fb9f3a717b47.zarr

LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4 https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD865/8387705b-16bf-4b14-8884-426b0c16dfff/8387705b-16bf-4b14-8884-426b0c16dfff.zarr

will-moore commented 10 months ago

Let's host those 3 plates on our s3 for testing mkngff etc.

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0013
make_bucket: idr0013
(base) [wmoore@pilot-zarr1-dev idr0013]$ /home/wmoore/mc cp -r idr0013/ uk1s3/idr0013
...tt19--c4.ome.zarr/P/9/0/3/92/0/0/0/0: 102.77 GiB / 102.77 GiB ━━━━━━━━━━━━━━━ 18.86 MiB/s 1h32m59s

Looking good:

On idr0125-pilot...

ssh -A -o 'ProxyCommand ssh idr-pilot.openmicroscopy.org -W %h:%p' idr0125-omeroreadwrite -L 1080:localhost:80

sudo mkdir /idr0013 && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0013 /idr0013

ls /idr0013
LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr  LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr  LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr

As the omero-server user:

idr0013.csv

LT0066_23,LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr,18568
LT0080_37,LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr,18655
LT0103_13,LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr,18728

screen -r mkngff

for r in $(cat $IDRID.csv); do
  zarrpath=$(echo $r | cut -d',' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  omero mkngff sql $fsid --clientpath="https://uk1s3.embassy.ebi.ac.uk/idr0013/$zarrpath" "/idr0013/$zarrpath" > "$IDRID/$fsid.sql"
done
will-moore commented 10 months ago

Checked the SQL output: all three files contain the top-level .zarr/.zattrs...

(venv3) (base) bash-4.2$ for i in 18568.sql 18655.sql 18728.sql; do echo $i; cat $i | grep ".zarr/.zattrs" | wc; cat $i | grep ".zattrs" | wc; done
18568.sql
      1       4     258
    762    3048  205148
18655.sql
      1       4     258
    762    3048  205148
18728.sql
      1       4     258
    738    2952  198688
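One caveat with the check above: in `grep ".zarr/.zattrs"` each `.` is a regex wildcard, so the pattern can match more than intended. This sketch produces the same two counts with `grep -cF`, which treats the pattern as a fixed string and counts matching lines. `count_zattrs` is a hypothetical name.

```bash
# Sketch: print "<file> <top-level .zarr/.zattrs lines> <all .zattrs lines>".
count_zattrs() {
  local f
  for f in "$@"; do
    printf '%s %s %s\n' "$f" \
      "$(grep -cF '.zarr/.zattrs' "$f")" \
      "$(grep -cF '.zattrs' "$f")"
  done
}
```

Usage: `count_zattrs 18568.sql 18655.sql 18728.sql` prints one line per file.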

$ less 18568.sql...

UPDATE pixels SET name = 'METADATA.ome.xml', path = 'demo_2/2016-05/03/23-33-31.705_mkngff/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/OME' where image in (select id from Image where fileset = 18568);

begin;
    select mkngff_fileset(
      18568,
      'SECRETUUID',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
      'demo_2/2016-05/03/23-33-31.705_mkngff/',
      array[
          ['demo_2/2016-05/03/23-33-31.705_mkngff/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/', '.zattrs', 'application/octet-stream', 'https://uk1s3.embassy.ebi.ac.uk/idr0013/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/.zattrs'],
          ['demo_2/2016-05/03/23-33-31.705_mkngff/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/', '.zgroup', 'application/octet-stream', 'https://uk1s3.embassy.ebi.ac.uk/idr0013/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/.zgroup'],
          ['demo_2/2016-05/03/23-33-31.705_mkngff/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/A/', '.zgroup', 'application/octet-stream', 'https://uk1s3.embassy.ebi.ac.uk/idr0013/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr/A/.zgroup'],
...

Replaced SECRETUUID with 9630ba1e-ed3a-42e3-9296-xxxxxxxx, then ran:

for r in $(cat $IDRID.csv); do
  zarrpath=$(echo $r | cut -d',' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
  omero mkngff symlink /data/OMERO/ManagedRepository "/idr0013/$zarrpath" --bfoptions
done

UPDATE 380
BEGIN
 mkngff_fileset
----------------
        5289227
(1 row)

COMMIT
usage: /opt/omero/server/venv3/bin/omero mkngff symlink [-h] [--bfoptions]
                                                        symlink_repo
                                                        fileset_id
                                                        symlink_target
/opt/omero/server/venv3/bin/omero mkngff symlink: error: argument fileset_id: invalid int value: '/idr0013/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr'
UPDATE 380
BEGIN
 mkngff_fileset
----------------
        5289228
(1 row)

COMMIT
usage: /opt/omero/server/venv3/bin/omero mkngff symlink [-h] [--bfoptions]
                                                        symlink_repo
                                                        fileset_id
                                                        symlink_target
/opt/omero/server/venv3/bin/omero mkngff symlink: error: argument fileset_id: invalid int value: '/idr0013/LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr'
UPDATE 368
BEGIN
 mkngff_fileset
----------------
        5289229
(1 row)

COMMIT
usage: /opt/omero/server/venv3/bin/omero mkngff symlink [-h] [--bfoptions]
                                                        symlink_repo
                                                        fileset_id
                                                        symlink_target
/opt/omero/server/venv3/bin/omero mkngff symlink: error: argument fileset_id: invalid int value: '/idr0013/LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr'

Oops, the symlink command above was missing the fileset_id argument. Re-ran the symlinks with $fsid added:

$ for r in $(cat $IDRID.csv); do
>   zarrpath=$(echo $r | cut -d',' -f2)
>   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
>   echo $zarrpath
>   echo $fsid
>   omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/idr0013/$zarrpath" --bfoptions
> done
LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr
18568
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr -> /idr0013/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr.bfoptions
LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr
18655
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff/LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr -> /idr0013/LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff/LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4.ome.zarr.bfoptions
LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr
18728
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff/LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr -> /idr0013/LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff/LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4.ome.zarr.bfoptions
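The first attempt failed because the `fileset_id` positional argument was omitted, so the Zarr path ended up in its slot. This minimal guard assumes the argument order shown in the usage message (`symlink_repo fileset_id symlink_target`). `build_symlink_cmd` is hypothetical and only prints the command.

```bash
# Sketch: refuse to build the symlink command unless the fileset ID is an
# integer; the printed command can be reviewed before it is run.
build_symlink_cmd() {
  local repo=$1 fsid=$2 target=$3
  if [[ ! $fsid =~ ^[0-9]+$ ]]; then
    echo "fileset_id must be an integer, got: '$fsid'" >&2
    return 1
  fi
  printf 'omero mkngff symlink %q %q %q --bfoptions\n' "$repo" "$fsid" "$target"
}
```

Inside the loop, `build_symlink_cmd /data/OMERO/ManagedRepository "$fsid" "/idr0013/$zarrpath"` would have stopped on the first row with the error message instead of calling omero.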

Fileset info looks good...

(base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ ls -alh /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff
total 12K
drwxr-xr-x.  2 omero-server omero-server  144 Jan  3 11:26 .
drwxr-xr-x. 63 omero-server omero-server 4.0K Jan  3 11:26 ..
lrwxrwxrwx.  1 omero-server omero-server   65 Jan  3 11:26 LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr -> /idr0013/LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr
-rw-r--r--.  1 omero-server omero-server   49 Jan  3 11:26 LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3.ome.zarr.bfoptions
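These symlinks point into a goofys mount. A quick way to spot any that no longer resolve, for example if the mount has dropped, is to look for dangling links. This sketch uses `find -type l` plus `test -e`, which is false for a broken link. `broken_links` is a hypothetical name.

```bash
# Sketch: list symlinks under a directory whose targets do not exist.
broken_links() {
  find "$1" -type l ! -exec test -e {} \; -print
}
```

For example, `broken_links /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff` should print nothing while the mount is healthy.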

Checked http://localhost:1080/webclient/?show=image-1556033 and viewed the image: looks good. Other plates: http://localhost:1080/webclient/?show=image-1573071 and LT0103_13.

will-moore commented 10 months ago

Let's run check_pixels:

for i in 3669 3669 3828; do
  python check_pixels.py Plate:$i --max-planes=sizeC --max-images=10 >> /tmp/check_pix_20240301_idr0013.log;
done

$ grep Error /tmp/check_pix_20240301_idr0013.log | wc
      0       0       0
will-moore commented 9 months ago

The re-submitted data is now available on EBI s3...

Test on idr-testing, using Fileset IDs from idr-testing!

Installed https://github.com/IDR/omero-mkngff/pull/14, which creates new Filesets without the extra _mkngff suffix, and used --fs_suffix=None below:

pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@fs_suffix'

idr0013.csv

idr0013/LT0080_37.ome.zarr.zip,S-BIAD865/aea4aa32-60c2-4a38-8a91-9f303381e562,6312927
idr0013/LT0066_23.ome.zarr.zip,S-BIAD865/c1d9f06e-cfd0-43cd-be2f-3e5f39c3b62a,6313098
idr0013/LT0103_13.ome.zarr.zip,S-BIAD865/eae9bb4c-9504-4f88-9931-dbf234f86023,6313107
export IDRID=idr0013
for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  omero mkngff sql $fsid --fs_suffix=None --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr" "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
done

Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/07/02-36-52.924_mkngff for fileset: 6312927
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/03/23-33-31.705_mkngff for fileset: 6313098
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-05/08/17-02-05.805_mkngff for fileset: 6313107

Then, replaced SECRETUUID and ran the SQL and symlink steps (again using --fs_suffix=None):

for i in $(ls); do sed -i 's/SECRETUUID/f464e059-16b5-4013-b9a2-417e5976371c/g' $i; done
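Before the psql step, it may be worth confirming that the sed loop replaced every placeholder, so that no file still containing SECRETUUID is passed to psql. In this sketch, `check_no_placeholder` is hypothetical and assumes the .sql files sit in a single directory.

```bash
# Sketch: fail (and list the files) if any .sql file still has the placeholder.
check_no_placeholder() {
  local dir=$1 left
  left=$(grep -l 'SECRETUUID' "$dir"/*.sql 2>/dev/null)
  if [ -n "$left" ]; then
    printf 'placeholder still present in:\n%s\n' "$left" >&2
    return 1
  fi
}
```

Usage: `check_no_placeholder "$IDRID" && echo ok`, run from the directory that contains `$IDRID/`.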

for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
  omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --fs_suffix=None --bfoptions
done

UPDATE 380
BEGIN
 mkngff_fileset 
----------------
        6314896
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff/aea4aa32-60c2-4a38-8a91-9f303381e562.zarr -> /bia-integrator-data/S-BIAD865/aea4aa32-60c2-4a38-8a91-9f303381e562/aea4aa32-60c2-4a38-8a91-9f303381e562.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff/aea4aa32-60c2-4a38-8a91-9f303381e562.zarr.bfoptions
UPDATE 380
BEGIN
 mkngff_fileset 
----------------
        6314897
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff/c1d9f06e-cfd0-43cd-be2f-3e5f39c3b62a.zarr -> /bia-integrator-data/S-BIAD865/c1d9f06e-cfd0-43cd-be2f-3e5f39c3b62a/c1d9f06e-cfd0-43cd-be2f-3e5f39c3b62a.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff/c1d9f06e-cfd0-43cd-be2f-3e5f39c3b62a.zarr.bfoptions
UPDATE 368
BEGIN
 mkngff_fileset 
----------------
        6314898
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff/eae9bb4c-9504-4f88-9931-dbf234f86023.zarr -> /bia-integrator-data/S-BIAD865/eae9bb4c-9504-4f88-9931-dbf234f86023/eae9bb4c-9504-4f88-9931-dbf234f86023.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff/eae9bb4c-9504-4f88-9931-dbf234f86023.zarr.bfoptions

http://localhost:1080/webclient/?show=image-1600787...

will-moore commented 7 months ago

Updated sql scripts to use original Fileset IDs in https://github.com/IDR/mkngff_upgrade_scripts/commit/3f8e1693ebbb5032ec81e0c63168e99c1be633b8