IDR / idr-metadata

Curated metadata for all studies published in the Image Data Resource
https://idr.openmicroscopy.org

idr0090-ashdown-malaria S-BIAD882 #651

Open will-moore opened 1 year ago

will-moore commented 1 year ago

idr0090-ashdown-malaria

dominikl commented 1 year ago

Running out of disk space on pilot-zarr2-dev... Going to try to convert on pilot-idrtesting.

dominikl commented 1 year ago

Same on pilot-idrtesting. Don't know how much disk space I'd need; even nearly 1 TB isn't enough.
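One way to check free space before starting a conversion, as a minimal sketch. `has_free_space` is a hypothetical helper (not part of any tool used in this thread), and the 1 TiB threshold is only taken from the comment above, not a measured requirement.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least NEED KiB available.
has_free_space() {
  _dir=$1
  _need_kib=$2
  # df -P prints one line per filesystem; with -k, column 4 is the available space in KiB.
  _avail_kib=$(df -Pk "$_dir" | awk 'NR==2 {print $4}')
  [ "$_avail_kib" -ge "$_need_kib" ]
}

need_kib=$((1024 * 1024 * 1024))   # 1 TiB expressed in KiB (assumed threshold)
if has_free_space . "$need_kib"; then
  echo "at least 1 TiB free here"
else
  echo "less than 1 TiB free here; convert elsewhere or free up space first"
fi
```

As the thread shows further down, a single plate can be closer to 3 TB, so the threshold would need to be set per plate.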

will-moore commented 1 year ago

Thanks @dominikl for these conversions...

$ ssh pilot-zarr1-dev
ls -alh /data/idr0090
total 96K
drwxrwxr-x.  4 dlindner dlindner  89 Apr 12 10:10 .
drwxrwxr-x. 16 root     idr-data 289 Apr  6 15:02 ..
drwxrwxr-x.  5 dlindner dlindner  89 Apr 12 10:58 190211.ome.zarr
drwxrwxr-x. 10 dlindner dlindner 154 Feb 24 19:37 190213.ome.zarr
-rw-rw-r--.  1 dlindner dlindner 95K Feb 24 13:18 190213.screen

Plate named "190211" is a sparse plate: https://idr.openmicroscopy.org/webclient/?show=plate-9303

will-moore commented 1 year ago

Make bucket...

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0090
make_bucket: idr0090
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0090 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0090  --cors-configuration file://cors.json

Upload 1 plate...

# pilot-zarr1-dev
(base) [wmoore@pilot-zarr1-dev data]$ /home/wmoore/mc cp -r idr0090/190213.ome.zarr/ uk1s3/idr0090/zarr/190213.ome.zarr
...

1.02 TiB

https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0090/zarr/190213.ome.zarr&well=all

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0090/zarr/190213.ome.zarr

will-moore commented 1 year ago

So far we have 2 plates on pilot-zarr1-dev. Zipping (with -m to remove originals)...

$ screen -S idr0090_zip
$ ls -lh /data/idr0090
total 0
drwxrwxr-x. 10 dlindner dlindner 154 Apr 12 18:34 190211.ome.zarr
drwxrwxr-x. 10 dlindner dlindner 154 Feb 24 19:37 190213.ome.zarr

$ cd /data/idr0090
$ for i in */; do zip -mr "${i%/}.zip" "$i"; done

will-moore commented 1 year ago

Doh! Got permission denied! sudo...

$ for i in */; do sudo zip -mr "${i%/}.zip" "$i"; done

will-moore commented 1 year ago

Current log (21 hours later...)

  adding: 190211.ome.zarr/B/9/3/0/0/3/23/0/1 (deflated 38%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/ (stored 0%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/0 (deflated 36%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/1 (deflated 37%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/24/ (stored 0%)
  adding: 190211.ome.zarr/B/9/3/0/0/3/24/0/ (stored 0%)

will-moore commented 1 year ago

zip still running after 27 hours and not halfway yet! It is on well 14 of 31 for that plate: https://idr.openmicroscopy.org/webclient/?show=plate-9303

)
  adding: 190211.ome.zarr/C/2/13/0/0/0/13/1/0 (deflated 26%)
  adding: 190211.ome.zarr/C/2/13/0/0/0/13/1/1 (deflated 24%)
  adding: 190211.ome.zarr/C/2/13/0/0/0/14/ (stored 0%)
  adding: 190211.ome.zarr/C/2/13/0/0/0/14/0/ (stored 0

dominikl commented 1 year ago

Conversion takes 34 hours / plate.

will-moore commented 1 year ago

Installed p7zip on pilot-zarr1-dev:

(base) [wmoore@pilot-zarr1-dev ~]$ sudo yum install p7zip
(base) [wmoore@pilot-zarr1-dev ~]$ which 7za
/usr/bin/7za

will-moore commented 1 year ago

Cancelled the previous zip process (still less than halfway through). Hopefully enough space to zip, then upload and delete...

(base) [wmoore@pilot-zarr1-dev idr0090]$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        4.9T  3.3T  1.7T  67% /data

$ screen -r idr0090_zip
$ cd /data/idr0090
$ 7za a 190213.ome.zarr.zip 190213.ome.zarr

will-moore commented 1 year ago

Wow, finally completed zipping one plate... started upload

(base) [wmoore@pilot-zarr1-dev idr0090]$ screen -S idr0090_zip

$ sudo 7za a 190213.ome.zarr.zip 190213.ome.zarr

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel Xeon Processor (Cascadelake) (50655),ASM,AES-NI)

Scanning the drive:
1048096 folders, 803034 files, 1120438944155 bytes (1044 GiB)

Creating archive: 190213.ome.zarr.zip

Items to compress: 1851130

Files read from disk: 803034
Archive size: 752178841171 bytes (701 GiB)
Everything is Ok
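
For reference, the compression achieved can be worked out from the two byte counts in the 7za output above. This is only arithmetic on those reported numbers; the variable names are illustrative.

```shell
original_bytes=1120438944155   # total from "Scanning the drive"
archive_bytes=752178841171     # "Archive size"

# Integer arithmetic: archive size in tenths of a percent of the original.
permille=$(( archive_bytes * 1000 / original_bytes ))
saved_gib=$(( (original_bytes - archive_bytes) / 1024 / 1024 / 1024 ))

printf 'archive is %d.%d%% of the original, about %d GiB smaller\n' \
  $(( permille / 10 )) $(( permille % 10 )) "$saved_gib"
```

The archive still needs roughly 701 GiB of extra space while the original directory remains on disk.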

upload...

$ screen -r idr0090_zip
$ cd .aspera/cli/bin
$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0090/idr0090 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136exxxxxx

delete (might take a while)...

$ screen -S idr0090_rm
$ sudo rm -rf 190213.ome.zarr

will-moore commented 1 year ago

Unfortunately the upload timed out. It needs about 7 hours to upload!

(base) [wmoore@pilot-zarr1-dev bin]$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0090/idr0090 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/13xxxxxxx
190213.ome.zarr.zip                                                                                                                  10%   75GB  250Mb/s  6:04:22 ETA
Partial Completion: 79044314K bytes transferred in 2697 seconds
 (240063K bits/sec), in 1 file, 1 directory; 1 file failed.

Session Stop  (Error: Session data transfer timeout (server), Session data transfer timeout)
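
The "about 7 hours" figure can be reproduced from the archive size and the average rate in the ascp log above. A rough sketch, assuming the `K` in `240063K bits/sec` means 1000 bits and that the rate stays constant for the whole transfer:

```shell
archive_bytes=752178841171                 # 190213.ome.zarr.zip, from the 7za output
rate_bits_per_sec=$(( 240063 * 1000 ))     # average rate from the ascp log (K = 1000 is an assumption)

secs=$(( archive_bytes * 8 / rate_bits_per_sec ))
hours=$(( secs / 3600 ))
mins=$(( secs % 3600 / 60 ))

printf 'estimated upload time: %dh %dm\n' "$hours" "$mins"
```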

will-moore commented 1 year ago

@dominikl I've cleaned up the space I've been using on pilot-zarr1-dev. I don't have anything important there now, except the idr0090 plate and zplate.zip, so feel free to delete anything else you need. There is still quite a bit of space used, and I'm not sure where (apart from idr0090).

(base) [wmoore@pilot-zarr1-dev data]$ df -h /data/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        4.9T  2.6T  2.4T  52% /data

dominikl commented 1 year ago

The upload seems to be a problem indeed. Just got the session timeout as well. I'll have a look if it's possible to split the zip into maybe 10 parts so that they're < 100GB.

dominikl commented 1 year ago

Creating 100 GB chunks now, with -v100g. But there's another problem: 190129 is failing with an NPE.

(base) [dlindner@pilot-zarr1-dev idr0090]$ /home/dlindner/bioformats2raw/bin/bioformats2raw --memo-directory ../memo /uod/idr/metadata/idr0090-ashdown-malaria/screens/190129.screen 190129.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp7557699430086545059/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/dlindner/bioformats2raw/lib/kryo-2.24.0.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@16150369): java.lang.NullPointerException
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at picocli.CommandLine.call(CommandLine.java:2761)
        at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.NullPointerException

will-moore commented 1 year ago

Oh dear! I don't know if BioStudies will handle multiple zips correctly - e.g. unzip them into a single Fileset. Might need to contact them and ask for advice?

dominikl commented 1 year ago

I did, on the bia-idr channel, but no reply yet. I can't see why this should be a problem. You simply extract it using the first volume and it figures out the other volume files itself:

Extracting archive: 190206.ome.zarr.zip.001
--
Path = 190206.ome.zarr.zip.001
Type = Split
Physical Size = 107374182400
Volumes = 7
Total Physical Size = 697753098234
----
Path = 190206.ome.zarr.zip
Size = 697753098234
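
The volume count in that listing follows directly from the sizes: a ceiling division of the total by the volume size. A small check using only the numbers reported above:

```shell
total_bytes=697753098234     # "Total Physical Size"
volume_bytes=107374182400    # "Physical Size" of the first volume (100 GiB)

# Ceiling division: number of volumes needed to hold the whole archive.
volumes=$(( (total_bytes + volume_bytes - 1) / volume_bytes ))
# The last volume only holds whatever is left over.
last_volume_bytes=$(( total_bytes - (volumes - 1) * volume_bytes ))

echo "volumes: $volumes, last volume: $last_volume_bytes bytes"
```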

dominikl commented 1 year ago

I'll collect the failed plates here:

dominikl commented 1 year ago

Just can't zip the last plate...

(base) [dlindner@pilot-zarr1-dev idr0090]$ 7za -v100g  a 190904.ome.zarr.zip 190904.ome.zarr

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel Xeon Processor (Cascadelake) (50655),ASM,AES-NI)

Scanning the drive:
3007563 folders, 2304336 files, 3091971823339 bytes (2880 GiB)

Creating archive: 190904.ome.zarr.zip

Items to compress: 5311899

System ERROR:
E_FAIL

dominikl commented 1 year ago

Ah, there's not enough disk space probably...

dominikl commented 1 year ago

1.7 TB free now, should be enough. Using idr-ftp / idr-testing to export the two failed plates 190129 and 190227 with omero-cli-zarr.

dominikl commented 1 year ago

OK, it looks like this plate actually is nearly 3 TB... will copy it over to idrftp to do the zipping there.

dominikl commented 1 year ago

Everything's uploaded now. Also updated idr0090_files.tsv to include an ImageID column, as the zip files are split into 100 GB chunks.

will-moore commented 1 year ago

Running on idr0125-pilot as wmoore...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
>   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
> done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-11/2021-02/20/06-09-40.395 for fileset: 4782270
...
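
A sketch of the same CSV parsing using `while read` instead of `for r in $(cat ...)`, which avoids word splitting and glob expansion on the rows. The sample rows, the temporary files and the three-column layout (name, biapath, fileset ID) are assumptions for illustration, and the loop only prints the command it would run rather than calling `omero`.

```shell
set -eu
csv=$(mktemp)
plan=$(mktemp)

# Hypothetical rows in the assumed layout: name,biapath,fileset id
printf '%s\n' \
  'plate1,S-BIAD000/uuid-aaaa,1000001' \
  'plate2,S-BIAD000/uuid-bbbb,1000002 ' > "$csv"

while IFS=, read -r _name biapath fsid; do
  uuid=$(printf '%s' "$biapath" | cut -d/ -f2)      # second path component
  fsid=$(printf '%s' "$fsid" | tr -d '[:space:]')   # strip stray whitespace or CR
  # Dry run: print the command instead of executing it.
  printf 'omero mkngff sql %s "/bia-integrator-data/%s/%s.zarr"\n' \
    "$fsid" "$biapath" "$uuid"
done < "$csv" > "$plan"

cat "$plan"
```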

will-moore commented 1 year ago

goofys failed... 9/22 exported. 13 to go...

remounted, edited idr0013.csv and re-ran...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]');   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/19/19-38-35.684 for fileset: 4782261
...

will-moore commented 1 year ago

Goofys failed again. 6 more sql generated, 7 still to go...
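
Since the mount kept dropping part-way through, one option is to make the loop resumable rather than editing the CSV between runs. A minimal sketch: `generate_sql` is a stand-in for `omero mkngff sql`, and the fileset IDs and paths are hypothetical. Output goes to a temporary file and is renamed only on success, so an existing `.sql` file can be treated as complete and skipped.

```shell
set -eu

# Stand-in for `omero mkngff sql "$fsid" "$zarr_path"` so the sketch runs anywhere.
generate_sql() {
  printf '%s\n' "-- sql for fileset $1 ($2)"
}

outdir=$(mktemp -d)
printf '%s\n' "-- already generated" > "$outdir/102.sql"   # pretend an earlier run finished this one

generated=0
for fsid in 101 102 103; do          # hypothetical fileset IDs
  target="$outdir/$fsid.sql"
  if [ -s "$target" ]; then
    continue                         # finished in a previous run
  fi
  if generate_sql "$fsid" "/mnt/example/$fsid.zarr" > "$target.tmp"; then
    mv "$target.tmp" "$target"       # publish only complete output
    generated=$(( generated + 1 ))
  else
    rm -f "$target.tmp"              # drop partial output from a failed run
  fi
done

echo "generated $generated new file(s)"
```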

Going to replace goofys with geesefs. Already installed on idr0125-pilot at https://github.com/IDR/omero-mkngff/issues/2#issuecomment-1750512143

Now mount at same URL instead of goofys...

sudo umount /bia-integrator-geesefs
sudo umount /bia-integrator-data
sudo /opt/geesefs --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
s3.INFO anonymous bucket detected
main.INFO File system has been successfully mounted.

Restarted mkngff...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]');   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/18/20-50-17.861 for fileset: 4782251
...

will-moore commented 1 year ago

Server restart (idr.openmicroscopy.org release) after 1 Fileset... Restart...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]');   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-6/2021-02/19/12-14-48.182 for fileset: 4782256
...

will-moore commented 1 year ago

All done:

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ ls -alh idr0090
total 33M
drwxrwxr-x.  2 wmoore wmoore 4.0K Oct  6 16:58 .
drwx------. 24 wmoore wmoore 4.0K Oct  9 08:44 ..
-rw-rw-r--.  1 wmoore wmoore 3.8M Oct  9 07:36 4782251.sql
-rw-rw-r--.  1 wmoore wmoore 1.4M Oct  6 16:01 4782252.sql
-rw-rw-r--.  1 wmoore wmoore 1.8M Oct  6 16:30 4782253.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  9 09:53 4782254.sql
-rw-rw-r--.  1 wmoore wmoore 1.8M Oct  9 11:25 4782255.sql
-rw-rw-r--.  1 wmoore wmoore 2.4M Oct  9 09:25 4782256.sql
-rw-rw-r--.  1 wmoore wmoore 516K Oct  6 16:46 4782257.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  9 11:56 4782258.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  6 16:38 4782259.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  8 22:21 4782260.sql
-rw-rw-r--.  1 wmoore wmoore 1.6M Oct  8 22:01 4782261.sql
-rw-rw-r--.  1 wmoore wmoore 1.6M Oct  9 10:47 4782262.sql
-rw-rw-r--.  1 wmoore wmoore 1.2M Oct  8 22:08 4782263.sql
-rw-rw-r--.  1 wmoore wmoore 1.2M Oct  8 22:27 4782264.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  6 16:19 4782265.sql
-rw-rw-r--.  1 wmoore wmoore 1.3M Oct  6 16:10 4782266.sql
-rw-rw-r--.  1 wmoore wmoore 866K Oct  9 10:12 4782267.sql
-rw-rw-r--.  1 wmoore wmoore 866K Oct  8 22:13 4782268.sql
-rw-rw-r--.  1 wmoore wmoore 860K Oct  6 16:43 4782269.sql
-rw-rw-r--.  1 wmoore wmoore 866K Oct  6 15:53 4782270.sql
-rw-rw-r--.  1 wmoore wmoore 518K Oct  6 16:49 4782271.sql
-rw-rw-r--.  1 wmoore wmoore 3.8M Oct  8 22:50 4782272.sql

will-moore commented 1 year ago

Running sql etc on idr0125-pilot.

$ psql -U omero -d idr -h $DBHOST -c "select uuid from (select * from session where node = 0 and owner = 0 and defaulteventtype = 'Sessions' order by id desc limit 1) x order by x.id asc limit 1;"
                 uuid                 
--------------------------------------
 2703680e-9e33-49b0-8fea-9f7c17df16d7

Copied idr0090 sqls to omero-server.

for i in $(ls); do sed -i 's/SECRETUUID/2703680e-9e33-49b0-8fea-9f7c17df16d7/g' $i; done
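
A variant of that substitution which globs `*.sql` instead of parsing `ls`, so that only the SQL files are touched. This is a sketch run against a throwaway file; the file content and the UUID here are placeholders, not real session values.

```shell
set -eu
workdir=$(mktemp -d)
session_uuid='00000000-0000-0000-0000-000000000000'   # placeholder, not a real session UUID

# Hypothetical SQL file containing the placeholder token.
printf '%s\n' "select 'SECRETUUID';" > "$workdir/example.sql"

for f in "$workdir"/*.sql; do
  # -i.bak works with both GNU and BSD sed; the backups are removed afterwards.
  sed -i.bak "s/SECRETUUID/$session_uuid/g" "$f"
done
rm -f "$workdir"/*.sql.bak

cat "$workdir/example.sql"
```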

$ for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
  omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr"
done
...

UPDATE 736
BEGIN
 mkngff_fileset 
----------------
        5288269
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151_mkngff/eba197df-ea03-4465-8855-2e9bde0db414.zarr -> /bia-integrator-data/S-BIAD882/eba197df-ea03-4465-8855-2e9bde0db414/eba197df-ea03-4465-8855-2e9bde0db414.zarr

Try viewing a smaller plate...(from bioformats2raw) http://localhost:1040/webclient/?show=image-12545749