Open will-moore opened 1 year ago
Running out of disk space on pilot-zarr2-dev... Going to try to convert on pilot-idrtesting.
Same on pilot-idrtesting. Don't know how much disk space I'd need; even nearly 1 TB isn't enough.
Thanks @dominikl for these conversions...
$ ssh pilot-zarr1-dev
ls -alh /data/idr0090
total 96K
drwxrwxr-x. 4 dlindner dlindner 89 Apr 12 10:10 .
drwxrwxr-x. 16 root idr-data 289 Apr 6 15:02 ..
drwxrwxr-x. 5 dlindner dlindner 89 Apr 12 10:58 190211.ome.zarr
drwxrwxr-x. 10 dlindner dlindner 154 Feb 24 19:37 190213.ome.zarr
-rw-rw-r--. 1 dlindner dlindner 95K Feb 24 13:18 190213.screen
Plate named "190211" is a sparse plate: https://idr.openmicroscopy.org/webclient/?show=plate-9303
Make bucket...
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0090
make_bucket: idr0090
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0090 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0090 --cors-configuration file://cors.json
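The contents of policy.json and cors.json are not shown in this thread. As a rough sketch, assuming the goal is anonymous read access to the bucket from browser-based viewers, the two files might look something like the following. The statement ID, the wildcard origin and the output directory are all assumptions, not the files actually used here.

```shell
# Hypothetical policy.json / cors.json for a public-read bucket.
# Illustrative only: the real files used above are not shown.
outdir="${OUTDIR:-$(mktemp -d)}"

# Allow anonymous GET of every object in the idr0090 bucket.
cat > "$outdir/policy.json" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::idr0090/*"]
    }
  ]
}
EOF

# Allow cross-origin GET/HEAD so web viewers can fetch zarr chunks.
cat > "$outdir/cors.json" <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"]
    }
  ]
}
EOF

echo "wrote $outdir/policy.json and $outdir/cors.json"
```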
Upload 1 plate...
# pilot-zarr1-dev
(base) [wmoore@pilot-zarr1-dev data]$ /home/wmoore/mc cp -r idr0090/190213.ome.zarr/ uk1s3/idr0090/zarr/190213.ome.zarr
...
1.02 TiB
So far we have 2 plates on pilot-zarr1-dev.
Zipping (with -m to remove originals)...
$ screen -S idr0090_zip
$ ls -lh /data/idr0090
total 0
drwxrwxr-x. 10 dlindner dlindner 154 Apr 12 18:34 190211.ome.zarr
drwxrwxr-x. 10 dlindner dlindner 154 Feb 24 19:37 190213.ome.zarr
$ cd /data/idr0090
$ for i in */; do zip -mr "${i%/}.zip" "$i"; done
Doh! Got permission denied! sudo...
$ for i in */; do sudo zip -mr "${i%/}.zip" "$i"; done
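For reference, the loop relies on the glob `*/` matching directories only, and on `${i%/}` stripping the trailing slash to build the archive name. A small illustration on a throwaway directory (the temporary path and the empty files are just for demonstration):

```shell
# Show what the loop iterates over and which archive names it builds.
demo=$(mktemp -d)
mkdir "$demo/190211.ome.zarr" "$demo/190213.ome.zarr"
touch "$demo/190213.screen"      # plain file: not matched by */

cd "$demo" || exit 1
names=""
for i in */; do
    # "$i" is e.g. "190211.ome.zarr/"; ${i%/} drops the trailing slash.
    names="$names ${i%/}.zip"
done
echo "archives:$names"
```

This prints `archives: 190211.ome.zarr.zip 190213.ome.zarr.zip`; the `.screen` file is skipped because it is not a directory.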
Current log (21 hours later...)
adding: 190211.ome.zarr/B/9/3/0/0/3/23/0/1 (deflated 38%)
adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/ (stored 0%)
adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/0 (deflated 36%)
adding: 190211.ome.zarr/B/9/3/0/0/3/23/1/1 (deflated 37%)
adding: 190211.ome.zarr/B/9/3/0/0/3/24/ (stored 0%)
adding: 190211.ome.zarr/B/9/3/0/0/3/24/0/ (stored 0%)
zip still running... 27 hours... Not halfway yet! This is on Well 14/31 for that plate: https://idr.openmicroscopy.org/webclient/?show=plate-9303
adding: 190211.ome.zarr/C/2/13/0/0/0/13/1/0 (deflated 26%)
adding: 190211.ome.zarr/C/2/13/0/0/0/13/1/1 (deflated 24%)
adding: 190211.ome.zarr/C/2/13/0/0/0/14/ (stored 0%)
adding: 190211.ome.zarr/C/2/13/0/0/0/14/0/ (stored 0
Conversion takes 34 hours / plate.
Installed p7zip on pilot-zarr1-dev:
(base) [wmoore@pilot-zarr1-dev ~]$ sudo yum install p7zip
(base) [wmoore@pilot-zarr1-dev ~]$ which 7za
/usr/bin/7za
Cancelled the previous zip process (still less than halfway through). Hopefully enough space to zip, then upload and delete...
(base) [wmoore@pilot-zarr1-dev idr0090]$ df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 4.9T 3.3T 1.7T 67% /data
$ screen -r idr0090_zip
$ cd /data/idr0090
$ 7za a 190213.ome.zarr.zip 190213.ome.zarr
Wow, finally completed zipping one plate... started upload
(base) [wmoore@pilot-zarr1-dev idr0090]$ screen -S idr0090_zip
$ sudo 7za a 190213.ome.zarr.zip 190213.ome.zarr
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel Xeon Processor (Cascadelake) (50655),ASM,AES-NI)
Scanning the drive:
1048096 folders, 803034 files, 1120438944155 bytes (1044 GiB)
Creating archive: 190213.ome.zarr.zip
Items to compress: 1851130
Files read from disk: 803034
Archive size: 752178841171 bytes (701 GiB)
Everything is Ok
upload...
$ screen -r idr0090_zip
$ cd .aspera/cli/bin
$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0090/idr0090 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136exxxxxx
delete (might take a while)...
$ screen -S idr0090_rm
$ sudo rm -rf 190213.ome.zarr
Unfortunately upload timed-out. Needs about 7 hours to upload!
(base) [wmoore@pilot-zarr1-dev bin]$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0090/idr0090 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/13xxxxxxx
190213.ome.zarr.zip 10% 75GB 250Mb/s 6:04:22 ETA
Partial Completion: 79044314K bytes transferred in 2697 seconds
(240063K bits/sec), in 1 file, 1 directory; 1 file failed.
Session Stop (Error: Session data transfer timeout (server), Session data transfer timeout)
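The "about 7 hours" figure can be checked from the archive size (752178841171 bytes) and the reported rate of roughly 250 Mb/s. A minimal sketch, assuming Mb/s means 10^6 bits per second and that the rate stays steady; `transfer_seconds` is a hypothetical helper name:

```shell
# Estimate transfer time in whole seconds for a file of SIZE bytes
# at RATE megabits per second (assumes 1 Mb = 1,000,000 bits).
transfer_seconds() {
    local size_bytes=$1 rate_mbps=$2
    echo $(( size_bytes * 8 / (rate_mbps * 1000000) ))
}

secs=$(transfer_seconds 752178841171 250)
echo "${secs} s, about $(( secs / 3600 )) h $(( secs % 3600 / 60 )) min"
```

For the 701 GiB archive this gives 24069 s, about 6 h 41 min, so any server-side session limit shorter than that will cut the transfer off.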
@dominikl I've cleaned up the space I've been using on pilot-zarr1-dev.
I don't have anything important there now, except the idr0090 plate and zplate.zip, so feel free to delete anything else you need.
Still quite a bit of space used - not sure where (apart from idr0090).
(base) [wmoore@pilot-zarr1-dev data]$ df -h /data/
Filesystem Size Used Avail Use% Mounted on
/dev/vdb 4.9T 2.6T 2.4T 52% /data
The upload seems to be a problem indeed. Just got the session timeout as well. I'll have a look if it's possible to split the zip into maybe 10 parts so that they're < 100GB.
Creating 100 GB chunks now, with -v100g. But there's another problem: 190129 is failing with an NPE
(base) [dlindner@pilot-zarr1-dev idr0090]$ /home/dlindner/bioformats2raw/bin/bioformats2raw --memo-directory ../memo /uod/idr/metadata/idr0090-ashdown-malaria/screens/190129.screen 190129.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp7557699430086545059/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/dlindner/bioformats2raw/lib/kryo-2.24.0.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@16150369): java.lang.NullPointerException
at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
at picocli.CommandLine.call(CommandLine.java:2761)
at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:2192)
Caused by: java.lang.NullPointerException
Oh dear! I don't know if BioStudies will handle multiple zips correctly - e.g. unzip them into a single Fileset. Might need to contact them and ask for advice?
I did, on the bia-idr channel, but no reply yet. I can't see why this should be a problem. You simply extract it using the first volume and it figures out the other volume files itself:
Extracting archive: 190206.ome.zarr.zip.001
--
Path = 190206.ome.zarr.zip.001
Type = Split
Physical Size = 107374182400
Volumes = 7
Total Physical Size = 697753098234
----
Path = 190206.ome.zarr.zip
Size = 697753098234
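The volume count in that listing follows from the sizes: `-v100g` produces volumes of 107374182400 bytes (100 GiB), so a 697753098234-byte archive needs seven of them. A sketch of the arithmetic; `volumes_needed` is a hypothetical helper name:

```shell
# Number of split volumes for an archive of TOTAL bytes with volumes
# of at most VOL bytes (ceiling division).
volumes_needed() {
    local total=$1 vol=$2
    echo $(( (total + vol - 1) / vol ))
}

vol=$(( 100 * 1024 * 1024 * 1024 ))   # 100 GiB = 107374182400 bytes
volumes_needed 697753098234 "$vol"    # prints 7, matching "Volumes = 7"
```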
I'll collect the failed plates here:
Just can't zip the last plate...
(base) [dlindner@pilot-zarr1-dev idr0090]$ 7za -v100g a 190904.ome.zarr.zip 190904.ome.zarr
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel Xeon Processor (Cascadelake) (50655),ASM,AES-NI)
Scanning the drive:
3007563 folders, 2304336 files, 3091971823339 bytes (2880 GiB)
Creating archive: 190904.ome.zarr.zip
Items to compress: 5311899
System ERROR:
E_FAIL
Ah, there's not enough disk space probably...
1.7 Tb free now, should be enough. Using idr-ftp / idr-testing to export the two failed plates 190129 and 190227 with omero-cli-zarr.
Ok, it looks like this plate actually is nearly 3Tb... will copy it over to idrftp to do the zipping there.
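Since the archive is written next to the source data, a pre-flight check of free space could catch this before 7za starts. A minimal sketch using POSIX `df -Pk` and `du -sk`; `has_room_for` is a hypothetical helper, and requiring free space equal to the full uncompressed size is a deliberately conservative assumption (the 190213 plate above compressed to about 67% of its original size). Note that `du` itself may take a while on a plate with millions of files.

```shell
# Succeed if the filesystem holding DEST has at least as many free
# kilobytes as SRC currently occupies (conservative: assumes no
# compression gain).
has_room_for() {
    local src=$1 dest=$2 need_kb free_kb
    need_kb=$(du -sk "$src" | awk '{print $1}')
    free_kb=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$need_kb" ]
}

# Example usage (paths from this thread):
# has_room_for 190904.ome.zarr /data/idr0090 \
#     && 7za -v100g a 190904.ome.zarr.zip 190904.ome.zarr
```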
Everything's uploaded now. Also updated idr0090_files.tsv to include ImageID column as the zip files are split into 100gb chunks.
Running on idr0125-pilot as wmoore...
(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do
> biapath=$(echo $r | cut -d',' -f2)
> uuid=$(echo $biapath | cut -d'/' -f2)
> fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
> omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
> done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-11/2021-02/20/06-09-40.395 for fileset: 4782270
...
goofys failed... 9/22 exported. 13 to go...
remounted, edited idr0013.csv and re-ran...
(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do biapath=$(echo $r | cut -d',' -f2); uuid=$(echo $biapath | cut -d'/' -f2); fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]'); omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/19/19-38-35.684 for fileset: 4782261
...
Goofys failed again. 6 more sql generated, 7 still to go...
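Rather than editing the CSV by hand after each failure, the loop could skip filesets that already have a non-empty SQL file. A sketch, assuming the same CSV layout as the loop above (fileset ID in column 3) and that an interrupted run does not leave a partial, non-empty file behind; `pending_rows` is a hypothetical helper:

```shell
# Print the CSV rows whose fileset (column 3) has no non-empty
# "$outdir/<fsid>.sql" yet.
pending_rows() {
    local csv=$1 outdir=$2 row fsid
    while IFS= read -r row || [ -n "$row" ]; do
        [ -n "$row" ] || continue          # skip blank lines
        fsid=$(printf '%s' "$row" | cut -d',' -f3 | tr -d '[:space:]')
        [ -s "$outdir/$fsid.sql" ] || printf '%s\n' "$row"
    done < "$csv"
}

# Example usage with the variables from the loop above:
# for r in $(pending_rows "$IDRID.csv" "$IDRID"); do ...; done
```

To make the "non-empty means complete" assumption safer, the SQL could be written to a temporary name and renamed to `$fsid.sql` only when `omero mkngff sql` exits successfully.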
Going to replace goofys with geesefs, which is already installed on idr0125-pilot (see https://github.com/IDR/omero-mkngff/issues/2#issuecomment-1750512143).
Now mount at same URL instead of goofys...
sudo umount /bia-integrator-geesefs
sudo umount /bia-integrator-data
sudo /opt/geesefs --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
s3.INFO anonymous bucket detected
main.INFO File system has been successfully mounted.
Restarted mkngff...
(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do biapath=$(echo $r | cut -d',' -f2); uuid=$(echo $biapath | cut -d'/' -f2); fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]'); omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-5/2021-02/18/20-50-17.861 for fileset: 4782251
...
Server restart (idr.openmicroscopy.org release) after 1 Fileset... Restart...
(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do biapath=$(echo $r | cut -d',' -f2); uuid=$(echo $biapath | cut -d'/' -f2); fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]'); omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/Blitz-0-Ice.ThreadPool.Server-6/2021-02/19/12-14-48.182 for fileset: 4782256
...
All done:
(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ ls -alh idr0090
total 33M
drwxrwxr-x. 2 wmoore wmoore 4.0K Oct 6 16:58 .
drwx------. 24 wmoore wmoore 4.0K Oct 9 08:44 ..
-rw-rw-r--. 1 wmoore wmoore 3.8M Oct 9 07:36 4782251.sql
-rw-rw-r--. 1 wmoore wmoore 1.4M Oct 6 16:01 4782252.sql
-rw-rw-r--. 1 wmoore wmoore 1.8M Oct 6 16:30 4782253.sql
-rw-rw-r--. 1 wmoore wmoore 1.3M Oct 9 09:53 4782254.sql
-rw-rw-r--. 1 wmoore wmoore 1.8M Oct 9 11:25 4782255.sql
-rw-rw-r--. 1 wmoore wmoore 2.4M Oct 9 09:25 4782256.sql
-rw-rw-r--. 1 wmoore wmoore 516K Oct 6 16:46 4782257.sql
-rw-rw-r--. 1 wmoore wmoore 1.3M Oct 9 11:56 4782258.sql
-rw-rw-r--. 1 wmoore wmoore 1.3M Oct 6 16:38 4782259.sql
-rw-rw-r--. 1 wmoore wmoore 1.3M Oct 8 22:21 4782260.sql
-rw-rw-r--. 1 wmoore wmoore 1.6M Oct 8 22:01 4782261.sql
-rw-rw-r--. 1 wmoore wmoore 1.6M Oct 9 10:47 4782262.sql
-rw-rw-r--. 1 wmoore wmoore 1.2M Oct 8 22:08 4782263.sql
-rw-rw-r--. 1 wmoore wmoore 1.2M Oct 8 22:27 4782264.sql
-rw-rw-r--. 1 wmoore wmoore 1.3M Oct 6 16:19 4782265.sql
-rw-rw-r--. 1 wmoore wmoore 1.3M Oct 6 16:10 4782266.sql
-rw-rw-r--. 1 wmoore wmoore 866K Oct 9 10:12 4782267.sql
-rw-rw-r--. 1 wmoore wmoore 866K Oct 8 22:13 4782268.sql
-rw-rw-r--. 1 wmoore wmoore 860K Oct 6 16:43 4782269.sql
-rw-rw-r--. 1 wmoore wmoore 866K Oct 6 15:53 4782270.sql
-rw-rw-r--. 1 wmoore wmoore 518K Oct 6 16:49 4782271.sql
-rw-rw-r--. 1 wmoore wmoore 3.8M Oct 8 22:50 4782272.sql
Running sql etc on idr0125-pilot.
$ psql -U omero -d idr -h $DBHOST -c "select uuid from (select * from session where node = 0 and owner = 0 and defaulteventtype = 'Sessions' order by id desc limit 1) x order by x.id asc limit 1;"
uuid
--------------------------------------
2703680e-9e33-49b0-8fea-9f7c17df16d7
Copied idr0090 sqls to omero-server.
for i in $(ls); do sed -i 's/SECRETUUID/2703680e-9e33-49b0-8fea-9f7c17df16d7/g' $i; done
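A slightly more defensive version of this substitution might restrict itself to `*.sql` files and confirm that no placeholder is left afterwards. A sketch; `fill_uuid` is a hypothetical helper, and it rewrites via a temporary file instead of `sed -i` only to stay portable across sed implementations. It assumes the UUID contains nothing special to sed (hex digits and hyphens only).

```shell
# Replace the SECRETUUID placeholder in every .sql file under DIR,
# then fail if any file still contains it.
fill_uuid() {
    local dir=$1 uuid=$2 f
    for f in "$dir"/*.sql; do
        [ -e "$f" ] || continue            # no matches: glob stays literal
        sed "s/SECRETUUID/$uuid/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
    # grep -l exits 0 if any file still has the placeholder; invert it.
    ! grep -l 'SECRETUUID' "$dir"/*.sql >/dev/null 2>&1
}

# Example usage, with UUID holding the value returned by the psql query:
# fill_uuid "$IDRID" "$UUID"
```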
$ for r in $(cat $IDRID.csv); do
biapath=$(echo $r | cut -d',' -f2)
uuid=$(echo $biapath | cut -d'/' -f2)
fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr"
done
...
UPDATE 736
BEGIN
mkngff_fileset
----------------
5288269
(1 row)
COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-7/2021-02/19/15-02-58.151_mkngff/eba197df-ea03-4465-8855-2e9bde0db414.zarr -> /bia-integrator-data/S-BIAD882/eba197df-ea03-4465-8855-2e9bde0db414/eba197df-ea03-4465-8855-2e9bde0db414.zarr
Try viewing a smaller plate...(from bioformats2raw) http://localhost:1040/webclient/?show=image-12545749
idr0090-ashdown-malaria