IDR / idr-metadata

Curated metadata for all studies published in the Image Data Resource
https://idr.openmicroscopy.org

idr0016-wawer-bioactivecompoundprofiling S-BIAD851 #638

Open dominikl opened 1 year ago

dominikl commented 1 year ago

https://github.com/IDR/idr0016-wawer-bioactivecompoundprofiling

Sample plate conversion failed with:

(base) [dlindner@pilot-zarr2-dev idr0016]$ time /home/dlindner/bioformats2raw/bin/bioformats2raw --memo-directory ../memo 24320.screen 24320.ome.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp6586654250319720590/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
[Fatal Error] :1:84: Character reference "&#0" is an invalid XML character.
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@63a65a25): java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 84; Character reference "&#0" is an invalid XML character.
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at picocli.CommandLine.call(CommandLine.java:2761)

This error (Character reference "&#0" is an invalid XML character) is already tracked at https://github.com/IDR/bioformats/issues/29 .

sbesson commented 1 year ago

Note that this study will have the same caveats as https://github.com/IDR/idr-metadata/issues/640#issuecomment-1552697868 in terms of channel order, so similar decisions will need to be made about the conversion we want to perform.

will-moore commented 1 year ago

Since we've decided to use omero-cli-zarr for idr0036 https://github.com/IDR/idr-metadata/issues/640#issuecomment-1611108909 we should do the same here...

will-moore commented 1 year ago

Going to try on a different machine since pilot-zarr1-dev and pilot-zarr2-dev are at capacity...

Update to use https://github.com/ome/omero-cli-zarr/pull/146

$ ssh -A ome-zarr-dev1.openmicroscopy.org
$ conda activate omero_zarr_export
$ pip uninstall omero-cli-zarr
$ pip install git+https://github.com/will-moore/omero-cli-zarr.git@fix_downsample_image_path
...
Successfully installed omero-cli-zarr-0.1.dev451+g983576f

Just use my home dir...

$ df -h ./
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-root  994G   34G  960G   4% /

Listing all 413 Plate IDs:

6208 4907 4906 6151 6210 6153 6154 6155 4908 6156 6157 6158 6159 4909 6160 6161 4911 6162 6163 6164 6165 6166 6167 4910 6168 6169 4912 6170 4913 6171 6172 4914 6173 6174 6175 6176 6177 6178 6179 4915 4917 4916 4951 4953 4952 4954 4956 4955 4958 4959 4957 6180 4962 4961 4960 4963 4964 4965 4966 4967 4968 4969 4970 4971 4973 4972 4974 4975 4976 4977 4978 6181 6182 6183 6184 6185 6186 4979 4980 6187 6188 6189 6190 6191 6192 6193 6194 4981 6195 6196 6197 6198 6199 6200 4982 6201 6202 6203 6204 6205 6206 6207 4983 4984 4986 4985 4987 4989 4988 4990 4991 4992 4993 4994 4995 4996 4997 4998 4999 5001 5000 5002 5004 5003 5005 5006 5007 5008 5010 5009 5011 5012 5014 5013 5015 5017 5016 5019 5018 5020 5021 5023 5022 5024 5025 5026 5029 5027 5028 5032 5031 5030 5033 5035 5034 5036 5037 5038 5039 5040 5041 5042 5044 5043 5047 5046 5045 5050 5048 5049 5052 5051 5053 5054 5056 5055 5059 5058 5057 5062 5060 5061 5063 5065 5064 5066 5068 5067 5069 5071 5070 5072 5074 5073 5075 5076 5077 5080 5079 5078 5081 5082 5083 5084 5085 5086 5087 5088 5089 5091 5090 5092 5094 5093 5095 5096 5097 5098 5101 5100 5099 5102 5103 5104 5105 5106 5107 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5251 5250 5252 5253 5254 5255 5256 5257 5259 5258 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5351 5303 5304 5305 5306 5307 5308 5380 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 
5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378

Start with an export of 10 Plates.

screen -S idr0016_ngff
mkdir idr0016 && cd idr0016
omero login   # idr-testing public/public
for id in 6208 4907 4906 6151 6210 6153 6154 6155 4908 6156; do
  echo $id;
  omero zarr export Plate:$id;
done
will-moore commented 1 year ago

3 plates completed so far... Zipping...

for i in */; do zip -mr "${i%/}.zip" "$i"; done
will-moore commented 1 year ago

Oops - forgot to rename from plateID.zarr to plateName.ome.zarr as I did for idr0036... Unzipped each of the 3 zips and renamed...

$ ls -lh
total 0
drwxr-xr-x 18 wmoore lsd 180 Jul 10 17:20 24277.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 10 19:51 24278.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 10 22:14 24279.ome.zarr

Then zipped again...

will-moore commented 1 year ago

Upload failed...

(base) [wmoore@ome-zarr-dev1 bin]$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d ~/idr0016 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136e8d-xxxxxx
24277.ome.zarr.zip                                                           95% 5277MB 97.8Mb/s    00:25 ETA
Partial Completion: 5414415K bytes transferred in 517 seconds
 (85719K bits/sec), in 3 files, 1 directory; 3 files failed.

Session Stop  (Error: Session data transfer timeout (server), Session data transfer timeout)

Deleted zips on BioStudies and ran again...

will-moore commented 1 year ago

Uploaded 1 zip, then timed out on the next one. Repeated for each zip (only 1 uploaded at a time before the time-out). Last one uploaded with:

(base) [wmoore@ome-zarr-dev1 bin]$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d ~/idr0016 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136e8d-xxxxxxxxxx
24279.ome.zarr.zip                        100% 6188MB 83.4Mb/s    08:51    
Completed: 6336955K bytes transferred in 532 seconds
 (97506K bits/sec), in 1 file, 1 directory.
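Since only one zip got through per session before the time-out, a per-file loop with retries could be scripted roughly as below. This is only a sketch: `retry` and `RETRY_DELAY` are hypothetical names, and the commented usage assumes a `$DEST` variable holding the (redacted) BioStudies destination used above.

```shell
# retry MAX_ATTEMPTS CMD [ARGS...]
# Run CMD until it succeeds or MAX_ATTEMPTS is reached.
# RETRY_DELAY (seconds between attempts, default 30) is a hypothetical knob.
retry() {
  local max="$1"
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Hypothetical usage, one ascp session per zip (DEST is an assumption):
# for z in ~/idr0016/*.zip; do
#   retry 5 ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh "$z" "$DEST"
# done
```

Whether a fresh session per file actually avoids the server-side time-out is untested here.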
will-moore commented 1 year ago

Testing on s3...

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0016
make_bucket: idr0016
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0016 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0016  --cors-configuration file://cors.json
$ cd ~/idr0016
$ unzip 24279.ome.zarr.zip && rm 24279.ome.zarr.zip
$ cd
$ ./mc cp -r idr0016/ uk1s3/idr0016/zarr
...79.ome.zarr/P/9/5/3/4/0/0: 6.61 GiB / 6.61 GiB ━━━━━━━━━━━━━━━━━ 24.22 MiB/s 4m39s

Looks good and valid... https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0016/zarr/24279.ome.zarr

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0016/zarr/24279.ome.zarr

will-moore commented 1 year ago

Above export still running... Start another, also using idr-testing (same session). Use https://github.com/ome/omero-cli-zarr/pull/147 so we don't have to manually rename every plate after export...

batch2 of 100 plates...

screen -S idr0016_export2
conda activate omero_zarr_export
pip install git+https://github.com/will-moore/omero-cli-zarr.git@name_option
cd idr0016
mkdir batch2
cd batch2
for id in 6157 6158 6159 4909 6160 6161 4911 6162 6163 6164 6165 6166 6167 4910 6168 6169 4912 6170 4913 6171 6172 4914 6173 6174 6175 6176 6177 6178 6179 4915 4917 4916 4951 4953 4952 4954 4956 4955 4958 4959 4957 6180 4962 4961 4960 4963 4964 4965 4966 4967 4968 4969 4970 4971 4973 4972 4974 4975 4976 4977 4978 6181 6182 6183 6184 6185 6186 4979 4980 6187 6188 6189 6190 6191 6192 6193 6194 4981 6195 6196 6197 6198 6199 6200 4982 6201 6202 6203 6204 6205 6206 6207 4983 4984 4986 4985 4987 4989 4988 4990; do
  echo $id;
  omero zarr export Plate:$id --name_by name;
done
will-moore commented 1 year ago

Lost connection with IDR part-way through initial 10 plates (and batch2 of 100 plates)... Restarted, repeating the part-exported plate and the remaining 3 of 10...

ssh -A ome-zarr-dev1.openmicroscopy.org
screen -S idr0016_export
for id in 6154 6155 4908 6156; do
  echo $id;
  omero zarr export Plate:$id;
done

Start from scratch for all 100 plates (only part of 1 plate done so far)...

cd batch2
for id in 6157 6158 6159 4909 6160 6161 4911 6162 6163 6164 6165 6166 6167 4910 6168 6169 4912 6170 4913 6171 6172 4914 6173 6174 6175 6176 6177 6178 6179 4915 4917 4916 4951 4953 4952 4954 4956 4955 4958 4959 4957 6180 4962 4961 4960 4963 4964 4965 4966 4967 4968 4969 4970 4971 4973 4972 4974 4975 4976 4977 4978 6181 6182 6183 6184 6185 6186 4979 4980 6187 6188 6189 6190 6191 6192 6193 6194 4981 6195 6196 6197 6198 6199 6200 4982 6201 6202 6203 6204 6205 6206 6207 4983 4984 4986 4985 4987 4989 4988 4990; do
  echo $id;
  omero zarr export Plate:$id --name_by name;
done
will-moore commented 1 year ago

The remaining plates of the first batch of 10 exported OK. Renamed to plateName.ome.zarr, using e.g. https://idr.openmicroscopy.org/webclient/?show=plate-6210 to look up the names...

(base) [wmoore@ome-zarr-dev1 idr0016]$ ls -alh
total 4.0K
drwxr-xr-x 11 wmoore lsd  161 Jul 11 21:13 .
drwx------ 30 wmoore lsd 4.0K Jul 11 13:00 ..
drwxr-xr-x 18 wmoore lsd  180 Jul 10 22:14 24279.ome.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 21:00 4908.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 00:59 6151.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 06:23 6153.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 15:28 6154.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 18:14 6155.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 23:52 6156.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 03:47 6210.zarr
drwxr-xr-x  7 wmoore lsd  116 Jul 12 00:47 batch2
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 4908.zarr 24297.ome.zarr
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 6151.zarr 24280.ome.zarr
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 6153.zarr 24294.ome.zarr
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 6154.zarr 24295.ome.zarr
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 6155.zarr 24296.ome.zarr
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 6156.zarr 24300.ome.zarr
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv 6210.zarr 24293.ome.zarr
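
The manual `mv` calls above could be driven from a lookup file instead. A minimal sketch, assuming a hypothetical tab-separated `names.tsv` with one `plateID<TAB>plateName` pair per line (the names would still have to be looked up first, e.g. in the webclient); `rename_zarrs` is an illustrative helper, not part of omero-cli-zarr:

```shell
# rename_zarrs MAP_FILE DIR
# For each "plateID<TAB>plateName" line in MAP_FILE (hypothetical file),
# rename DIR/plateID.zarr to DIR/plateName.ome.zarr.
rename_zarrs() {
  local map="$1" dir="$2"
  local id name src dst tab
  tab=$(printf '\t')
  while IFS="$tab" read -r id name; do
    [ -n "$id" ] || continue
    src="$dir/$id.zarr"
    dst="$dir/$name.ome.zarr"
    # Skip plates that are not exported yet and never overwrite a target.
    if [ -d "$src" ] && [ ! -e "$dst" ]; then
      mv "$src" "$dst"
    fi
  done < "$map"
}

# Hypothetical usage:
# rename_zarrs names.tsv ~/idr0016
```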

moved into batch1 dir to zip...

$ cd batch1/
(base) [wmoore@ome-zarr-dev1 batch1]$ ls -lh
total 0
drwxr-xr-x 18 wmoore lsd 180 Jul 10 22:14 24279.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 00:59 24280.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 03:47 24293.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 06:23 24294.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 15:28 24295.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 18:14 24296.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 21:00 24297.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 23:52 24300.ome.zarr
(base) [wmoore@ome-zarr-dev1 batch1]$ for i in */; do zip -mr "${i%/}.zip" "$i"; done
will-moore commented 1 year ago

Current status of batch2 export of 100 plates... Just under 3 hours per plate...

(base) [wmoore@ome-zarr-dev1 ~]$ ls -lh ~/idr0016/batch2
total 0
drwxr-xr-x 18 wmoore lsd 180 Jul 11 16:21 24301.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 19:01 24302.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 11 21:49 24303.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 12 00:37 24304.ome.zarr
drwxr-xr-x 18 wmoore lsd 180 Jul 12 03:13 24305.ome.zarr
drwxr-xr-x 12 wmoore lsd 126 Jul 12 04:49 24306.ome.zarr

Space is enough for 100 plates (approx 6.6 GB per plate)...

$ df -h ./
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-root  994G  139G  855G  14% /

Zipping progress of batch1 - about 25 minutes per plate...

(base) [wmoore@ome-zarr-dev1 ~]$ ls -lh ~/idr0016/batch1
total 33G
-rw-r--r--  1 wmoore lsd 6.1G Jul 12 03:28 24280.ome.zarr.zip
-rw-r--r--  1 wmoore lsd 6.1G Jul 12 03:53 24293.ome.zarr.zip
-rw-r--r--  1 wmoore lsd 6.1G Jul 12 04:17 24294.ome.zarr.zip
-rw-r--r--  1 wmoore lsd 6.0G Jul 12 04:42 24295.ome.zarr.zip
drwxr-xr-x 18 wmoore lsd  180 Jul 11 18:14 24296.ome.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 21:00 24297.ome.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 11 23:52 24300.ome.zarr
-rw-------  1 wmoore lsd 4.2G Jul 12 05:03 ziLKSu9D
will-moore commented 1 year ago

Upload remaining 7 zips of batch1. Timeout failure again...

(base) [wmoore@ome-zarr-dev1 bin]$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d ~/idr0016/batch1/idr0016 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136e8d-xxxxxx

24280.ome.zarr.zip                                                                                                                   0%  562MB 95.1Mb/s   - error - 
                                                                                                                                Error 35 [Data transfer timeout]     
Partial Completion: 588018K bytes transferred in 120 seconds
 (40105K bits/sec), in 7 files, 1 directory; 7 files failed.

Session Stop  (Error: Session data transfer timeout)
will-moore commented 1 year ago

Move 7 zips to minio objectstore...

(base) [wmoore@ome-zarr-dev1 idr]$ mv ~/idr0016/batch1/idr0016/* /uod/idr/objectstore/minio/idr/idr0016/

These are then available to download from e.g. https://minio-dev.openmicroscopy.org/idr/idr0016/24280.ome.zarr.zip

will-moore commented 1 year ago

Want to use idr-ftp machine to aspera the data to BioStudies (as we did for idr0012)...

Tried to rsync from ome-zarr-dev1.openmicroscopy.org to idr-ftp, but can't ssh to it from there...

(base) [wmoore@idrftp-ftp ~]$ ssh ome-zarr-dev1.openmicroscopy.org
ssh: Could not resolve hostname ome-zarr-dev1.openmicroscopy.org: Name or service not known

Try to use the minio data available above...

Install goofys on idr-ftp to copy data there.

$ cd
$ sudo wget https://github.com/kahing/goofys/releases/latest/download/goofys
$ sudo chmod +x ./goofys 

$ sudo mkdir ./minio
$ sudo ~/goofys --endpoint https://minio-dev.openmicroscopy.org/ -o allow_other idr0012 ./minio
2023/07/12 12:15:51.233904 main.FATAL Unable to mount file system, see syslog for details
will-moore commented 1 year ago

Downloaded 7 zips to the idr-ftp machine with...

$ wget https://minio-dev.openmicroscopy.org/idr/idr0016/24294.ome.zarr.zip
etc...
...
$ ls -lh
total 42G
-rw-rw-r--. 1 wmoore wmoore 6.1G Jul 12 02:28 24280.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 6.1G Jul 12 02:53 24293.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 6.1G Jul 12 03:17 24294.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 6.0G Jul 12 03:42 24295.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 6.0G Jul 12 04:10 24296.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 6.1G Jul 12 04:40 24297.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 6.0G Jul 12 05:29 24300.ome.zarr.zip

Upload to BioStudies...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0016/idr0016/ bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxx
will-moore commented 1 year ago

Tried to install p7zip on ome-zarr-dev1 without success...

$ sudo yum install p7zip
Loaded plugins: langpacks, product-id, rhnplugin, search-disabled-repos, subscription-manager
The SSL certificate failed verification.
will-moore commented 1 year ago

@dominikl Plates are taking about 3 hours to export. We have exported about 27 / 413 Plates (from idr-testing.openmicroscopy.org).

(386 * 3) / 24 ≈ 48 days. This is too long, so we need to speed this up by running on multiple machines and exporting from multiple servers, e.g. idr-testing and idr.openmicroscopy.org.
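
The same estimate as a small sketch, with an optional number of exporters. `estimate_days` is a hypothetical helper, and dividing by the machine count assumes the exports scale perfectly in parallel, which may not hold if they all read from the same server:

```shell
# estimate_days PLATES HOURS_PER_PLATE [MACHINES]
# Days needed to export PLATES plates at HOURS_PER_PLATE each,
# assuming ideal scaling across MACHINES parallel exporters (default 1).
estimate_days() {
  awk -v p="$1" -v h="$2" -v m="${3:-1}" \
    'BEGIN { printf "%.2f\n", (p * h) / (24 * m) }'
}

estimate_days 386 3      # prints 48.25
estimate_days 386 3 3    # prints 16.08
```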

The first 10 Plates (done) and 100 (now running) leave 303 Plates to follow (or export at the same time elsewhere).

These 303 IDs are:

 4991 4992 4993 4994 4995 4996 4997 4998 4999 5001 5000 5002 5004 5003 5005 5006 5007 5008 5010 5009 5011 5012 5014 5013 5015 5017 5016 5019 5018 5020 5021 5023 5022 5024 5025 5026 5029 5027 5028 5032 5031 5030 5033 5035 5034 5036 5037 5038 5039 5040 5041 5042 5044 5043 5047 5046 5045 5050 5048 5049 5052 5051 5053 5054 5056 5055 5059 5058 5057 5062 5060 5061 5063 5065 5064 5066 5068 5067 5069 5071 5070 5072 5074 5073 5075 5076 5077 5080 5079 5078 5081 5082 5083 5084 5085 5086 5087 5088 5089 5091 5090 5092 5094 5093 5095 5096 5097 5098 5101 5100 5099 5102 5103 5104 5105 5106 5107 5151 5152 5153 5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183 5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213 5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5251 5250 5252 5253 5254 5255 5256 5257 5259 5258 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5351 5303 5304 5305 5306 5307 5308 5380 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378

NB: when creating conda env for exporting, use pip install git+https://github.com/will-moore/omero-cli-zarr.git@name_option and run with omero zarr export Plate:$id --name_by name; as described above.
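
To hand the remaining IDs out to several machines, the list can be cut into fixed-size batches. A sketch only: `split_ids` and the file names in the usage comment are hypothetical, and the batch size of 30 is an arbitrary choice:

```shell
# split_ids SIZE ID...
# Print the IDs as lines holding at most SIZE entries, one batch per line.
split_ids() {
  local size="$1"
  shift
  printf '%s\n' "$@" | xargs -n "$size" echo
}

# Hypothetical usage: write batches of 30 IDs, then export one batch per machine.
# split_ids 30 $(cat remaining_ids.txt) > batches.txt
# for id in $(sed -n '1p' batches.txt); do
#   omero zarr export Plate:$id --name_by name
# done
```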

Current export is running at

$ ssh -A ome-zarr-dev1.openmicroscopy.org
(base) [wmoore@ome-zarr-dev1 ~]$ ls -lh /lifesci/groups/jrs/wmoore/idr0016/batch2
total 4.0K
drwxr-xr-x 18 wmoore lsd  180 Jul 13 04:47 24320.ome.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 13 07:21 24321.ome.zarr
drwxr-xr-x 18 wmoore lsd  180 Jul 13 09:50 24352.ome.zarr
drwxr-xr-x  3 wmoore lsd   45 Jul 13 10:00 24357.ome.zarr
drwxr-xr-x 10 wmoore lsd 4.0K Jul 13 09:43 idr0016

Getting data off that machine is hard, as aspera times out badly and I can't install p7zip (more reasons to run other batches elsewhere). But the exported data can sit there till I'm back. There should be enough space for over 100 Plates (6.6 GB each).

(base) [wmoore@ome-zarr-dev1 ~]$ df -h ./
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-root  994G  170G  824G  18% /
will-moore commented 1 year ago

Moving zips off ome-zarr-dev1...

(base) [wmoore@ome-zarr-dev1 ~]$ cd idr0016/batch2/idr0016/
(base) [wmoore@ome-zarr-dev1 idr0016]$ ls *.zip
24301.ome.zarr.zip  24302.ome.zarr.zip  24303.ome.zarr.zip  24304.ome.zarr.zip  24305.ome.zarr.zip  24306.ome.zarr.zip  24307.ome.zarr.zip
(base) [wmoore@ome-zarr-dev1 idr0016]$ mv *.zip /uod/idr/objectstore/minio/idr/idr0016/

Then on idr-ftp...

$ cd /data/ngff/idr0016/
$ for i in 24301.ome.zarr.zip 24302.ome.zarr.zip 24303.ome.zarr.zip 24304.ome.zarr.zip 24305.ome.zarr.zip 24306.ome.zarr.zip 24307.ome.zarr.zip; do wget "https://minio-dev.openmicroscopy.org/idr/idr0016/${i%}"; done;

From there, upload to BioStudies...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0016/idr0016/ bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxxxx
will-moore commented 1 year ago

Repeated above steps with 7 more zips... 24308.ome.zarr.zip 24309.ome.zarr.zip 24310.ome.zarr.zip 24311.ome.zarr.zip 24312.ome.zarr.zip 24313.ome.zarr.zip 24319.ome.zarr.zip

dominikl commented 1 year ago

Export, zip and upload now running on pilot-idr0136 and pilot-idr0142. Ids which have been exported or are still in progress:

pilot-idr0136:

5303 5304 5305 5306 5307 5308 5380 5353 5354 5355 5356 5357 5358 5359 5360 5361 5362 5363 5364 5365 5366 5367 5368 5369 5370 5371 5372 5373 5374 5375 5376 5377 5378
5214 5215 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 5233 5234 5235 5236 5237 5238 5239 5240 5241 5242 5243
5184 5185 5186 5187 5188 5189 5190 5191 5192 5193 5194 5195 5196 5197 5198 5199 5200 5201 5202 5203 5204 5205 5206 5207 5208 5209 5210 5211 5212 5213
5081 5082 5083 5084 5085 5086 5087 5088 5089 5091 5090 5092 5094 5093 5095 5096 5097 5098 5101 5100 5099 5102 5103 5104 5105 5106 5107 5151 5152 5153
4991 4992 4993 4994 4995 4996 4997 4998 4999 5001 5000 5002 5004 5003 5005 5006 5007 5008 5010 5009 5011 5012 5014 5013 5015 5017 5016 5019 5018 5020 5021

pilot-idr0142:

5274 5275 5276 5277 5278 5279 5280 5281 5282 5283 5284 5285 5286 5287 5288 5289 5290 5291 5292 5293 5294 5295 5296 5297 5298 5299 5300 5301 5302 5351
5244 5245 5246 5247 5248 5249 5251 5250 5252 5253 5254 5255 5256 5257 5259 5258 5260 5261 5262 5263 5264 5265 5266 5267 5268 5269 5270 5271 5272 5273 
5154 5155 5156 5157 5158 5159 5160 5161 5162 5163 5164 5165 5166 5167 5168 5169 5170 5171 5172 5173 5174 5175 5176 5177 5178 5179 5180 5181 5182 5183
5052 5051 5053 5054 5056 5055 5059 5058 5057 5062 5060 5061 5063 5065 5064 5066 5068 5067 5069 5071 5070 5072 5074 5073 5075 5076 5077 5080 5079 5078
5023 5022 5024 5025 5026 5029 5027 5028 5032 5031 5030 5033 5035 5034 5036 5037 5038 5039 5040 5041 5042 5044 5043 5047 5046 5045 5050 5048 5049 

Ids left to do:

---

Note: Running in conda env:

conda create -n "myenv" python=3.9.12 ipython
conda activate myenv
conda install -c ome omero-py
pip install git+https://github.com/will-moore/omero-cli-zarr.git@name_option
dominikl commented 1 year ago

I converted all the remaining Ids from https://github.com/IDR/idr-metadata/issues/638#issuecomment-1633863222 and uploaded them to BioStudies. But I'm not sure if there are still other zips somewhere which have not been uploaded yet. Need @will-moore to check again. I also forgot to check if there was already an idr0016_files.tsv, so I might have overwritten it with my idr0016_files.tsv (which only contains the zips for the IDs mentioned above).
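
One way to check for zips that were never uploaded is to compare a list of expected zip names against a list of what is on BioStudies. The layout of idr0016_files.tsv isn't shown here, so this sketch assumes two hypothetical plain-text files with one zip name per line; `missing_zips` is an illustrative helper:

```shell
# missing_zips EXPECTED_LIST UPLOADED_LIST
# Print every line of EXPECTED_LIST that has no exact match in UPLOADED_LIST.
# Both files are assumed to hold one zip name per line (hypothetical inputs).
missing_zips() {
  # grep exits with 1 when nothing is missing; treat that as success.
  grep -Fxv -f "$2" "$1" || [ "$?" -eq 1 ]
}

# Hypothetical usage:
# missing_zips expected_zips.txt uploaded_zips.txt
```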

will-moore commented 1 year ago

Move zips...

ssh -A ome-zarr-dev1.openmicroscopy.org
cd idr0016/batch2/idr0016/
$ ls *.zip
24320.ome.zarr.zip  24321.ome.zarr.zip  24352.ome.zarr.zip  24357.ome.zarr.zip  24507.ome.zarr.zip  24508.ome.zarr.zip  24509.ome.zarr.zip  24512.ome.zarr.zip
mv *.zip /uod/idr/objectstore/minio/idr/idr0016/
ssh idr-ftp.openmicroscopy.org
cd /data/ngff/idr0016/idr0016
screen -S idr0016_wget
for i in 24320.ome.zarr.zip  24321.ome.zarr.zip  24352.ome.zarr.zip  24357.ome.zarr.zip  24507.ome.zarr.zip  24508.ome.zarr.zip  24509.ome.zarr.zip  24512.ome.zarr.zip; do wget "https://minio-dev.openmicroscopy.org/idr/idr0016/${i%}"; done;

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0016/idr0016/ bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxx
will-moore commented 1 year ago

Zipping up the 78 zarrs exported above...

on ome-zarr-dev1

$ screen -r idr0016_zip
$ cd idr0016/batch2/
$ ls
24514.ome.zarr  24562.ome.zarr  24586.ome.zarr  24596.ome.zarr  24619.ome.zarr  24636.ome.zarr  24644.ome.zarr  24654.ome.zarr  24667.ome.zarr  24732.ome.zarr
24515.ome.zarr  24563.ome.zarr  24588.ome.zarr  24602.ome.zarr  24623.ome.zarr  24637.ome.zarr  24645.ome.zarr  24655.ome.zarr  24683.ome.zarr  24733.ome.zarr
24516.ome.zarr  24564.ome.zarr  24590.ome.zarr  24604.ome.zarr  24624.ome.zarr  24638.ome.zarr  24646.ome.zarr  24656.ome.zarr  24684.ome.zarr  24734.ome.zarr
24517.ome.zarr  24565.ome.zarr  24591.ome.zarr  24605.ome.zarr  24625.ome.zarr  24639.ome.zarr  24647.ome.zarr  24657.ome.zarr  24685.ome.zarr  24735.ome.zarr
24518.ome.zarr  24566.ome.zarr  24592.ome.zarr  24609.ome.zarr  24631.ome.zarr  24640.ome.zarr  24648.ome.zarr  24661.ome.zarr  24687.ome.zarr  24736.ome.zarr
24523.ome.zarr  24583.ome.zarr  24593.ome.zarr  24611.ome.zarr  24633.ome.zarr  24641.ome.zarr  24651.ome.zarr  24663.ome.zarr  24688.ome.zarr  24739.ome.zarr
24525.ome.zarr  24584.ome.zarr  24594.ome.zarr  24617.ome.zarr  24634.ome.zarr  24642.ome.zarr  24652.ome.zarr  24664.ome.zarr  24726.ome.zarr
24560.ome.zarr  24585.ome.zarr  24595.ome.zarr  24618.ome.zarr  24635.ome.zarr  24643.ome.zarr  24653.ome.zarr  24666.ome.zarr  24731.ome.zarr

$ for i in */; do zip -mr "${i%/}.zip" "$i"; done
will-moore commented 1 year ago
$ screen -r idr0016_export
$ cd idr0016/batch2/
$ ls *.zip | wc
     78      78    1482
$ mv *.zip /uod/idr/objectstore/minio/idr/idr0016/

on idr-ftp

for i in 24514.ome.zarr.zip 24564.ome.zarr.zip 24592.ome.zarr.zip 24617.ome.zarr.zip 24636.ome.zarr.zip 24646.ome.zarr.zip 24661.ome.zarr.zip 24726.ome.zarr.zip 24515.ome.zarr.zip 24565.ome.zarr.zip 24593.ome.zarr.zip 24618.ome.zarr.zip 24637.ome.zarr.zip 24647.ome.zarr.zip 24663.ome.zarr.zip 24731.ome.zarr.zip 24516.ome.zarr.zip 24566.ome.zarr.zip 24594.ome.zarr.zip 24619.ome.zarr.zip 24638.ome.zarr.zip 24648.ome.zarr.zip 24664.ome.zarr.zip 24732.ome.zarr.zip 24517.ome.zarr.zip 24583.ome.zarr.zip 24595.ome.zarr.zip 24623.ome.zarr.zip 24639.ome.zarr.zip 24651.ome.zarr.zip 24666.ome.zarr.zip 24733.ome.zarr.zip 24518.ome.zarr.zip 24584.ome.zarr.zip 24596.ome.zarr.zip 24624.ome.zarr.zip 24640.ome.zarr.zip 24652.ome.zarr.zip 24667.ome.zarr.zip 24734.ome.zarr.zip 24523.ome.zarr.zip 24585.ome.zarr.zip 24602.ome.zarr.zip 24625.ome.zarr.zip 24641.ome.zarr.zip 24653.ome.zarr.zip 24683.ome.zarr.zip 24735.ome.zarr.zip 24525.ome.zarr.zip 24586.ome.zarr.zip 24604.ome.zarr.zip 24631.ome.zarr.zip 24642.ome.zarr.zip 24654.ome.zarr.zip 24684.ome.zarr.zip 24736.ome.zarr.zip 24560.ome.zarr.zip 24588.ome.zarr.zip 24605.ome.zarr.zip 24633.ome.zarr.zip 24643.ome.zarr.zip 24655.ome.zarr.zip 24685.ome.zarr.zip 24739.ome.zarr.zip 24562.ome.zarr.zip 24590.ome.zarr.zip 24609.ome.zarr.zip 24634.ome.zarr.zip 24644.ome.zarr.zip 24656.ome.zarr.zip 24687.ome.zarr.zip 24563.ome.zarr.zip 24591.ome.zarr.zip 24611.ome.zarr.zip 24635.ome.zarr.zip 24645.ome.zarr.zip 24657.ome.zarr.zip 24688.ome.zarr.zip; do wget "https://minio-dev.openmicroscopy.org/idr/idr0016/${i%}"; done;
will-moore commented 1 year ago

Uploading 78 zips...

$ sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0016/idr0016/ bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxxxx
will-moore commented 1 year ago

All zips have uploaded to BioStudies now, so we have all 413 zarr.zips there. Just sorting by size, I see that most are about 5GB but some are virtually empty.

Need to re-export all of the smallest (344 byte) zarrs. It turns out that these all have an empty A1 well, so the export failed with a bug that is fixed in https://github.com/ome/omero-cli-zarr/pull/147/commits/1d726264f44e2b6cb833bcc23603e2b7e56121b5

Other smaller zips are due to plates having a low number of Wells (don't need to re-export)

Manually looking up IDs for the empty plates... 5259, 5258, 5260, 5261
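
If the zips are still on local disk, the near-empty ones can be listed by size rather than spotted by eye. A sketch, with `list_small_zips` as a hypothetical helper and the 1024 byte default threshold chosen only because the failed exports were 344 bytes:

```shell
# list_small_zips DIR [MAX_BYTES]
# Print the *.zip files directly inside DIR that are smaller than MAX_BYTES
# (default 1024), sorted by name.
list_small_zips() {
  local dir="$1" max_bytes="${2:-1024}"
  find "$dir" -maxdepth 1 -type f -name '*.zip' -size "-${max_bytes}c" | sort
}

# Hypothetical usage:
# list_small_zips /data/ngff/idr0016/idr0016
```

This only finds the zip names; mapping a name back to its Plate ID still needs a lookup in OMERO.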

Export on idr-ftp...

conda activate omero_zarr_export
pip freeze | grep zarr
ome-zarr==0.8.0
omero-cli-zarr @ git+https://github.com/will-moore/omero-cli-zarr.git@e882a620d575bffdca21144a41bb990ab2039d8e
zarr==2.15.0

pip install -U git+https://github.com/will-moore/omero-cli-zarr.git@name_option
Successfully installed omero-cli-zarr-0.1.dev456+gc73d400

omero login
for id in 5259 5258 5260 5261; do
  omero zarr export Plate:$id --name_by name;
done
will-moore commented 1 year ago

Zip and upload...

$ sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0016/re_export/idr0016/ bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxx
26564.ome.zarr.zip                                           100% 1022MB  480Mb/s    00:20    
26569.ome.zarr.zip                                            100%  219MB  432Mb/s    00:24    
26572.ome.zarr.zip                                            100%  795MB  456Mb/s    00:39    
26574.ome.zarr.zip                                             100% 1282MB  371Mb/s    01:04 

These are still quite small compared with other plates, but probably just due to not many Wells being filled.

will-moore commented 1 year ago

At https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD851.html we now have 122 out of 413 filesets "viewable".

Let's just take the first 10 for testing...

idr0016/24516.ome.zarr,S-BIAD851/05334862-30d8-4a98-899f-2738a0dfc94d,23576
idr0016/25918.ome.zarr,S-BIAD851/0e4290c9-52ba-418c-ae97-86e5e7a43439,21482
idr0016/26592.ome.zarr,S-BIAD851/0e46a0b5-6257-425d-bb91-1f953ae6c913,21569
idr0016/24638.ome.zarr,S-BIAD851/0ed303e9-ecd5-4945-8e92-59b392e51554,23585
idr0016/24595.ome.zarr,S-BIAD851/0f02c2f2-2ca7-424f-8186-2cbd88903cbb,21263
idr0016/26672.ome.zarr,S-BIAD851/1001629e-4727-4e8f-b741-dd825fb1dd63,21596
idr0016/25569.ome.zarr,S-BIAD851/104f679f-a14a-42f6-97d6-bf9507de606b,21352
idr0016/24617.ome.zarr,S-BIAD851/1110cfdc-f807-4464-8342-6716cad0fd07,21270
idr0016/25707.ome.zarr,S-BIAD851/11d072c0-112c-4fb2-9170-6009ca3f7bbc,21402
idr0016/25576.ome.zarr,S-BIAD851/11f72eb1-ab8c-4765-8cf4-660556471ac5,21359
Found prefix demo_2/2017-08/16 // 02-26-49.136 for fileset 23576
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136
Creating dir at /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr -> /bia-integrator-data/S-BIAD851/05334862-30d8-4a98-899f-2738a0dfc94d/05334862-30d8-4a98-899f-2738a0dfc94d.zarr
...
will-moore commented 1 year ago

mkngff was taking a long time (over 2 hours per Fileset), so I stopped after the first 2 completed...

Ran just those 2...

$  psql -U omero -d idr -h $DBHOST -f 23576.sql 
BEGIN
 mkngff_fileset 
----------------
        5287517
(1 row)
COMMIT
(mkngff) bash-4.2$  psql -U omero -d idr -h $DBHOST -f 21482.sql 
BEGIN
 mkngff_fileset 
----------------
        5287518
(1 row)

Manual psql since we didn't have https://github.com/IDR/omero-mkngff/pull/8

idr=> UPDATE pixels SET name = '.zattrs', path = 'demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr' where image in (select id from Image where fileset = 5287517);
UPDATE 2304

idr=> UPDATE pixels SET name = '.zattrs', path = 'demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr' where image in (select id from Image where fileset = 5287518);
UPDATE 2304
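
The two manual updates differ only in the path and the fileset ID, so the statement could be generated rather than typed. A sketch only: `pixels_update_sql` is a hypothetical helper that prints the same SQL as above, and it assumes the path contains no single quotes:

```shell
# pixels_update_sql FILESET_ID ZARR_PATH
# Print the UPDATE statement used above for one fileset.
# Rejects a non-numeric FILESET_ID; ZARR_PATH is assumed to contain no quotes.
pixels_update_sql() {
  local fileset_id="$1" zarr_path="$2"
  case "$fileset_id" in
    ''|*[!0-9]*) echo "fileset id must be numeric" >&2; return 1 ;;
  esac
  printf "UPDATE pixels SET name = '.zattrs', path = '%s' WHERE image IN (SELECT id FROM image WHERE fileset = %s);\n" \
    "$zarr_path" "$fileset_id"
}

# Hypothetical usage, piping the statement to psql as in the session above:
# pixels_update_sql 5287517 'demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr' \
#   | psql -U omero -d idr -h "$DBHOST"
```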

Checking first of those plates "24516" at http://localhost:1080/webclient/?show=image-3333350 Memo regenerating...

will-moore commented 1 year ago

Memo file generation for that Plate doesn't seem to have completed. http://localhost:1040/webclient/?show=image-3333350 still not displaying...

Checking logs for that fileset, I don't see anything before just now... NB: the current build of OMEZarrReader is logging everything at ERROR, even messages that aren't errors.

(base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ grep "mkngff/05334862-30d8" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2023-09-19 15:20:13,493 INFO  [      ome.services.OmeroFilePathResolver] (l.Server-3) Metadata only file, resulting path: /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/.zattrs
2023-09-19 15:20:15,736 INFO  [                loci.formats.ImageReader] (l.Server-3) ZarrReader initializing /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/.zattrs
2023-09-19 15:20:16,621 ERROR [              loci.formats.FormatHandler] (l.Server-3) ZarrReader attempting to initialize file: /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/.zattrs
2023-09-19 15:22:00,667 INFO  [      ome.services.OmeroFilePathResolver] (l.Server-4) Metadata only file, resulting path: /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/.zattrs
2023-09-19 15:22:00,677 INFO  [                loci.formats.ImageReader] (l.Server-4) ZarrReader initializing /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/.zattrs
2023-09-19 15:22:01,568 ERROR [              loci.formats.FormatHandler] (l.Server-4) ZarrReader attempting to initialize file: /data/OMERO/ManagedRepository/demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/.zattrs
2023-09-19 15:31:35,110 INFO  [        ome.services.util.ServiceHandler] (l.Server-9)  Rslt:    ([demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/P/9/5/3/, .zarray, unknown], [demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/P/9/5/, 3, unknown], [demo_2/2017-08/16/02-26-49.136_mkngff/05334862-30d8-4a98-899f-2738a0dfc94d.zarr/P/9/5/2/, .zarray, unknown], ... 26527 more)

Tried an Image from the 2nd Plate above to trigger memo file for that Plate: http://localhost:1040/webclient/?show=image-2500103 and checking for logs on that Fileset:

(base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ grep "mkngff/0e4290c9-52ba" /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2023-09-19 15:24:06,586 INFO  [      ome.services.OmeroFilePathResolver] (l.Server-0) Metadata only file, resulting path: /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs
2023-09-19 15:24:07,774 INFO  [      ome.services.OmeroFilePathResolver] (l.Server-2) Metadata only file, resulting path: /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs
2023-09-19 15:24:07,815 INFO  [                loci.formats.ImageReader] (l.Server-0) ZarrReader initializing /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs
2023-09-19 15:24:07,815 INFO  [                loci.formats.ImageReader] (l.Server-2) ZarrReader initializing /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs
2023-09-19 15:24:09,140 ERROR [              loci.formats.FormatHandler] (l.Server-0) ZarrReader attempting to initialize file: /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs
2023-09-19 15:24:09,704 ERROR [              loci.formats.FormatHandler] (l.Server-2) ZarrReader attempting to initialize file: /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs
2023-09-19 15:25:15,162 INFO  [        ome.services.util.ServiceHandler] (l.Server-8)  Rslt:    ([demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/P/9/5/3/, .zarray, unknown], [demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/P/9/5/, 3, unknown], [demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/P/9/5/2/, .zarray, unknown], ... 26527 more)
will-moore commented 1 year ago

Memo file generation completed for the 2nd plate above; now viewing http://localhost:1040/webclient/?show=image-2500103. The ZarrReader used was yesterday's manual ERROR logging build (also updated on idr0125-pilot) https://github.com/ome/ZarrReader/pull/64#issuecomment-1725456254

(base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ grep -A 2 "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log | grep -A 2 "mkngff/0e4290c9-52ba"
2023-09-19 20:29:14,972 DEBUG [                   loci.formats.Memoizer] (l.Server-2) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/..zattrs.bfmemo (3898753 bytes)
2023-09-19 20:29:14,972 DEBUG [                   loci.formats.Memoizer] (l.Server-2) start[1695137047777] time[18307195] tag[loci.formats.Memoizer.setId]
2023-09-19 20:29:14,972 INFO  [                ome.io.nio.PixelsService] (l.Server-2) Creating BfPixelBuffer: /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs Series: 0
--
2023-09-19 20:29:15,056 DEBUG [                   loci.formats.Memoizer] (l.Server-0) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/..zattrs.bfmemo (3898753 bytes)
2023-09-19 20:29:15,056 DEBUG [                   loci.formats.Memoizer] (l.Server-0) start[1695137046589] time[18308467] tag[loci.formats.Memoizer.setId]
2023-09-19 20:29:15,056 INFO  [                ome.io.nio.PixelsService] (l.Server-0) Creating BfPixelBuffer: /data/OMERO/ManagedRepository/demo_2/2016-06/25/17-09-16.470_mkngff/0e4290c9-52ba-418c-ae97-86e5e7a43439.zarr/.zattrs Series: 0

18307195 ms is just over 5 hours.
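
As a sanity check on that conversion, the `time[...]` value from the Memoizer log line can be split into hours, minutes and seconds with plain shell integer arithmetic. This is only a sketch; the helper name `ms_to_hms` is made up for illustration.

```shell
# Hypothetical helper: convert a duration in milliseconds to "Hh Mm Ss".
ms_to_hms() {
  local ms="$1"
  local total_s=$(( ms / 1000 ))          # whole seconds, remainder dropped
  printf '%dh %dm %ds\n' \
    $(( total_s / 3600 )) \
    $(( (total_s % 3600) / 60 )) \
    $(( total_s % 60 ))
}

# time[18307195] from the loci.formats.Memoizer.setId log line above
ms_to_hms 18307195
```

That agrees with the log timestamps: ZarrReader initialisation started at 15:24 and the memo file was saved at 20:29.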

will-moore commented 1 year ago

See idr0016.csv commit

Started mkngff sql && psql commit at 11:30 last night. After ~10 hours it has done about 80 filesets (7.5 mins each), so it will take just over 51 hours to do all 413 filesets.
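
The estimate can be reproduced with a few lines of shell and awk. The sketch below uses only the rounded figures quoted in this comment (about 80 filesets in about 10 hours, 413 in total); the function name `estimate_total` is hypothetical.

```shell
# Hypothetical helper: extrapolate total run time from progress so far.
# Arguments: filesets done, hours elapsed, total filesets.
estimate_total() {
  awk -v d="$1" -v h="$2" -v t="$3" 'BEGIN {
    per_fileset_min = h * 60 / d              # minutes per fileset so far
    total_hours = t * per_fileset_min / 60    # same rate applied to all filesets
    printf "%.1f min per fileset, %.1f hours for all %d filesets\n",
           per_fileset_min, total_hours, t
  }'
}

estimate_total 80 10 413
```

Since both inputs are approximate, the result is a rough guide rather than a prediction.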

will-moore commented 1 year ago

Server restart to remount goofys...

Restart omero mkngff sql generation (now using https://github.com/IDR/omero-mkngff/pull/11/commits/a2d0aeeb5195e7374c7cb48e5d989d813a05f982 to skip sql if already done) but this time, don't execute the sql (we don't want to re-run sql that's been run before).

Use same $SECRET as in existing sql, so they all have the same...

export SECRET=602d53b5-6120-4a07-8013-a81c16a5ee81
for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql"
done
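
For reference, the three `cut` calls in that loop can also be written as a single `while read` loop. This is a minimal sketch, not the script that was actually run: it assumes each row has exactly three comma-separated columns (name, `S-BIADxxx/uuid`, fileset ID) as in idr0016.csv, and the helper name `parse_ngff_csv` is made up. It only prints the fileset ID and zarr path, so it can be tested without an OMERO session.

```shell
# Hypothetical helper: print "fsid<TAB>zarr path" for each CSV row.
parse_ngff_csv() {
  local csv="$1" name biapath fsid uuid
  # "|| [ -n "$name" ]" keeps a final row that lacks a trailing newline
  while IFS=, read -r name biapath fsid || [ -n "$name" ]; do
    fsid=$(printf '%s' "$fsid" | tr -d '[:space:]')   # drop stray \r or spaces
    uuid=${biapath##*/}                                # last path component
    printf '%s\t/bia-integrator-data/%s/%s.zarr\n' "$fsid" "$biapath" "$uuid"
  done < "$csv"
}

# Demo with one row copied from idr0016.csv
demo_csv=$(mktemp)
printf '%s\n' 'idr0016/26595.ome.zarr,S-BIAD851/2632c5cd-86ec-434a-9da7-5277ab002250,21570' > "$demo_csv"
parse_ngff_csv "$demo_csv"
rm -f "$demo_csv"
```

Reading line by line avoids the word splitting that `for r in $(cat ...)` does, which would break on any row containing a space.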

Fileset 21281 first to be processed in this round. 96 Filesets done so far out of 413.

will-moore commented 1 year ago

Restarted server to re-mount goofys again... Starting on Fileset 23584...

sbesson commented 1 year ago

It is becoming increasingly clear that the goofys file system is struggling with the requirements of the current fileset swap operations.

will-moore commented 1 year ago

@sbesson

will-moore commented 1 year ago

Cancelled mkngff sql on idr-testing just now as I realised there's a bug that omits .zarray files. Stopped after Fileset 21510 (233 / 413) in the idr0016.csv

Fixed in https://github.com/IDR/omero-mkngff/pull/11/commits/cac303d3c1bdab030ee286533b94fa744461d726

Updated...

(venv3) [root@test120-omeroreadwrite wmoore]# pip install git+https://github.com/will-moore/omero-mkngff.git@dont_walk_arrays
...
  Resolved https://github.com/will-moore/omero-mkngff.git to commit 08db883c54410265783d5f5a4cf5f6b31d2dd5e3
will-moore commented 1 year ago

Start from scratch on idr0138-pilot as regular wmoore user...

wget https://raw.githubusercontent.com/IDR/idr-utils/ebbb0b9dc6ed548db9bbe298c062a14885411097/scripts/ngff_filesets/idr0016.csv

(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3)
>   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql"
> done
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-06/24/05-33-04.817 for fileset: 21405
...
will-moore commented 1 year ago

The last successful sql generated above was number 72/413:

idr0016/26595.ome.zarr,S-BIAD851/2632c5cd-86ec-434a-9da7-5277ab002250,21570

It seems the run failed at that point, and this is the current status:

$ ls /bia-integrator-data 
ls: cannot access /bia-integrator-data: Transport endpoint is not connected
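
A cheap guard against this is to test the mount before each fileset and stop as soon as it disappears, rather than letting the loop carry on. The sketch below is an assumption about how that could look, not part of omero-mkngff; `mount_ok` is a hypothetical name.

```shell
# Hypothetical guard: succeed only if the directory can actually be listed.
# A dropped goofys mount fails here with the
# "Transport endpoint is not connected" error shown above.
mount_ok() {
  ls "$1" >/dev/null 2>&1
}

# Possible use inside the per-fileset loop:
#   mount_ok /bia-integrator-data || { echo "mount lost before fileset $fsid" >&2; break; }

# Harmless demo against the current directory
mount_ok . && echo "current directory is readable"
```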
will-moore commented 1 year ago

Re-mounted goofys /bia-integrator-data and restarted server... Edited idr0016.csv to remove all lines before idr0016/26595.ome.zarr,S-BIAD851/2632c5cd-86ec-434a-9da7-5277ab002250,21570 (maybe should have removed that line too?)... And re-ran omero mkngff sql as above...
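
Trimming the CSV by hand is easy to get wrong, so here is a sketch of doing it with awk. The helper `resume_from` and its `skip` option are hypothetical; whether the matching row should be kept depends on whether its .sql was written completely, which is exactly the open question above.

```shell
# Hypothetical helper: print FILE from the first line containing KEY onwards.
# Pass "skip" as a third argument to drop the matching line itself as well.
resume_from() {
  awk -v key="$2" -v mode="${3:-keep}" '
    found          { print; next }                       # everything after the match
    index($0, key) { found = 1; if (mode != "skip") print }
  ' "$1"
}

# Demo on a throwaway file
demo_csv=$(mktemp)
printf '%s\n' 'a,x/1,10' 'b,x/2,20' 'c,x/3,30' > "$demo_csv"
resume_from "$demo_csv" ',20'         # keeps rows b and c
resume_from "$demo_csv" ',20' skip    # keeps only row c
rm -f "$demo_csv"
```

`index()` does a fixed-string match, so dots and slashes in the key are not treated as regular-expression characters.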

will-moore commented 1 year ago

Restarted again, after 116 filesets were processed since the last restart...

(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite ~]$ for r in $(cat $IDRID.csv); do   biapath=$(echo $r | cut -d',' -f2);   uuid=$(echo $biapath | cut -d'/' -f2);   fsid=$(echo $r | cut -d',' -f3);   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"; done
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-06/25/06-09-19.476 for fileset: 21471
...
will-moore commented 1 year ago

All done... A couple of files are 0 bytes:

idr0016/21453.sql
idr0016/21256.sql
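
Empty outputs like these can be listed with `find`, which is handy after any interrupted run. A minimal sketch, with `empty_sql_files` as a made-up helper name:

```shell
# Hypothetical helper: list zero-byte .sql files under a directory.
empty_sql_files() {
  find "$1" -type f -name '*.sql' -size 0 | sort
}

# Demo on a throwaway directory
demo_dir=$(mktemp -d)
: > "$demo_dir/1.sql"                    # empty file
echo 'begin; commit;' > "$demo_dir/2.sql"
empty_sql_files "$demo_dir"              # lists only 1.sql
rm -rf "$demo_dir"
```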
will-moore commented 11 months ago

Need to re-convert Plate named 24667 since previous NGFF conversion is missing some files from N10 field 1:

https://ome.github.io/ome-ngff-validator/?source=https%3A%2F%2Fuk1s3.embassy.ebi.ac.uk%2Fbia-integrator-data%2FS-BIAD851%2F2c49b893-ec6d-4329-9cc3-569b820075f2%2F2c49b893-ec6d-4329-9cc3-569b820075f2.zarr&well=all and https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD851/2c49b893-ec6d-4329-9cc3-569b820075f2/2c49b893-ec6d-4329-9cc3-569b820075f2.zarr/N/10/

On zarr1-dev-pilot...

conda activate bioformats2raw2
~/bioformats2raw-0.6.0-24/bin/bioformats2raw --memo-directory ../memo  /uod/idr/metadata/idr0016-wawer-bioactivecompoundprofiling/screens/24667.screen 24667.ome.zarr

EDIT - this failed! Forgot that we're using omero-cli-zarr for idr0016 exports...

conda activate omero_zarr_export
pip install -U git+https://github.com/will-moore/omero-cli-zarr.git@name_option

$ omero zarr export Plate:6202 --name_by name

Error loading: /home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/plugins/zarr.py
Traceback (most recent call last):
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/cli.py", line 1690, in loadpath
    execfile(str(pathobj), loc)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/past/builtins/misc.py", line 87, in execfile
    exec_(code, myglobals, mylocals)
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero/plugins/zarr.py", line 1, in <module>
    from omero_zarr.cli import HELP, ZarrControl
  File "/home/wmoore/miniconda3/envs/omero_zarr_export/lib/python3.9/site-packages/omero_zarr/__init__.py", line 21, in <module>
    from ._version import version as __version__
ModuleNotFoundError: No module named 'omero_zarr._version'
usage: /home/wmoore/miniconda3/envs/omero_zarr_export/bin/omero
       [-h] [-v] [-d DEBUG] [--path PATH] [-C] [-s SERVER] [-p PORT]
       [-g GROUP] [-u USER] [-w PASSWORD] [-k KEY] [--sudo ADMINUSER] [-q]
       <subcommand> ...
/home/wmoore/miniconda3/envs/omero_zarr_export/bin/omero: error: argument <subcommand>: invalid choice: 'zarr'
will-moore commented 11 months ago

Use idr-ftp as above:

$ conda activate omero_zarr_export
(omero_zarr_export) [wmoore@idrftp-ftp idr0016]$ pip freeze | grep zarr
ome-zarr==0.8.0
omero-cli-zarr @ git+https://github.com/will-moore/omero-cli-zarr.git@c73d40046536f8b5cc62908ebdaa86d097a30d0b
zarr==2.16.1

omero zarr export Plate:6202 --name_by name

Check that plate isn't missing Well N/10/1 as above...

(omero_zarr_export) [wmoore@idrftp-ftp idr0016]$ ls -alh 24667.ome.zarr/N/10/
total 12K
drwxrwxr-x.  8 wmoore wmoore  126 Nov 15 16:25 .
drwxrwxr-x. 26 wmoore wmoore 4.0K Nov 15 16:27 ..
drwxrwxr-x.  6 wmoore wmoore  100 Nov 15 16:25 0
drwxrwxr-x.  6 wmoore wmoore  100 Nov 15 16:25 1
drwxrwxr-x.  6 wmoore wmoore  100 Nov 15 16:25 2
drwxrwxr-x.  6 wmoore wmoore  100 Nov 15 16:25 3
drwxrwxr-x.  6 wmoore wmoore  100 Nov 15 16:25 4
drwxrwxr-x.  6 wmoore wmoore  100 Nov 15 16:25 5
-rw-rw-r--.  1 wmoore wmoore  420 Nov 15 16:25 .zattrs
-rw-rw-r--.  1 wmoore wmoore   24 Nov 15 16:25 .zgroup
(omero_zarr_export) [wmoore@idrftp-ftp idr0016]$ ls -alh 24667.ome.zarr/N/10/1
total 12K
drwxrwxr-x. 6 wmoore wmoore  100 Nov 15 16:25 .
drwxrwxr-x. 8 wmoore wmoore  126 Nov 15 16:25 ..
drwxrwxr-x. 7 wmoore wmoore   94 Nov 15 16:25 0
drwxrwxr-x. 7 wmoore wmoore   94 Nov 15 16:25 1
drwxrwxr-x. 7 wmoore wmoore   94 Nov 15 16:25 2
drwxrwxr-x. 7 wmoore wmoore   94 Nov 15 16:25 3
-rw-rw-r--. 1 wmoore wmoore 4.5K Nov 15 16:25 .zattrs
-rw-rw-r--. 1 wmoore wmoore   24 Nov 15 16:25 .zgroup
$ zip -r 24667.ome.zarr.zip 24667.ome.zarr
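
The manual `ls` check above only covers one well. As a rough sketch, the same idea can be applied to the whole plate by looking for field directories without a `.zattrs`. This assumes the plate/row/column/field layout seen in the listing, and it would not catch missing chunk files deeper in the hierarchy; `fields_missing_zattrs` is a hypothetical helper.

```shell
# Hypothetical helper: print every field directory (plate/row/column/field)
# that has no .zattrs file.
fields_missing_zattrs() {
  local field
  for field in "$1"/*/*/*/; do
    [ -d "$field" ] || continue          # glob matched nothing
    [ -f "${field}.zattrs" ] || printf '%s\n' "${field%/}"
  done
}

# Demo: field 0 is complete, field 1 has no metadata
demo_plate=$(mktemp -d)
mkdir -p "$demo_plate/N/10/0" "$demo_plate/N/10/1"
touch "$demo_plate/N/10/0/.zattrs"
fields_missing_zattrs "$demo_plate"      # prints only the N/10/1 path
rm -rf "$demo_plate"
```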

Delete 24667.ome.zarr.zip from https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0016 and reupload...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/idr0016/idr0016/ bsaspera_w@hx-fasp-1.ebi.ac.uk:/5f/13xxxxxxx

24667.ome.zarr.zip              100% 5461MB  454Mb/s    01:34    
Completed: 5592294K bytes transferred in 95 seconds
 (480827K bits/sec), in 1 file, 1 directory.
will-moore commented 11 months ago

idr0016 plates (Names) that are not yet viewable in idr-testing:

Let's run sql etc. on clean idr0125-pilot data...

Update SECRET in sql... as wmoore

$ cd idr-util/scripts/ngff_filesets/idr0016
$ for i in $(ls); do sudo sed -i 's/SECRETUUID/c6b02bb7-2c22-4c45-be8d-30484c380a9c/g' $i; done
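
A variant of that substitution which globs for `*.sql` instead of parsing `ls` output is sketched below. It is an illustration only: `fill_secret` is a hypothetical name, the UUID in the demo is a dummy value, and it rewrites each file via a temporary copy rather than `sed -i` so that it behaves the same with GNU and BSD sed.

```shell
# Hypothetical helper: replace the SECRETUUID placeholder in every .sql file
# of a directory. Assumes the secret contains no "/" or "&" characters.
fill_secret() {
  local dir="$1" secret="$2" f
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue              # no .sql files at all
    sed "s/SECRETUUID/$secret/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Demo on a throwaway directory with a dummy secret
demo_dir=$(mktemp -d)
echo "secret is SECRETUUID" > "$demo_dir/1.sql"
fill_secret "$demo_dir" 00000000-0000-0000-0000-000000000000
cat "$demo_dir/1.sql"
rm -rf "$demo_dir"
```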

as omero-server user...

$ cd ngff_filesets/
$ export IDRID=idr0016

(venv3) (base) bash-4.2$ for r in $(cat $IDRID.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
>   psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
>   omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --bfoptions
> done
UPDATE 2304
BEGIN
 mkngff_fileset 
----------------
        5288754
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/24/05-33-04.817
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/24/05-33-04.817_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/24/05-33-04.817_mkngff/000f81bf-a7b2-4610-99c3-47dc5fec8c92.zarr -> /bia-integrator-data/S-BIAD851/000f81bf-a7b2-4610-99c3-47dc5fec8c92/000f81bf-a7b2-4610-99c3-47dc5fec8c92.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/24/05-33-04.817
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/24/05-33-04.817_mkngff/000f81bf-a7b2-4610-99c3-47dc5fec8c92.zarr.bfoptions
UPDATE 2304
BEGIN
 mkngff_fileset 
----------------
        5288755
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2017-08/17/02-13-40.469
Creating dir at /data/OMERO/ManagedRepository/demo_2/2017-08/17/02-13-40.469_mkngff
...

Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/19/23-32-50.888
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/19/23-32-50.888_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/19/23-32-50.888_mkngff/fd822d4b-3060-46e9-8178-982510009c93.zarr -> /bia-integrator-data/S-BIAD851/fd822d4b-3060-46e9-8178-982510009c93/fd822d4b-3060-46e9-8178-982510009c93.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/19/23-32-50.888
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/19/23-32-50.888_mkngff/fd822d4b-3060-46e9-8178-982510009c93.zarr.bfoptions
UPDATE 2304
BEGIN
 mkngff_fileset
----------------
        5289164
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/27/02-33-37.895
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/27/02-33-37.895_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/27/02-33-37.895_mkngff/fdf51c56-0ecf-4e1c-8b47-c35cafd78a2c.zarr -> /bia-integrator-data/S-BIAD851/fdf51c56-0ecf-4e1c-8b47-c35cafd78a2c/fdf51c56-0ecf-4e1c-8b47-c35cafd78a2c.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/27/02-33-37.895
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/27/02-33-37.895_mkngff/fdf51c56-0ecf-4e1c-8b47-c35cafd78a2c.zarr.bfoptions
UPDATE 2304
BEGIN
 mkngff_fileset
----------------
        5289165
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/20/22-41-20.985
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/20/22-41-20.985_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/20/22-41-20.985_mkngff/feea9b2d-dd05-428a-a04e-5ebd45048401.zarr -> /bia-integrator-data/S-BIAD851/feea9b2d-dd05-428a-a04e-5ebd45048401/feea9b2d-dd05-428a-a04e-5ebd45048401.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/20/22-41-20.985
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/20/22-41-20.985_mkngff/feea9b2d-dd05-428a-a04e-5ebd45048401.zarr.bfoptions
UPDATE 2304
BEGIN
 mkngff_fileset
----------------
        5289166
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/27/12-57-28.592
Creating dir at /data/OMERO/ManagedRepository/demo_2/2016-06/27/12-57-28.592_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2016-06/27/12-57-28.592_mkngff/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr -> /bia-integrator-data/S-BIAD851/ff85e5f2-258a-46ad-bdd0-d4f296aec28e/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2016-06/27/12-57-28.592
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/2016-06/27/12-57-28.592_mkngff/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr.bfoptions
will-moore commented 11 months ago

Viewing images. First 2 viewed, still waiting: http://localhost:1040/webclient/?show=image-2330212 http://localhost:1040/webclient/?show=image-2340843

This one got an error: http://localhost:1040/webclient/?show=image-2376573

    serverExceptionClass = ome.conditions.ResourceError
    message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/21/01-46-55.560_mkngff/5aec8bec-8573-44ec-9e9e-24fb81623fbe.zarr/B/1/.zattrs

Done but not tried viewing yet http://localhost:1040/webclient/?show=image-2486279 http://localhost:1040/webclient/?show=image-2131185

not done yet http://localhost:1040/webclient/?show=image-2435591

will-moore commented 11 months ago

Looking at the last Fileset generated above, 5289166, and finding an Image ID via psql... The Fileset doesn't have clientpath set:

last row of idr0016.csv:

idr0016/26110.ome.zarr,S-BIAD851/ff85e5f2-258a-46ad-bdd0-d4f296aec28e,21526

looking at 21526.sql...

UPDATE pixels SET name = '.zattrs', path = 'demo_2/2016-06/27/12-57-28.592_mkngff/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr' where image in (select id from Image where fileset = 21526);

begin;
    select mkngff_fileset(
      21526,
      'c6b02bb7-2c22-4c45-be8d-30484c380a9c',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
      'demo_2/2016-06/27/12-57-28.592_mkngff/',
      array[
          ['demo_2/2016-06/27/12-57-28.592_mkngff/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr/', '.zattrs', 'application/octet-stream', 'https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD851/ff85e5f2-258a-46ad-bdd0-d4f296aec28e/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr/.zattrs'],
          ['demo_2/2016-06/27/12-57-28.592_mkngff/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr/', '.zgroup', 'application/octet-stream', 'https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD851/ff85e5f2-258a-46ad-bdd0-d4f296aec28e/ff85e5f2-258a-46ad-bdd0-d4f296aec28e.zarr/.zgroup'],
...

Ah!!! - I forgot to update and run setup.sql which creates the mkngff_fileset() sql function!