IDR / idr.openmicroscopy.org

Source for the IDR static website.
https://idr.openmicroscopy.org/about
Creative Commons Attribution 4.0 International

Add stats for prod119 release #187

Closed: dominikl closed this 9 months ago

dominikl commented 9 months ago

See title. I also added the cell study type for idr0139 and idr0149; for some reason, it was missed last time.

dominikl commented 9 months ago

Something's a little strange. I ran the stats for idr0143 with the --disable-fsusage flag, but the disk-space figure of 29 TB still showed up, while the number of files was missing. When I run du on the filesystem, I get 19 TB. That's quite a difference. Any idea, @sbesson?

Trying again:

(venv3) [dlindner@prod119-omeroreadwrite metadata]$ python idr-utils/scripts/stats.py --release prod119 --disable-fsusage idr0143-herbst-coculture
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
idr0143-herbst-coculture    screenA prod119 3452    225 86400           112 259200  3208320 29.937475584    29937475584000          2160 x 2160 x 5 x 2 x 1 cell

And with du, for the MIP plates:

(venv3) [dlindner@prod119-omeroreadwrite 20221222-ftp]$ du --max-depth=1
...
2687536940

For the raw plates:

(venv3) [dlindner@prod119-omeroreadwrite 20220822-ftp]$ du --max-depth=1
...
15864049064
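
Adding up the two du totals gives the 19 TB mentioned above. This is a small sketch assuming GNU du's default unit of 1 KiB (1024-byte) blocks and decimal terabytes (10**12 bytes); the helper name kib_to_tb is hypothetical.

```python
# Totals reported by `du --max-depth=1` for the two directories above.
# Assumption: GNU du's default unit is 1 KiB (1024-byte) blocks.
DU_TOTALS_KIB = {
    "MIP plates (20221222-ftp)": 2687536940,
    "Raw plates (20220822-ftp)": 15864049064,
}


def kib_to_tb(kib: int) -> float:
    """Convert a count of 1 KiB blocks to decimal terabytes (10**12 bytes)."""
    return kib * 1024 / 10**12


total_kib = sum(DU_TOTALS_KIB.values())
for name, kib in DU_TOTALS_KIB.items():
    print(f"{name}: {kib_to_tb(kib):.2f} TB")  # 2.75 TB and 16.24 TB
print(f"Total: {kib_to_tb(total_kib):.2f} TB")  # 19.00 TB
```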

sbesson commented 9 months ago

Pretty sure that if you pass --disable-fsusage, the script attempts to estimate the volume of data from the pixel dimensions and pixel type, i.e. the number of bytes. This obviously ignores several factors, including the presence of pyramidal levels, the underlying file format and the use of compression.
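A minimal sketch of that kind of estimate, applied to the idr0143 output line above. It assumes the column holding 3208320 is the plane count and that each pixel takes 2 bytes (e.g. uint16); neither is stated in the thread, but together they reproduce the reported 29937475584000 bytes exactly. The function name estimated_bytes is hypothetical and this is an illustration, not the code in stats.py.

```python
# Values read from the stats.py output line for idr0143-herbst-coculture.
# Assumption: 3208320 is the number of planes.
PLANES = 3208320
SIZE_X = SIZE_Y = 2160
# Assumption: 2 bytes per pixel (e.g. uint16), inferred because it matches
# the byte count reported by stats.py.
BYTES_PER_PIXEL = 2


def estimated_bytes(planes: int, size_x: int, size_y: int, bytes_per_pixel: int) -> int:
    """Uncompressed size: each plane stored once, at full resolution only.

    Pyramid levels, file-format overhead and compression are all ignored.
    """
    return planes * size_x * size_y * bytes_per_pixel


est = estimated_bytes(PLANES, SIZE_X, SIZE_Y, BYTES_PER_PIXEL)
print(est)  # 29937475584000
print(f"{est / 10**12:.3f} TB")
```

Against the roughly 19 TB reported by du, this estimate is about 1.6 times larger; one plausible reading is that compression or a more compact on-disk format outweighs any pyramid overhead here.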

dominikl commented 9 months ago

So it would be better to use the size reported by du on the filesystem, right?

sbesson commented 9 months ago

Yes. There are different interpretations of what "size" is expected to capture, but I think the file size captured by du is one that is easy to interpret. For people interested in downloading the raw data, this is also an informative metric.
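For a scripted equivalent that also produces the file count missing from the stats output, here is a rough Python sketch. The function name tree_usage is hypothetical; it approximates du by summing st_blocks * 512 per file (st_blocks is Unix-only), and unlike du it neither deduplicates hard links nor counts the blocks used by directories themselves.

```python
import os


def tree_usage(root: str) -> tuple[int, int, int]:
    """Walk `root` and return (file_count, apparent_bytes, allocated_bytes).

    apparent_bytes sums st_size (logical file lengths); allocated_bytes sums
    st_blocks * 512, close to what `du` reports on Linux, where st_blocks is
    counted in 512-byte units. Symlinks are not followed.
    """
    files = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            files += 1
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return files, apparent, allocated
```

For example, tree_usage("/path/to/20220822-ftp") would return the file count, apparent size and allocated size for the raw plates; the path is a placeholder.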