Closed - valeriupredoi closed this 1 year ago
Attention: 6 lines in your changes are missing coverage. Please review.
Files | Coverage Δ
---|---
activestorage/active.py | 92.53% <100.00%> (ø)
activestorage/netcdf_to_zarr.py | 80.43% <70.00%> (-6.53%) :arrow_down:
@markgoddard no more need to test with your test in #140 - I have implemented here actual pytest monitoring that measures the memory used by individual tests - look at "Analyze S3 performance" in the GA test - it is clear that S3 with AS uses almost exactly as much memory as the no-AS case - this means only one thing: data loaded into the client's memory :confounded: :confounded: There is definite data transfer, most probably in kerchunk - I need to look much closer at this, in this PR
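(For reference, per-test RSS numbers like the ones below can be collected with a tiny autouse fixture - a minimal sketch, assuming psutil is available; the fixture name is hypothetical, not the actual GA/test-suite code:)

```python
# conftest.py - hedged sketch of per-test memory reporting, not the real setup.
import psutil
import pytest

@pytest.fixture(autouse=True)
def report_rss(request):
    proc = psutil.Process()  # the pytest process running the test
    yield                    # run the test itself
    rss_mib = proc.memory_info().rss / 1024 ** 2
    print(f"{request.node.name} {rss_mib}")  # e.g. "test_Active 881.75390625"
```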
OK, we got our culprit for sure - smoking gun held by SingleHdf5ToZarr:
test_Active 881.75390625
test_no_Active 877.3359375
test_s3_SingleHdf5ToZarr 928.3359375
test_local_SingleHdf5ToZarr 890.296875
List of 100 slowest tests (duration, path, name):
2.89s test_s3_performance.py::test_Active
2.40s test_s3_performance.py::test_s3_SingleHdf5ToZarr
0.18s test_s3_performance.py::test_local_SingleHdf5ToZarr
0.15s test_s3_performance.py::test_no_Active
OK gents @markgoddard @bnlawrence @davidhassell, I now have a full picture of where and how the resources/data transfer etc. are happening in our beloved PyActiveStorage (well, in Active) - here's the memory/time chain for the case of a single netCDF4 file of 50M size, all serial, and all on a GH Actions machine (which is OK since I am the sole user of it when running the test, though numbers may vary from run to run by a small percentage); numbers are for the Active._version that uses Kerchunk, which should be the closest to active storage proper:
The stack is: SingleHdf5ToZarr -> JSON dump (json.dumps(h5chunks.translate()).encode()) -> load Zarr Group -> have a Zarr reference file (image) -> select chunk -> send back chunk coords to AS -> get result from AS
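(For concreteness, a minimal sketch of that stack following kerchunk's documented usage - paths are placeholders, and this is not PyActiveStorage's code verbatim:)

```python
# Sketch of the SingleHdf5ToZarr -> JSON dump -> Zarr group stack.
import json
import fsspec
import zarr
from kerchunk.hdf import SingleHdf5ToZarr

with fsspec.open("test.nc", "rb") as f:
    h5chunks = SingleHdf5ToZarr(f, "test.nc")
    content = json.dumps(h5chunks.translate()).encode()  # the costly dump step

# "load Zarr Group": open the refs as a read-only reference filesystem
fs = fsspec.filesystem("reference", fo=json.loads(content))
zgroup = zarr.open_group(fs.get_mapper(""))  # the Zarr reference "image"
```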
S3:

- Active (end-to-end): 617M / 167s
- SingleHdf5ToZarr: 66M / 0.23s

Local:

- Active (end-to-end): 881M / 7.9s
- SingleHdf5ToZarr: 67M / 0.07s

So it's pretty obvious there is data transfer at least once: at the Kerchunk chunks-to-JSON dump step. Somewhere else along the stack, after the Zarr reference file has been loaded, data is mishandled in memory, leading to unnecessary memory clogging; but at least for the S3 case it seems the only actual data transfer to the client happens at the JSON dump step, given that it takes the bulk of the time Active runs for. I need to figure out if/how we can get rid of the JSON dumping (ie make json.dumps(h5chunks.translate()).encode() not download the data), otherwise we're fairly hosed :+1:
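(One possible escape hatch, sketched on the assumption that nothing in the refs dict itself forces a download: translate() already returns a plain dict, and fsspec's reference filesystem accepts that dict directly, so the dumps/encode round-trip can in principle be skipped:)

```python
# Hedged sketch: feed the refs dict straight to fsspec, no JSON round-trip.
import fsspec
import zarr
from kerchunk.hdf import SingleHdf5ToZarr

refs = SingleHdf5ToZarr("test.nc").translate()  # recent kerchunk accepts a path string
fs = fsspec.filesystem("reference", fo=refs)    # fo= takes a dict as well as a URL
zgroup = zarr.open_group(fs.get_mapper(""))
```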
Just so I understand this: the netCDF file is compressed, right? What's the grid of the data in it? What is the chunking? (I am just trying to work out why we have >300M of JSON from 50M of data. The JSON should be the kerchunk chunk index, right? Is the netCDF file itself sensibly chunked?)
(There is also that configuration option which, if set wrong, means they load all the data into the JSON file as an "optimisation".)
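(If I had to guess, that option is kerchunk's inline_threshold: chunks smaller than the threshold get base64-inlined into the JSON refs. A hedged sketch of turning it off - behaviour as I understand it from the kerchunk docs, worth double-checking:)

```python
# With (3, 3, 1) float64 chunks (72 bytes each, under the default threshold),
# every single chunk would get base64-inlined into the JSON refs - which would
# make the JSON nearly as large as the data itself.
from kerchunk.hdf import SingleHdf5ToZarr

h5chunks = SingleHdf5ToZarr("test_bizarre.nc", inline_threshold=0)  # 0 disables inlining
refs = h5chunks.translate()
```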
@bnlawrence the file is created by the vanilla creator:

```python
make_vanilla_ncdata(filename=s3_testfile, chunksize=(3, 3, 1), n=150)
```

so a 150x150x150 file with no compression and no filters; the chunking is horrendous indeed. The JSON file itself is only 45M on disk, just a wee bit smaller than the actual netCDF data file (which is ~50M on disk).
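(Back-of-envelope on why that chunking hurts - my arithmetic, assuming float64 data:)

```python
# A 150x150x150 float64 array chunked at (3, 3, 1):
n_chunks = (150 // 3) * (150 // 3) * (150 // 1)  # = 375,000 chunks
chunk_bytes = 3 * 3 * 1 * 8                      # = 72 bytes per chunk
data_mb = 150 ** 3 * 8 / 1e6                     # = 27 MB of raw data
# 375,000 refs entries (or 375,000 inlined 72-byte chunks) easily yields a
# ~45M JSON file; the ~50M on-disk netCDF size vs 27 MB of raw data is
# presumably per-chunk HDF5 overhead.
```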
Is there any chance we are reading that json file every time we process a chunk, or do a test? Are those the timings per chunk, or per test, or per set of tests?
> Is there any chance we are reading that json file every time we process a chunk, or do a test? Are those the timings per chunk, or per test, or per set of tests?
per test - see the test module and the GA action under the "Analyze S3 and local test performance" step. I think I now understand very well what's happening, reading kerchunk's hdf.py module; one sec, will post findings
OK, the running-around-in-circles-yelling-"my hair is on fire" phase (Dexter's Laboratory reference, for the older people) is over - I understand what the heck's going on now:
So the test I am using is this one, against local storage, since I can't make use of the S3 container, but things are pretty clear already:
kerchunk's hdf.py is the module that holds the SingleHdf5ToZarr class, and that is what's used to open the file and convert it to a Zarr reference file. It "translates" the file from HDF5 to Zarr via a translate(self) method that does two main things (see the sketch below):

1. transfers the attributes: self._transfer_attrs(self._h5f, self._zroot)
2. builds the chunk info via self._h5f.visititems(self._translator), where _translator is a callable and visititems is an h5py method that takes the callable and returns what the callable asks for (see https://docs.h5py.org/en/stable/high/group.html#h5py.Group.visititems) - this results in the creation of cinfo: a massive dict that contains all manner of UTF8 metadata and a large collection of chunk key-val pairs of the form (x y z): base64(offset, size) (in binary)
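(A minimal sketch of the visititems pattern, independent of kerchunk - the visitor below is illustrative, not kerchunk's _translator:)

```python
# h5py calls the visitor once per group/dataset in the file; kerchunk's
# _translator collects per-chunk (offset, size) info at this point.
import h5py

def visitor(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(name, obj.chunks)  # e.g. "data (3, 3, 1)"

with h5py.File("test_bizarre.nc", "r") as f:
    f.visititems(visitor)
```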
Running the test on the file produces this timing table:
Translation begins
Transfer attrs took: 0.0016644001007080078
Running translator
cinfo (part of visititems) took: 1.9593455791473389
Transfer attrs took: 0.0005290508270263672
chunk info storaging took: 1.9480032920837402
Running translator
cinfo (part of visititems) took: 5.841255187988281e-05
Transfer attrs took: 0.0005819797515869141
chunk info storaging took: 1.71661376953125e-05
Running translator
cinfo (part of visititems) took: 4.1484832763671875e-05
Running translator
cinfo (part of visititems) took: 6.198883056640625e-05
Transfer attrs took: 0.00061798095703125
chunk info storaging took: 1.7642974853515625e-05
Running translator
cinfo (part of visititems) took: 8.726119995117188e-05
Running translator
cinfo (part of visititems) took: 5.2928924560546875e-05
Transfer attrs took: 0.0005898475646972656
chunk info storaging took: 1.621246337890625e-05
Running translator
cinfo (part of visititems) took: 4.00543212890625e-05
Visit items took: 4.660032510757446
Transfer attrs and visit items took: 4.661792993545532
json encode took: 0.13840699195861816
Translate took: 4.800270318984985
The translator ran 7 times for 7 HDF5 groups - it seems just the first group is the big one. At any rate, visiting items absolutely dominates the time for this particular file - a nasty recursive operation done on the open file. I don't know how this scales to S3-hosted files, but looking at the test durations above, this operation is VERY slow for S3 files (20x slower than for local POSIX).
The way _translator calls _storage_info is probably not optimized for object stores; note that even the HDF5 file-open call should use the ros3 driver for S3 files.
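(For reference, a hedged sketch of an ros3 open - the URL is a placeholder, and this needs an HDF5/h5py build with read-only S3 support:)

```python
# ros3 lets the HDF5 library itself do ranged reads against S3, rather than
# pulling bytes through a Python file-like object.
import h5py

f = h5py.File(
    "https://object.store.example/bucket/test.nc",  # placeholder URL
    mode="r",
    driver="ros3",
)
```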
In conclusion: phew, none of the netCDF4 file's actual data is transferred, but the use of kerchunk's HDF5-to-Zarr mapper either needs to be fixed upstream (at least for S3 data) or needs to vanish altogether, because the performance is abysmal.
holy mother of Jeebus! Rechunking from (3, 3, 1) to (10, 10, 10) for the same 150x150x150 file gets us HUGE speedups and memory savings, see https://github.com/valeriupredoi/PyActiveStorage/actions/runs/6312570050/job/17138886865?pr=146 :

S3:

- Active (end-to-end): 87M / 0.55s
- SingleHdf5ToZarr: 60M / 0.17s

Local:

- Active (end-to-end): 98M / 0.14s
- SingleHdf5ToZarr: 60M / 0.05s

Same results when running locally, now with detailed Kerchunk timing:
Result
a = {'n': 8, 'sum': array([[[1355404.]]])}, axis = None, dtype = None, out = None, keepdims = False
Kerchunk timing
Kerchunking file /home/valeriu/PyActiveStorage/tests/s3_exploratory/test_data/test_bizarre.nc with variable data for storage type None
Storage type None
Translation begins
Transfer attrs took: 0.001699686050415039
Running translator
cinfo (part of visititems) took: 2.6562392711639404
H5 object Kind f
Transfer attrs took: 0.0005955696105957031
chunk info storaging took: 3.091123580932617
Running translator
cinfo (part of visititems) took: 6.866455078125e-05
H5 object Kind i
Transfer attrs took: 0.0007214546203613281
chunk info storaging took: 2.0742416381835938e-05
Running translator
cinfo (part of visititems) took: 8.440017700195312e-05
H5 object Kind f
Running translator
cinfo (part of visititems) took: 7.963180541992188e-05
H5 object Kind i
Transfer attrs took: 0.0007498264312744141
chunk info storaging took: 2.3365020751953125e-05
Running translator
cinfo (part of visititems) took: 6.747245788574219e-05
H5 object Kind f
Running translator
cinfo (part of visititems) took: 6.818771362304688e-05
H5 object Kind i
Transfer attrs took: 0.0006330013275146484
chunk info storaging took: 2.2649765014648438e-05
Running translator
cinfo (part of visititems) took: 5.054473876953125e-05
H5 object Kind f
Visit items took: 6.826631307601929
Transfer attrs and visit items took: 6.828451633453369
json encode took: 0.20699334144592285
Translate took: 7.035499334335327
Result
a = {'n': 8, 'sum': array([[[1355404.]]])}, axis = None, dtype = None, out = None, keepdims = False
Kerchunk timing:
Kerchunking file /home/valeriu/PyActiveStorage/tests/s3_exploratory/test_data/test_bizarre.nc with variable data for storage type None
Storage type None
Translation begins
Transfer attrs took: 0.005974292755126953
Running translator
cinfo (part of visititems) took: 0.028586149215698242
H5 object Kind f
Transfer attrs took: 0.0006480216979980469
chunk info storaging took: 0.015633106231689453
Running translator
cinfo (part of visititems) took: 7.009506225585938e-05
H5 object Kind i
Transfer attrs took: 0.0007061958312988281
chunk info storaging took: 1.7404556274414062e-05
Running translator
cinfo (part of visititems) took: 5.125999450683594e-05
H5 object Kind f
Running translator
cinfo (part of visititems) took: 6.389617919921875e-05
H5 object Kind i
Transfer attrs took: 0.0004954338073730469
chunk info storaging took: 3.24249267578125e-05
Running translator
cinfo (part of visititems) took: 3.910064697265625e-05
H5 object Kind f
Running translator
cinfo (part of visititems) took: 4.3392181396484375e-05
H5 object Kind i
Transfer attrs took: 0.0004894733428955078
chunk info storaging took: 1.52587890625e-05
Running translator
cinfo (part of visititems) took: 3.790855407714844e-05
H5 object Kind f
Visit items took: 0.07573843002319336
Transfer attrs and visit items took: 0.08198189735412598
json encode took: 0.0013606548309326172
Translate took: 0.08354067802429199
Chunks of (75, 75, 75) double the speedup :clown_face:
So, more test results: in the latest test I am creating a 500x500x500 data file with chunks=(75, 75, 75) - the file (which I also created locally) is 1.1G in size. Local AS takes 0.1s and needs 100M of RES memory; S3 AS takes 0.28s and needs 83M of RES memory. The numbers check out, since I ran local AS on my laptop and it took 0.3s (slightly longer because I am printing stuff to screen - printing takes a lot in these very short cases) and needed 100M of RES mem :partying_face:
@markgoddard I am afraid of abusing the GA machine with large files (I think about 2G is the most I can use), so maybe it'd be good if you created a fairly hefty file (10G maybe?), put it on Minio, and tested with it when you get a bit of time - chunking is absolutely paramount, so maybe (200, 200, 200) chunks for that? :beer:
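(A hedged sketch of creating that kind of file with netCDF4-python - names and sizes are illustrative only; writing slab by slab keeps client memory bounded:)

```python
# ~10G of float64 on a 1100^3 grid, chunked (200, 200, 200).
import numpy as np
from netCDF4 import Dataset

N, step = 1100, 200
with Dataset("hefty.nc", "w") as ds:
    for dim in ("x", "y", "z"):
        ds.createDimension(dim, N)
    var = ds.createVariable("data", "f8", ("x", "y", "z"),
                            chunksizes=(200, 200, 200))
    for i in range(0, N, step):  # one ~1.9G slab at a time
        var[i:i + step] = np.random.rand(min(step, N - i), N, N)
```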
chaps @bnlawrence @davidhassell @markgoddard - I'd like to merge this now so we have the tests available in main (for deliverable and other formal reasons, but also so we can continue looking at performance); incidentally, this PR also cleans up the way we call SingleHdf5ToZarr from Kerchunk (it actually increases performance this way!) - could I ask one of yous to have a looksee and give me a review please? It doesn't have to be a super careful review, this doesn't break anything :grin:
I'll merge this for now - we can always PR improvements :grin:
Description
@markgoddard found out in #140 that there is some hefty data transfer at the point where kerchunk.SingleHdf5ToZarr is called and run; it's not clear to me whether that is actual data (as in, the netCDF file's data member) being transferred, or metadata, or something else; but I noticed that the SingleHdf5ToZarr API docs mention the need for a storage_options dict to be passed to fsspec when the file is a string, so I thought we'd create that, pass it, and use it, and see if it makes any difference at all.

Closes #140 (well, at least attempts to; if testing proves otherwise, we'll take it from there)
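(A hedged sketch of what that looks like - bucket, endpoint and credentials below are placeholders for a MinIO-style setup, not the project's actual config:)

```python
# When SingleHdf5ToZarr is handed a path/URL string, kerchunk opens it via
# fsspec, so S3 credentials travel in through storage_options.
from kerchunk.hdf import SingleHdf5ToZarr

storage_options = {
    "key": "minioadmin",  # placeholder credentials
    "secret": "minioadmin",
    "client_kwargs": {"endpoint_url": "http://localhost:9000"},
}
h5chunks = SingleHdf5ToZarr("s3://pyactivestorage/test.nc",
                            storage_options=storage_options)
refs = h5chunks.translate()
```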