This PR mainly targets optimizations for large ImageCollections.
The measured benefits are substantial, both in loading time and in the memory footprint of large image collections:
before
real 47m44.168s
after
real 7m42.795s
Per standardizer row we save another 5K of data, or approximately 50% of the memory we used to take. I wanted to test the impact on standardization times, since those must not be significantly affected, but someone is currently thrashing the disks, so I can't get reliable results. For 3 datasetrefs, which should take on the order of a second, I got the following results:
old
Time to standardize A0a collection: 14.083693213993683
Time to standardize A0a collection: 12.05290256603621
Time to standardize A0a collection: 16.573551627923734
new
Time to standardize A0a collection: 16.573551627923734
Time to standardize A0a collection: 4.487202679971233
Time to standardize A0a collection: 13.363859862089157
So I'm going to have to repeat these tests to see what the damage is. I expect a slight increase in time because serializing the WCS takes a few more steps now, but I don't think it should be that bad.
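For the repeat measurements, something along these lines could help separate disk noise from a real slowdown. It is only a sketch: `standardize_stub` is a hypothetical placeholder for the actual standardization call, and the repeat count is arbitrary.

```python
import statistics
import timeit


def standardize_stub():
    # Hypothetical placeholder workload; swap in the real standardization call.
    return sum(i * i for i in range(10_000))


def time_repeated(func, repeat=5):
    """Run func several times and summarize the wall-clock samples."""
    # number=1 times each call on its own, so one noisy run stays visible.
    samples = timeit.repeat(func, number=1, repeat=repeat)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "samples": samples,
    }


timings = time_repeated(standardize_stub)
print(f"min={timings['min']:.6f}s median={timings['median']:.6f}s")
```

Comparing the minimum and the median across runs is usually more robust against a busy disk than comparing three single measurements.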
Speedups include:
not loading all BBOX and WCS as objects at IC init time
adding the ability to disable lazy loading
serializing BBOX as columns to save on data volume of IC
serializing WCS as dict to save on size of IC
serializing WCS and config with the smallest separators possible
trimming padding and spaces from the serialized objects
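To illustrate the separator change, here is a minimal sketch using the standard `json` module. The header values are made up, and `serialize_compact` is a hypothetical helper, not the function used in this PR.

```python
import json

# Hypothetical WCS header cards, for illustration only.
wcs_header = {
    "CTYPE1": "RA---TAN",
    "CTYPE2": "DEC--TAN",
    "CRVAL1": 351.5,
    "CRVAL2": -5.25,
    "CRPIX1": 1024.0,
    "CRPIX2": 2048.0,
}


def serialize_compact(mapping):
    """Dump a dict to JSON without the default space after ',' and ':'."""
    return json.dumps(mapping, separators=(",", ":"))


default = json.dumps(wcs_header)
compact = serialize_compact(wcs_header)
saved = len(default) - len(compact)
print(f"default: {len(default)} chars, compact: {len(compact)} chars")
```

The saving per row is small, but it is paid once per separator per row, so it adds up over a large collection.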
Changes to BBox and WCS handling:
WCS is now a mandatory part of the standardized data
Location was removed as a mandatory part of standardized data
BBox is now unravelled as columns instead of dicts.
BBox is now defined as the on-sky coordinates of the 4 chip corners, clockwise, and the center pixel
WCS is no longer part of the standardized metadata; it is now serialized by the IC.
WCS is serialized as a dictionary
WCS remains an attribute of the standardizers
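As a rough picture of what unravelling the BBox into columns means, the sketch below flattens four corners and a center into one flat row. The column names (`ra`, `dec`, `ra_1`, ...) and the `bbox_to_columns` helper are assumptions for illustration and may not match the names used in the IC.

```python
def bbox_to_columns(corners, center):
    """Flatten four (ra, dec) corners and a center into flat, scalar columns.

    Column names here are hypothetical, not necessarily the ones in the IC.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly 4 corners")
    row = {"ra": center[0], "dec": center[1]}
    for i, (ra, dec) in enumerate(corners, start=1):
        row[f"ra_{i}"] = ra
        row[f"dec_{i}"] = dec
    return row


# Made-up on-sky coordinates in degrees, listed clockwise.
corners = [(10.0, 1.0), (10.0, 2.0), (11.0, 2.0), (11.0, 1.0)]
row = bbox_to_columns(corners, center=(10.5, 1.5))
print(row)
```

Flat scalar columns avoid storing a nested dict per row, which is where the data volume saving comes from.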
Changes to IC:
lazy loading can now be turned off at instantiation time. Lazy loading only happens if the _standardized list exists.
standardizers, wcs and bbox properties are now iterators
get_wcs and get_bbox are now getters that do not support lazy loading
get_standardizer continues to support lazy loading
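The lazy loading behaviour described above can be sketched roughly as follows. This is not the IC implementation: `LazyCollection`, `fake_factory`, and the `enable_lazy_loading` flag are hypothetical names chosen only to show the pattern of building objects on first access versus up front.

```python
class LazyCollection:
    """Toy collection that builds standardizers on demand or eagerly."""

    def __init__(self, rows, factory, enable_lazy_loading=True):
        self._rows = list(rows)
        self._factory = factory
        # None marks a standardizer that has not been built yet.
        self._cache = [None] * len(self._rows)
        if not enable_lazy_loading:
            self._cache = [factory(r) for r in self._rows]

    def get_standardizer(self, index):
        # Build on first access, then reuse the cached object.
        if self._cache[index] is None:
            self._cache[index] = self._factory(self._rows[index])
        return self._cache[index]

    @property
    def standardizers(self):
        # A generator, so nothing is built until iteration reaches it.
        for i in range(len(self._rows)):
            yield self.get_standardizer(i)


built = []


def fake_factory(row):
    # Hypothetical stand-in for constructing a real standardizer.
    built.append(row)
    return f"std:{row}"


lazy = LazyCollection(["a.fits", "b.fits"], fake_factory)
```

With lazy loading on, constructing the collection builds nothing; passing `enable_lazy_loading=False` pays the full cost at instantiation instead.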