Closed julianblue closed 1 year ago
Thanks for reporting @julianblue. This maybe seems like https://github.com/numpy/numpy/pull/22004:
Ragged array creation will now always raise a ValueError unless dtype=object is passed. This includes very deeply nested sequences. https://numpy.org/doc/stable/release/1.24.0-notes.html
However I'm confused, since we do already pass dtype='object'
(note quotes; I hope removing them doesn't fix this, but I guess it's worth trying). I added that to avoid a DeprecationWarning from NumPy, so I'm a little surprised this has suddenly broken with no warning.
I won't be able to work on this for a few weeks. For now, pinning NumPy is a fine workaround; ideally we'd make a release doing that. If anyone wants to take a look at this sooner, please post what you find here.
Do you have a reproducible example @julianblue? I can't reproduce using stackstac main
and this simple example:
In [17]: import stackstac, pystac_client
In [18]: client = pystac_client.Client.open("http://planetarycomputer.microsoft.com/api/stac/v1/")
In [19]: ic = client.search(collections="sentinel-2-l2a", max_items=2).item_collection()
In [20]: stackstac.stack(ic, assets=["B02", "B04"])
Out[20]:
<xarray.DataArray 'stackstac-eb5fcb5108bcbbdc7f4efca13682f66b' (time: 2,
band: 2,
y: 30978,
x: 20982)>
dask.array<fetch_raster_window, shape=(2, 2, 30978, 20982), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/44)
* time (time) datetime64[ns] 2022-12-24...
id (time) <U54 'S2B_MSIL2A_20221224...
* band (band) <U3 'B02' 'B04'
* x (x) float64 5e+05 ... 7.098e+05
* y (y) float64 2e+06 ... 1.69e+06
s2:not_vegetated_percentage (time) float64 0.3278 0.0
... ...
gsd float64 10.0
proj:shape object {10980}
common_name (band) <U4 'blue' 'red'
center_wavelength (band) float64 0.49 0.665
full_width_half_max (band) float64 0.098 0.038
epsg int64 32719
Attributes:
spec: RasterSpec(epsg=32719, bounds=(499980.0, 1690240.0, 709800.0...
crs: epsg:32719
transform: | 10.00, 0.00, 499980.00|\n| 0.00,-10.00, 2000020.00|\n| 0.0...
resolution: 10.0
@TomAugspurger I did not specify the assets to pull all available bands. Specifying the assets does NOT cause the error to be raised @gjoseph92 .
stack = stackstac.stack(itm_coll, resolution=10)
@TomAugspurger , @gjoseph92 ,
the code fails in the metadata conversion for proj:transform (not present in the preview band)
[[10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [60.0, 0.0, 399960.0, 0.0, -60.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [60.0, 0.0, 399960.0, 0.0, -60.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], None]
The none in the list should ideally not exist for raster band data and should be a transform like the other bands?
This causes
props_arr = np.squeeze(
np.array(
props,
# Avoid DeprecationWarning creating ragged arrays when elements are lists/tuples of different lengths
dtype="object"
if (
isinstance(props, _ourlist)
and len(set(len(x) for x in props if isinstance(x, (list, tuple))))
> 1
)
else None,
)
)
to set dtype=None and raises the error. Shouldn't the dtype in that case be set to "object" as well instead of none?
@julianblue can you give https://github.com/gjoseph92/stackstac/pull/195 a try? I don't have a computer available, so just did this on my phone and can't test it out. I feel like something along these lines might work (not sure if this is exactly the thing to do, though).
Hi @gjoseph92 ,
I hit this issue today on stackstac==0.4.3. It is happening when I use the query extension to filter out Landsat 7 items from the landsat-c2-l2 collection in the Planetary Computer STAC. Other operations that use the query
arg (e.g. filtering on eo:cloud_cover
) do not cause the failure.
Here is a reprex:
import planetary_computer
import pystac_client
import stackstac
COLLECTION = "landsat-c2-l2"
BBOX = (-91.67481115470213, 47.86318699029263, -91.52471085746416, 47.93118792159696)
DATE_RANGE = ["2022-07-01", "2022-07-31"]
EPSG = 5070
RESOLUTION = 30
ASSETS = ["red", "green", "blue", "qa_pixel"]
BOUNDS_5070 = (326000, 2771000, 337000, 2778000)
planetary_computer_catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)
# include all items
stac_items = planetary_computer_catalog.search(
collections=[COLLECTION],
bbox=BBOX,
datetime=DATE_RANGE,
).item_collection()
print(f"there are {len(stac_items)} items if we do not apply any filters")
# works
landsat_stack = stackstac.stack(
items=stac_items,
assets=ASSETS,
epsg=EPSG,
resolution=RESOLUTION,
bounds=BOUNDS_5070,
xy_coords="center",
)
print(landsat_stack)
# filter out cloudy items
stac_items_less_cloudy = planetary_computer_catalog.search(
collections=[COLLECTION],
bbox=BBOX,
datetime=DATE_RANGE,
query={"eo:cloud_cover": {"lte": 50}},
).item_collection()
print(f"there are {len(stac_items_less_cloudy)} items if we exclude cloudy images")
# works
landsat_stack_filt = stackstac.stack(
items=stac_items_less_cloudy,
assets=ASSETS,
epsg=EPSG,
resolution=RESOLUTION,
bounds=BOUNDS_5070,
xy_coords="center",
)
print(landsat_stack_filt)
# exclude Landsat 7
stac_items_no_landsat_7 = planetary_computer_catalog.search(
collections=[COLLECTION],
bbox=BBOX,
datetime=DATE_RANGE,
query={"platform": {"neq": "landsat-7"}},
).item_collection()
print(f"there are {len(stac_items_no_landsat_7)} items if we exclude Landsat 7")
# fails with inhomogenous error
landsat_stack_filt = stackstac.stack(
items=stac_items_no_landsat_7,
assets=ASSETS,
epsg=EPSG,
resolution=RESOLUTION,
bounds=BOUNDS_5070,
xy_coords="center",
)
print(landsat_stack_filt)
Output:
there are 10 items if we include Landsat 7
<xarray.DataArray 'stackstac-5060345c8d58f7556b298868d9999b98' (time: 10,
band: 4,
y: 234, x: 368)>
dask.array<fetch_raster_window, shape=(10, 4, 234, 368), dtype=float64, chunksize=(1, 1, 234, 368), chunktype=numpy.ndarray>
Coordinates: (12/27)
* time (time) datetime64[ns] 2022-07-01T15:28:21.11...
id (time) <U31 'LE07_L2SP_027027_20220701_02_T1...
* band (band) <U8 'red' 'green' 'blue' 'qa_pixel'
* x (x) float64 3.26e+05 3.26e+05 ... 3.37e+05
* y (y) float64 2.778e+06 2.778e+06 ... 2.771e+06
platform (time) <U9 'landsat-7' ... 'landsat-9'
... ...
view:off_nadir int64 0
landsat:cloud_cover_land (time) float64 14.0 69.06 40.75 ... 100.0 97.19
gsd int64 30
raster:bands (band) object {'scale': 2.75e-05, 'nodata': ...
title (band) <U29 'Red Band' ... 'Pixel Quality As...
epsg int64 5070
Attributes:
spec: RasterSpec(epsg=5070, bounds=(325980, 2770980, 337020, 27780...
crs: epsg:5070
transform: | 30.00, 0.00, 325980.00|\n| 0.00,-30.00, 2778000.00|\n| 0.0...
resolution: 30
there are 6 items if we exclude cloudy images
<xarray.DataArray 'stackstac-b859f39f5ea1c12c247ef26512f01dbb' (time: 6,
band: 4,
y: 234, x: 368)>
dask.array<fetch_raster_window, shape=(6, 4, 234, 368), dtype=float64, chunksize=(1, 1, 234, 368), chunktype=numpy.ndarray>
Coordinates: (12/27)
* time (time) datetime64[ns] 2022-07-01T15:28:21.11...
id (time) <U31 'LE07_L2SP_027027_20220701_02_T1...
* band (band) <U8 'red' 'green' 'blue' 'qa_pixel'
* x (x) float64 3.26e+05 3.26e+05 ... 3.37e+05
* y (y) float64 2.778e+06 2.778e+06 ... 2.771e+06
platform (time) <U9 'landsat-7' ... 'landsat-9'
... ...
view:off_nadir int64 0
landsat:cloud_cover_land (time) float64 14.0 40.75 15.43 14.0 2.41 3.36
gsd int64 30
raster:bands (band) object {'scale': 2.75e-05, 'nodata': ...
title (band) <U29 'Red Band' ... 'Pixel Quality As...
epsg int64 5070
Attributes:
spec: RasterSpec(epsg=5070, bounds=(325980, 2770980, 337020, 27780...
crs: epsg:5070
transform: | 30.00, 0.00, 325980.00|\n| 0.00,-30.00, 2778000.00|\n| 0.0...
resolution: 30
there are 8 items if we exclude Landsat 7
Traceback (most recent call last):
File "/home/henry/workspace/hrodmn.dev/posts/hls/stackstac_inhomogeneous.py", line 72, in <module>
landsat_stack_filt = stackstac.stack(
File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/stack.py", line 311, in stack
*to_coords(
File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/prepare.py", line 500, in to_coords
accumulate_metadata.metadata_to_coords(
File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/accumulate_metadata.py", line 29, in metadata_to_coords
return dict_to_coords(
File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/accumulate_metadata.py", line 168, in dict_to_coords
np.array(
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.
FYI @julianblue @hrodmn, I just released a version with this fixed (0.4.4). Sorry for the long wait.
Thanks @gjoseph92!
Hi @gjoseph92 , I am getting an error when trying to stack data when the latest numpy version 1.24.0 is installed. All works fine with 1.23.5.