SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
635 stars 283 forks source link

bbox extract #77

Closed isedwards closed 6 years ago

isedwards commented 12 years ago

This ticket proposes an enhancement to allow a region to be extracted by specifying a bounding box.

Currently, users may attempt to extract a region from a global dataset using constraints:

latc = iris.Constraint(latitude=lambda cell: minlat <= cell <= maxlat)
lonc = iris.Constraint(longitude=lambda cell: minlon <= cell <= maxlon)
region = cube.extract(latc & lonc)

However, the output will retain the circular property, which they would have to change manually: region.coord('longitude').circular = False

esc24 commented 12 years ago

I'm content with the Constraint syntax for this, although I'm open to suggestions for a complementary api. However, I suggest that this behaviour of maintaining the circular attribute as True in this case is a bug.

bblay commented 12 years ago

I believe extraction should turn off circularity but look at this test:

>>> c0.coord("longitude").circular
True
>>> x = c0.extract(iris.Constraint(longitude=lambda lon: -90 < lon < 90))
>>> x.coord("longitude").circular
False
pelson commented 12 years ago

I'm content with the Constraint syntax for this

Really? I can come up with some far nicer syntaxes. For instance:

region = cube.extract(iris.constraint.BBox(longitude=[0, 90], latitude=[0, 45]))

* please note, I have not actually thought about this, and it could uncover all sorts of problems, but in my eyes, it is going to be better than the current extract mechanism for spatial extractions.

pelson commented 12 years ago

Woops. Wrong button. Sorry.

esc24 commented 12 years ago

Really? I can come up with some far nicer syntaxes.

First off it's not a competition. Secondly, I'm content (not delighted) with the current syntax because it provides the required functionality in a generic way. Thirdly, the syntax you propose is unclear as to whether the interval is inclusive or exclusive.

bblay commented 12 years ago

I humbly suggest that was a reasonable comment. It's fair for the various options to compete with each other.

I open myself up to public flaying with this suggestion, again not thought about deeply, but motivated by "Iris make it easy"

region = cube.extract(latitude=(0,90), longitude=(0,45))
region = cube.extract(x=(0,90), y=(0,45))

Of course, the documentation would describe inclusivity.

esc24 commented 12 years ago

@pelson : With fresh eyes this morning, you are right. The current syntax looks overly complex for a simple crop type operation. Let the competition begin...

pelson commented 12 years ago

cube.extract(x=(0,90), y=(0,45))

That rules out any other keywords to extract (strict for instance). Not to mention that it is confusing if I wanted to extract the x points 0 and 90 only (and nothing in between).

bblay commented 12 years ago

@pelson ok, good, next up:

region = cube.extract(range={"latitude":(0,90), "longitude":(0,45)})
region = cube.extract(range={"y":(0,90), "x":(0,45)})
region = cube.extract(points={"y":(0,90), "x":(0,45)})
bblay commented 12 years ago

or extract region and points could be separate funks

pelson commented 12 years ago

region = cube.extract(range={"latitude":(0,90), "longitude":(0,45)})

is almost identical to

cube.extract(iris.constraint.BBox(longitude=[0, 90], latitude=[0, 45]))

except that my BBox is easier to specialise in the future to support specific coordinate system detail i.e.:

cube.extract(iris.constraint.GeoBBox('espg:4236', longitude=[0, 90], latitude=[0, 45]))

Before we get carried away though, are we actually going to do this? And in what time scale?

ghost commented 12 years ago

Having been pointed in this direction from #83 I thought I'd chuck in my two pennies worth to this discussion.

From a readability perspective the lambda syntax, while being quite new to me before dipping my toe into iris, is a very clear way of specifying a constraint, something which the discussion above demonstrates is not quite as clear when using pairs of numbers for latitude and longitude.

The discussion also appears to be looking at variants on the syntax for the UKMO IDL routine pp_extract, i.e. specify the field to extract from and the four corners of the region to extract. It would make many users switching from UKMO/IDL/PP land (probably a significant fraction of your early user community) feel at home if a familiar syntax was available.

My (admittedly mild) preference would be to stick with the constraint in the example at the top of this page, with the slight modification that I would like to have both latitude and longitude ranges included in the same constraint (this may already be possible - I haven't checked).

In case it is still an issue, I am not keen to have to worry about the .circular property of a coordinate.

pelson commented 12 years ago

I am not keen to have to worry about the .circular property of a coordinate.

Agreed. Thanks for your perspective on this, I think it is really valuable and well thought out.

My feeling on this is, as long as the latitude and longitude are defined independently, I think it would be incorrect to change the circular flag. As soon as one knows that you have a bounding box (i.e. two dimensions or more) in a limited domain, then it makes sense to set the circular flag to false. This is the nub of the issue with the current means of independent lambda constraints IMHO.

bblay commented 12 years ago

Isn't it true that circular isn't part of CF, and also that it can be derrived? I'm not sure we should even have it as a fundamental attribute of a Coord.

Regarding bounding box extraction, it should sometimes leave us with a circular coord. If circularity becomes derrived then we won't have to worry about that.

pelson commented 12 years ago

Regarding bounding box extraction, it should sometimes leave us with a circular coord.

No, it shouldn't. If you have a bounding box it is not the full circular longitude extent...

cpelley commented 12 years ago

but if the nearest longitudinal point to that specified by the bounding box is the extreme of the field then should it not still be global? or are you saying that the act of specifying a longitudinal bound should turn it regional regardless?

I do like the idea of deriving that attribute since that way it seems to me to avoid possible future bugs where correct assignment of its status may or may not be overlooked.

bblay commented 12 years ago

...should it not still be global?

This was the "sometimes" I was referring to. It only seems correct to derive it.

pelson commented 12 years ago

It only seems correct to derive it.

Except you can't derive it without some heuristic - admittedly it is the same heuristic that you would need for a CF data set. The only way I can see that one might be able to get away with it is to say that it is the coordinate reference system which determines the "circularity" - a geodetic coordinate system is always "circular" whereas a lat lon projection (plate carree) is never "circular".

bblay commented 12 years ago

Yes, you'd have to check for something like:

bblay commented 12 years ago

(I wish you could hide the evidence that you'd accidentally hit the close button)

ajdawson commented 11 years ago

I'm going to resurrect this old discussion as it seems an appropriate place to express my view:

One of the main reasons I have been using the CDAT (Climate Data Analysis Tools) package for the last 5 years is the ease with which I can extract arbitrary regions from my data. For instance if I have data with a longitude coordinate defined over the range [0, 360) and I ask for the region [-80, 40] CDAT gives me a variable with that range. I don't have to care what range my (global, circular) longitudes are actually defined over, I just get what I asked for.

I think it is essential that Iris can do something like this too. Manually shifting grids prior to extraction becomes extremely tiresome, particularly if you want your program to work with data from multiple sources that define the longitude dimension differently...

In my opinion using iris.Constraint with a lambda function is a very neat way of doing extractions. In my ideal world the result of doing

lon_constraint = iris.Constraint(longitude=lambda l: -80 <= l <= 40)
lat_constraint = iris.Constraint(latitude=lambda l: 30 <= l < 90)
cube_region = cube.extract(lon_constraint & lat_constraint)

on cube whose longitudes are global (circular) over the range [0, 360) should be a cube defined over the given ranges, simple as that. I cannot stress enough how important it is that this should just happen automatically, regardless of the domain over which the (circular) longitude dimension is defined. Implementing an equivalent to numpy.roll for cubes is not the ideal solution.

rhattersley commented 10 years ago

I'd like to submit a PR to address this issue so I've posted to the corresponding iris-dev discussion topic.

ajdawson commented 6 years ago

The Cube.intersection method seems to cover the requirements set out here, closing.