FZJ-IEK3-VSA / glaes

Geospatial Land Availability for Energy Systems
MIT License
51 stars 25 forks source link

Feature request: Include list of raster types #2

Closed euronion closed 4 years ago

euronion commented 4 years ago

I currently do not see a way to include (= not exclude) as list of non-consecutive raster types using excludeRasterType(...) or any other function for that matter.

Excluding a list of non-consecutive raster types can be easily done by iterating the list. Including a list of consecutive raster types can be realised using the invert=True switch. But using invert=True to only consider certain raster types with a list of non-consecutive values does not seem possible.

Is it possible to include such a feature?

NB: It seems to have been possible in the past (I presume unintentionally), e.g. with this fork (https://github.com/PyPSA/glaes) used in pypsa-eur.

euronion commented 4 years ago

I guess it was possible due to the custom geokit version used there as well. I found the PR in the geokit (https://github.com/FZJ-IEK3-VSA/geokit/pull/1) already trying to make this happen.

sevberg commented 4 years ago

My only concern with such a function would be in a situation where someone inputs an iterable of exactly two values. For example:

ec.excludeRasterType( "path/to/file.tif", value=[5,7])

In this case it would be ambiguous if the user means to exclude exactly the values 5 & 7, or if they want to exclude the range 5-7. Maybe the solution would be to say a list implies exact values, and a tuple implies a range. But this seems unintuitive to me, and I'm also not sure what to do with derived iterables such as numpy arrays or generators.

Thoughts?

sevberg commented 4 years ago

Maybe another way to handle the input would be to always expect either a numeric value (for an exact indication) or a string. Then mathematical notation could be used. E.g.:

euronion commented 4 years ago

I find the a tuple (lower, upper) or even slice(start, stop) to be intuitive for ranges and lists of values for single values to exclude. I think parsing mathematical notation is tempting but makes the code more complicated than necessary.

Or maybe more pythonic: keep value= for a single value / a range of values and introducevalues= [...] for a list of values to consider.

sevberg commented 4 years ago

I think parsing mathematical notation is tempting but makes the code more complicated than necessary.

I can see your point here. But do you think it would be SO bad? I think doing this would fit well into a newer feature I've recently added which you may not be aware of. The purpose of this is to allow the complete description of the steps in an eligibility analysis to come from a CSV file or database (anything which translates to a pandas dataframe, actually). This allows various constraint scenarios for multiple geo-political regions to be stored and altered as git- and user-friendly text files.

Or maybe more pythonic: keep value= for a single value / a range of values and introducevalues= [...] for a list of values to consider.

This doesn't quite fit from my perspective, since a single range indication also involves multiple explicit values, and hence I could imagine someone making the mistake of saying values=(lower,higher), which would only indicate the explicit values, when they actually meant to have said value=(lower,higher). This would be disastrous if they were indicating from a float-typed raster file and weren't being careful to check that their output makes sense. (Unfortunately, I've seen this often :( )

euronion commented 4 years ago

Sounds like a helpful feature when doing a lot of specific or high-resolution eligibility analysis. Nice!

Confusing value and values is indeed a bit risky. I guess the mathematical notation would be an elegant solution. Not sure if there is a standard library for that purpose though.

sevberg commented 4 years ago

Not sure if there is a standard library for that purpose though

No need for an additional module. regular (not 'renewable', uff) expressions to the rescue!

I wanted to test the idea, so I've pushed a temporary branch "math_notation" with a change that makes the indicateValues function in GeoKit behave in the way we discussed. I actually like it quite a bit, and could imagine incorporating it permanently. Anyway, before I go on I would first make sure it fits your use case. Would you mind checking it out for me? I'm also open to other approaches/suggestions you might have.

The relevant update can be found here And the docstring is updated here

euronion commented 4 years ago

Wow, that was fast! I didn't think of regular expression, smooth approach (Leaving this here).

On the first glimpse it is looking good: grafik

grafik

I didn't test all combinations of possible value, only half-open range "[--50]", list "[44,255]" and a mix of of closed range and list "[12-20],21,22,[21-32]".

I guess the thing tickling my nose the most (which is a good thing, because it is a minor thing) was the use of [0-9] instead of \d in the regex.

value_re = re.compile(r"(?P<range>(?P<open>[\[\(])(?P<low>-?\d+\.?\d*)?-(?P<high>-?\d+\.?\d*)?(?P<close>[\]\)]))|(?P<value>-?\d+\.?\d*)")
sevberg commented 4 years ago

I guess the thing tickling my nose the most (which is a good thing, because it is a minor thing) was the use of [0-9] instead of \d in the regex.

I had totally forgotten about this notation. Thanks for the tip!

I also noticed that the previous regular expression would not have matched a number which doesn't lead with a digit. For example, "0.72" would be okay, but ".72" would not. So I added a fix to both of these issues

euronion commented 4 years ago

Finishing touches (|\d+ becoming |\d+\.?):

r"(?P<range>(?P<open>[\[\(])(?P<low>[-+]?(\d*\.\d+|\d+\.?))?-(?P<high>[-+]?(\d*\.\d+|\d+\.?))?(?P<close>[\]\)]))|(?P<value>[-+]?(\d*\.\d+|\d+\.?))"

else you would match "1,2,.2,[3-],[3.3-],[.3-],[-3],[-3.3],[-.3]", but not 3., [3.-], [-3.].

And while is doesn't make contextual sense in this method, the range "[-]" is currently a valid input which should break the code (didn't test it). To capture this case, you could elif m['low'] or m['high']:

sevberg commented 4 years ago

Finishing touches (|\d+ becoming |\d+.?)

Nice catch. I've updated the line.

About the "[-]" case, I can imagine an uncommon scenario where someone whats to indicate every pixel in a raster file with any finite value (i.e. not 'NaN'). It's odd for sure, but I would nevertheless elect to keep the feature just in case.

sevberg commented 4 years ago

Hello again @euronion. If the current state of this update works for you, then I would go ahead and merge it into the master

euronion commented 4 years ago

Sure. No objections from my side and AFAICT working smoothly for me.

Thanks a lot!

sevberg commented 4 years ago

The change is incorporated into master now with version 1.1.8