geordgez closed this 7 years ago.
Thanks @geordgez, looks good to me. I've added @aashish24 as a reviewer for final approval.
@aashish24:
@mbertrand and I just had a discussion about how to handle NoData values within raster band layers. How should we fill NoData in the NumPy array output: should we use NaN, 0, the original NoData value, or some other value?
I would go with this order: fill with the original NoData value; if none is found or none is given by the user, use NaN as the value. I would avoid 0, since 0 could be a valid value.
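Here is a minimal sketch of that fallback order. resolve_fill_value is a hypothetical helper and is not part of gaia. The sketch also assumes that a value passed in by the user takes precedence over the band's own NoData value. The second helper shows the catch with NaN: an integer array cannot hold it, so integer bands have to be promoted to float first.

```python
import numpy as np


def resolve_fill_value(band_nodata=None, user_nodata=None):
    """Pick the value used for NoData cells in the NumPy output.

    Hypothetical helper. Assumption: a user-supplied value takes
    precedence over the band's own NoData value.
    """
    if user_nodata is not None:
        return user_nodata
    if band_nodata is not None:
        return band_nodata
    # Fall back to NaN rather than 0, because 0 can be a valid pixel value.
    return float("nan")


def as_nan_capable(arr):
    """Return arr unchanged if it is already floating point; otherwise a
    float64 copy, since NaN cannot be stored in an integer array."""
    if np.issubdtype(arr.dtype, np.floating):
        return arr
    return arr.astype(np.float64)
```

For example, resolve_fill_value(band_nodata=-9999.0) gives -9999.0, and resolve_fill_value() gives NaN.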
@geordgez @mbertrand see this issue https://github.com/OpenDataAnalytics/gaia/issues/60
Thanks for the info @aashish24! I just checked that the most recent commit (54535f1) includes the new parameters old_nodata and new_nodata for the read function, with the functionality below in each situation (all 8 unique cases are covered).
Is this consistent with the desired NoData functionality?
Situations:
1. (old_nodata is None) and (srcband.GetNoDataValue() is None), or (new_nodata is None) (5 cases):
   No values in the returned array srcband_ar = srcband.ReadAsArray() are replaced. Original NoData values remain unchanged.
2. (old_nodata is not None) and (new_nodata is not None) (2 cases):
   Values equal to old_nodata in the returned array srcband_ar = srcband.ReadAsArray() are replaced with new_nodata.
3. (old_nodata is None) and (srcband.GetNoDataValue() is not None) and (new_nodata is not None) (1 case):
   Values equal to srcband.GetNoDataValue() in the returned array srcband_ar = srcband.ReadAsArray() are replaced with new_nodata.
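The three situations can be sketched as a single NumPy function. This is a hypothetical helper, not the committed read() code. Here arr stands in for srcband.ReadAsArray() and band_nodata stands in for srcband.GetNoDataValue(), so the logic can be exercised without opening a file.

```python
import numpy as np


def replace_nodata(arr, band_nodata=None, old_nodata=None, new_nodata=None):
    """Hypothetical sketch of the three NoData situations.

    arr stands in for srcband.ReadAsArray(); band_nodata for srcband.GetNoDataValue().
    """
    # Situation 1: no replacement value was requested.
    if new_nodata is None:
        return arr
    # Situation 2 uses old_nodata; situation 3 falls back to the band's value.
    target = old_nodata if old_nodata is not None else band_nodata
    if target is None:
        # Situation 1 again: nothing identifies which cells are NoData.
        return arr
    out = arr.copy()
    # NaN never compares equal to itself, so it needs np.isnan.
    if isinstance(target, float) and np.isnan(target):
        mask = np.isnan(out)
    else:
        mask = out == target
    # Assumes new_nodata is representable in arr's dtype (e.g. no NaN in an int array).
    out[mask] = new_nodata
    return out
```

Inside read(), the call would then be something like replace_nodata(srcband.ReadAsArray(), srcband.GetNoDataValue(), old_nodata, new_nodata).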
Full function signature:
read(
    self,
    as_numpy_array=False,
    as_single_band=True,
    old_nodata=None,
    new_nodata=None,
    epsg=None
)
@geordgez, thanks for putting this together so clearly. I don't quite understand this:
(old_nodata is not None) and (new_nodata is not None) (2 cases)
How can we have two possibilities?
@geordgez, also, any idea how we can define a structure behind conversions?
@aashish24 Sorry, I should clarify: the 8 cases are the combinations of old_nodata, new_nodata, and srcband.GetNoDataValue(), each taking a value of None or not None.
So the combination (old_nodata is not None) and (new_nodata is not None) covers 2 cases: given that both old_nodata and new_nodata are not None, srcband.GetNoDataValue() can be either None or not None.
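One way to double-check the 5/2/1 split is to enumerate all 8 combinations of the three None / not None flags and classify each one using the rules from the situations list. The function situation is a hypothetical name used only for this check.

```python
from collections import Counter
from itertools import product


def situation(old_set, band_set, new_set):
    """Classify one combination; each flag is True when that value is not None.

    old_set -> old_nodata, band_set -> srcband.GetNoDataValue(), new_set -> new_nodata.
    """
    if not new_set or (not old_set and not band_set):
        return 1  # no values replaced
    if old_set:
        return 2  # old_nodata replaced with new_nodata
    return 3  # band NoData value replaced with new_nodata


counts = Counter(situation(*flags) for flags in product([False, True], repeat=3))
print(sorted(counts.items()))  # [(1, 5), (2, 2), (3, 1)]
```

Situation 2 gets exactly 2 cases because the band flag is free once old_nodata and new_nodata are both set.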
In terms of defining a structure behind conversions, do you mean going both ways between raster TIFF and NumPy?
Sorry, what I meant was: should we define a base class that defines an API for conversions? Currently we have a module-level function, but in the future we may need conversions to more types (a pandas DataFrame, for example).
Understood. I think having a conversion class may be a good idea, depending on the types of conversions possible within the program. On my end, I need to familiarize myself with the file formats and conversions that exist (or that we may want in the future).
On the one hand, I think it will be helpful to consolidate all the conversions. On the other hand, I want to avoid an ambiguity where a user wants to make a standard conversion (e.g., from NumPy to Pandas) and is unsure whether to use the converter API or to use the standard Pandas function calls.
Sure. Did you get a chance to think more about it? I think a converter API would be nice, since I expect we will need it quite a bit in the future.
@aashish24 Having given it some thought, I think the converter API would be a good idea, although I'd want to verify some aspects with you and the team. My initial questions/thoughts:
See #92 and #60
@geordgez will reply later today.
@geordgez, thinking some more on this, let's get this one merged, and then we can make another pass on it.
Which formats would the API cover? Would it mainly be the raster formats found in gaia.formats?
For now, raster; in the future, we should also cover vector (for example, GeoTIFF to a vector format).
Based on what I've seen so far and with the functionality added by @andrenguyen-bah and @chuehlien, NumPy seems to be a "universal" intermediate format for conversion of typical image files since it can support all of the output image types. Would the API be built around intermediate stages with NumPy?
I think that should be fine. It is easy to convert from NumPy to other types such as pandas DataFrames.
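Here is a sketch of the NumPy-to-pandas step. It assumes the wanted layout is a long table with one row per pixel (row, col, value). band_to_dataframe is a hypothetical name. Dropping NoData pixels is a design choice made here, not something settled in this thread.

```python
import numpy as np
import pandas as pd


def band_to_dataframe(band, nodata=None):
    """Hypothetical helper: flatten a 2D band into one DataFrame row per pixel."""
    rows, cols = np.indices(band.shape)
    df = pd.DataFrame({"row": rows.ravel(), "col": cols.ravel(), "value": band.ravel()})
    if nodata is None:
        return df
    # NaN never equals itself, so a NaN NoData value needs isna().
    if isinstance(nodata, float) and np.isnan(nodata):
        mask = df["value"].isna()
    else:
        mask = df["value"] == nodata
    return df[~mask].reset_index(drop=True)
```

Keeping NoData handling on the NumPy side means the pandas conversion only needs the band array and its NoData value.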
Sorry if this is a dumb question: are there any instances where we would be converting between Raster, Feature, or Vector classes? Or are conversions always between formats within each class?
We should be able to go between types. See here: https://docs.qgis.org/2.6/en/docs/training_manual/complete_analysis/raster_to_vector.html
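The QGIS tutorial polygonizes raster regions. A much simpler raster-to-vector conversion turns every valid cell into a GeoJSON-style point at the cell center, and it is sketched here under stated assumptions. The hypothetical raster_to_points function assumes a GDAL-style six-element geotransform: origin x, pixel width, row rotation, origin y, column rotation, pixel height.

```python
import numpy as np


def raster_to_points(band, geotransform, nodata=None):
    """Hypothetical sketch: one GeoJSON-style Point feature per valid cell center.

    Assumes nodata is not NaN (NaN never compares equal, so it would not be masked).
    """
    x0, pixel_w, rot_x, y0, rot_y, pixel_h = geotransform
    valid = np.ones(band.shape, dtype=bool) if nodata is None else band != nodata
    features = []
    for row, col in np.argwhere(valid):
        # GDAL-style affine transform evaluated at the center of the cell.
        x = x0 + (col + 0.5) * pixel_w + (row + 0.5) * rot_x
        y = y0 + (col + 0.5) * rot_y + (row + 0.5) * pixel_h
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [float(x), float(y)]},
            "properties": {"value": band[row, col].item()},
        })
    return {"type": "FeatureCollection", "features": features}
```

Cells equal to 0 are kept as features. This is consistent with treating 0 as potentially valid data rather than as NoData.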
@geordgez thoughts?
Also, can you squash commits please?
Thanks,
@aashish24 Sorry for the delay; I got dragged away and likely won't be available for a few weeks. I definitely think going between types should be a capability of the API. Let me know if you need more commits squashed or amendments to commit comments.
My main concern going forward is how we'd be expanding the API for objects in memory vs objects on disk and the relationship between the two object types, especially with how they interact with NumPy. Converting to raster formats through NumPy should be easy--correct me if I'm wrong--but I think we need to be careful with the process for converting to vector formats.
Short description
Added an option to output data as a NumPy array in RasterFileIO in gaia/geo/geo_inputs.py.
Example Usage
Default functionality matches the NumPy array calls in the docs/examples/gaia_processes.ipynb notebook (example below):
With the new function, the second line can be rewritten:
Parameter defaults & descriptions
The as_single_band parameter supports output of a 3D NumPy array for multidimensional raster datasets (the default is a 2D slice of the first raster band). With as_single_band=False, we can get the equivalent layer as above by indexing into the first layer of the 3D NumPy array:
The new_nodata parameter supports customization of NoData values. With new_nodata=None (the default value), the function attempts to get the NoData value of each band in the output NumPy array. If no NoData value exists, the NoData value is set to 0.
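The as_single_band behaviour can be illustrated without a file. stack_bands is a hypothetical stand-in for the array path of read(). It takes one 2D array per band. With as_single_band=True it returns the first band; otherwise it stacks the bands along a leading axis. That gives the (bands, rows, cols) layout GDAL's Dataset.ReadAsArray() uses for multiband rasters. Indexing [0] into the 3D result gives back the default 2D output.

```python
import numpy as np


def stack_bands(band_arrays, as_single_band=True):
    """Hypothetical sketch: band_arrays holds one 2D array per band, band 1 first."""
    if as_single_band:
        # Default: a 2D slice of the first raster band.
        return band_arrays[0]
    # Otherwise a 3D array with shape (bands, rows, cols).
    return np.stack(band_arrays)


bands = [np.arange(6).reshape(2, 3), np.arange(6, 12).reshape(2, 3)]
cube = stack_bands(bands, as_single_band=False)
print(cube.shape)  # (2, 2, 3)
print(np.array_equal(cube[0], stack_bands(bands)))  # True
```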