PermafrostDiscoveryGateway / viz-staging

PDG Visualization staging pipeline
Apache License 2.0

Convert input data CRS to config's `input_crs` only if there is no CRS in input data #26

Open julietcohen opened 1 year ago

julietcohen commented 1 year ago

In the config, there is an input_crs option that was intended for input data that lacks CRS information, as was the case with some early ice wedge polygon data. However, the way set_crs() is currently used in TileStager.py, the CRS of the input data is set to input_crs whenever that config value is not None, whether or not the data already has a CRS. To be clear, with the operation as it stands, the data is not transformed; only the CRS metadata is assigned. See the geopandas documentation for set_crs().

We need the CRS of the data to be set to the value of input_crs only when input_crs is not None and the input data does not already have CRS info.
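A minimal sketch of the requested behavior, not the actual TileStager.py code. The function name `set_crs_if_missing` is hypothetical, and `gdf` is assumed to behave like a geopandas GeoDataFrame (a `crs` attribute and a `set_crs()` method):

```python
def set_crs_if_missing(gdf, input_crs):
    """Assign input_crs only when gdf has no CRS; never reproject here.

    Hypothetical helper: gdf is assumed to expose `crs` and `set_crs()`
    the way a geopandas GeoDataFrame does.
    """
    if input_crs is not None and gdf.crs is None:
        # set_crs() only labels the coordinates; it does not transform them.
        return gdf.set_crs(input_crs)
    # Either no fallback CRS was configured, or the data already declares one.
    return gdf
```

With this check, data that already carries a CRS passes through untouched and is only reprojected later, when it is transformed to the CRS of the TMS.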

robyngit commented 1 year ago

I was thinking about this @julietcohen... Do you think there will ever be a scenario where we would want to correct an existing CRS in a dataset? set_crs would already work for that case, but maybe we would want an option like replace_crs (True/False) to indicate when the given CRS needs to be corrected and when it needs to be set only for files missing that info? Or maybe that would be option overload!
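One way the replace_crs idea could look, as a sketch only. `apply_input_crs` is a hypothetical name, `replace_crs` is the proposed option and does not exist in the config today, and `gdf` is assumed to be a geopandas GeoDataFrame or something with the same `crs` / `set_crs()` interface:

```python
def apply_input_crs(gdf, input_crs, replace_crs=False):
    """Apply the configured input_crs according to a proposed replace_crs flag."""
    if input_crs is None:
        # Nothing configured: leave the data as it is.
        return gdf
    if gdf.crs is None:
        # Missing CRS: fill it in, regardless of replace_crs.
        return gdf.set_crs(input_crs)
    if replace_crs:
        # Existing CRS is declared wrong: overwrite the label without
        # reprojecting. geopandas requires allow_override=True for this.
        return gdf.set_crs(input_crs, allow_override=True)
    # Existing CRS is trusted: keep it.
    return gdf
```

Defaulting `replace_crs` to False keeps the safer behavior (never touching an existing CRS) unless the user explicitly asks for a correction.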

julietcohen commented 1 year ago

@robyngit That's a good idea; it does seem likely that one day we will receive data with incorrect CRS information. I have not encountered this before, but instead of adding another option like replace_crs, maybe we could check the actual CRS of the geometry using geodataframe.geometry.crs and compare it to what is returned by geodataframe.crs. Then, if they are not the same, we use set_crs() to correct the geodataframe CRS info.

julietcohen commented 1 year ago

It seems that the output of both geodataframe.geometry.crs and geodataframe.crs is changed by set_crs(), even though set_crs() doesn't transform the data, ~so we would need another way to check the actual CRS of the geometries and not just the metadata~ and there's no way to check the CRS of the geometries besides the metadata.

julietcohen commented 3 months ago

To clarify: in the current code, here are the 4 possible scenarios and their outcomes:

| Does input data already have a CRS set? | Does input_crs config option have a value besides None? | Result |
| --- | --- | --- |
| yes | yes | data is set to the CRS defined as input_crs option in the config, then transformed to the CRS of the TMS |
| yes | no | data is transformed to the CRS of the TMS |
| no | yes | data is set to the CRS defined as input_crs option in the config, then transformed to the CRS of the TMS |
| no | no | data is transformed to the CRS of the TMS |

So, considering scenarios 1 and 3, if input_crs has a value, then the input data is set to that CRS regardless of whether the input data already has a CRS. Perhaps this was the intended purpose of the option: to correct an incorrectly set CRS. If not, then the code should be adjusted.
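The four scenarios can also be enumerated in a few lines to show where the current behavior and the "only if missing" behavior differ. This is an illustrative model of the logic described above, not code from the repository; `input_crs_applied` and the `only_if_missing` flag are hypothetical names:

```python
from itertools import product


def input_crs_applied(data_has_crs, input_crs_set, only_if_missing):
    """Return True when the config's input_crs would be assigned to the data."""
    if not input_crs_set:
        return False
    if only_if_missing:
        # Proposed behavior: assign only when the data has no CRS.
        return not data_has_crs
    # Current behavior as described: assign whenever input_crs is set.
    return True


rows = []
for data_has_crs, input_crs_set in product([True, False], repeat=2):
    current = input_crs_applied(data_has_crs, input_crs_set, only_if_missing=False)
    proposed = input_crs_applied(data_has_crs, input_crs_set, only_if_missing=True)
    rows.append((data_has_crs, input_crs_set, current, proposed))
    print(f"has CRS={data_has_crs!s:5} input_crs set={input_crs_set!s:5} "
          f"current={current!s:5} proposed={proposed!s:5}")
```

Under this model the two behaviors disagree only in the first scenario, where the data already has a CRS and input_crs is also set.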