Hayley kicked this off with #223
@jread-usgs some questions for you now that @hcorson-dosch and I are really in the guts of implementing the pipeline.
My current working query (here is the target and here is the function) returns a messy text file that requires munging; it appears to be multiple csv-formatted tables, each with its own header, all in one text file. Does that seem right, or is that triggering alarm bells? I have started an initial function for parsing but want to make sure before I go much further. It just seems really funky.
Here is a snippet of what is loaded when I do `readLines()`:
[1] "# evspsbl"
[2] ",ID1,ID10,ID100,ID11,ID12,ID13,ID14,ID15,ID16,ID17,ID18,ID19,ID2,ID20,ID21,ID22,ID23,ID24,ID25,ID26,ID27,ID28,ID29,ID3,ID30,ID31,ID32,ID33,ID34,ID35,ID36,ID37,ID38,ID39,ID4,ID40,ID41,ID42,ID43,ID44,ID45,ID46,ID47,ID48,ID49,ID5,ID50,ID51,ID52,ID53,ID54,ID55,ID56,ID57,ID58,ID59,ID6,ID60,ID61,ID62,ID63,ID64,ID65,ID66,ID67,ID68,ID69,ID7,ID70,ID71,ID72,ID73,ID74,ID75,ID76,ID77,ID78,ID79,ID8,ID80,ID81,ID82,ID83,ID84,ID85,ID86,ID87,ID88,ID89,ID9,ID90,ID91,ID92,ID93,ID94,ID95,ID96,ID97,ID98,ID99"
[3] "TIMESTEP,MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1),MEAN(kg m-2 s-1)"
[4] "1999-01-01T00:00:00Z,3.0479403E-6,6.2411355E-6,7.818796E-7,6.894544E-6,6.6264047E-6,9.675139E-6,1.0771746E-5,1.06765565E-5,9.926552E-6,9.447711E-6,8.660588E-6,7.797644E-6,3.4016837E-6,6.448473E-6,1.0036815E-5,9.920187E-6,1.1409653E-5,1.2325739E-5,1.1967138E-5,1.1022169E-5,1.0329134E-5,9.073887E-6,7.557237E-6,6.5496706E-6,6.2436175E-6,1.1667247E-5,7.667272E-6,1.1420567E-5,1.2252027E-5,9.922599E-6,8.150401E-6,1.0508522E-5,8.838302E-6,6.8397594E-6,9.156757E-6,5.3106673E-6,1.280317E-5,1.1962649E-5,1.3006744E-5,1.210488E-5,1.2293051E-5,1.2494677E-5,1.0339701E-5,7.629874E-6,6.2304475E-6,9.326908E-6,5.3670647E-6,1.1614002E-5,7.0229394E-6,8.721521E-6,1.1697765E-5,1.0465159E-5,1.0967345E-5,8.876884E-6,4.7448602E-6,3.0857123E-6,8.758055E-6,6.093673E-6,9.520644E-6,6.7338487E-6,7.648563E-6,8.90384E-6,1.0210867E-5,9.0875365E-6,6.840079E-6,4.2024503E-6,4.628265E-6,8.725951E-6,4.5186694E-6,7.3864794E-6,6.5283275E-6,5.5463806E-6,2.5749143E-6,5.3497924E-6,6.1883534E-6,4.4203553E-6,-1.3556412E-7,3.2933548E-7,8.003123E-6,5.900954E-7,4.469657E-6,5.2579603E-6,4.909238E-6,4.802666E-6,6.03942E-6,4.63963E-6,1.0305251E-6,-2.449164E-6,-1.5818581E-6,7.0171145E-6,6.270129E-7,7.2543207E-6,3.3003553E-6,5.93133E-6,4.379594E-6,4.930987E-6,2.7132191E-6,-8.9022785E-7,-2.9605308E-6,-1.6005098E-6"
[5] "1999-01-01T01:00:00Z,2.5578656E-6,4.4968933E-6,-1.7777161E-6,8.026662E-6,5.8394744E-6,4.2759384E-6,5.3513254E-6,6.5573186E-6,8.520235E-6,9.267275E-6,8.664139E-6,7.1855493E-6,3.6754116E-7,4.7927642E-6,9.465936E-6,9.900281E-6,8.269337E-6,7.657092E-6,8.669214E-6,9.780522E-6,9.8940345E-6,9.253336E-6,7.82955E-6,1.6866378E-6,5.8803425E-6,8.67844E-6,6.2116706E-6,1.0424121E-5,7.599634E-6,7.2886546E-6,7.539618E-6,1.0400472E-5,8.60406E-6,6.959848E-6,4.164287E-6,5.3128806E-6,1.1033225E-5,9.527728E-6,1.2241214E-5,1.1091293E-5,1.1527273E-5,1.1818796E-5,1.1724405E-5,8.783328E-6,6.129934E-6,6.1801793E-6,4.030096E-6,8.766548E-6,3.168272E-6,7.775944E-6,6.9698535E-6,7.530194E-6,1.09866405E-5,1.1252795E-5,8.776212E-6,6.0230163E-6,8.2925835E-6,3.6850608E-6,4.8326674E-6,2.9843206E-6,5.1575917E-6,2.3220541E-6,6.0555444E-6,7.6092315E-6,7.044763E-6,7.388853E-6,4.9958044E-6,8.974322E-6,2.988957E-6,6.127966E-6,4.2845613E-6,3.2826106E-6,4.6146974E-6,-2.6444893E-6,-1.0700569E-6,6.735114E-6,4.568504E-6,3.0298354E-6,7.652875E-6,7.706925E-7,7.4221134E-6,4.630271E-6,3.698081E-7,3.1633144E-6,1.8592492E-6,3.999646E-6,4.6821456E-6,3.5261576E-6,1.5989144E-7,5.867402E-6,-1.777977E-6,6.601029E-6,2.5276915E-6,5.216548E-6,2.499017E-6,1.9578079E-6,3.2954124E-6,3.814628E-6,3.4222915E-6,1.0094535E-6"
Setting aside for the moment the fact that we’ll have to consider how best to chunk the geoknife queries, we started discussing how we want to pass in the spatial information for a single geoknife query. Each geoknife query requires that the geometries be polygons, not points, so we can’t use lake centroids. We talked through one approach for creating the polygon geometries to use in the query that would 1) take advantage of GDP processing power, while 2) avoiding downloading duplicate data. This would be a hybrid approach between using actual lake geometries for the query (too detailed) and using a bounding box that encapsulates a set of lakes (too coarse). What are your thoughts on this approach?
Divide bounding boxes into two groups:
An alternative, simpler approach that would be akin to using lake centroids would be to map the lake centroids to our constructed GCM grid, and then use the grid polygons that contain lake centroids in our geoknife query. This would be less accurate for lakes that span multiple grid cells, but that may be an acceptable tradeoff, particularly if the area-weighting computation is expensive and/or doesn’t have much of an impact on the final computed set of drivers (not sure how much variation there is between a cell and its immediate neighbors).
Is the format of the data being returned correct?
Yes, that is the correct format coming from the GDP. But you can use the built-in file parsers in `geoknife` to read that dataset into a data.frame. Instead of your code `download(gcm_job, destination = out_file, overwrite = TRUE)`, you should be able to do:
my_data <- result(gcm_job)
# then write to file like this:
arrow::write_feather(my_data, out_file)
If using `download` and `readLines`, the data look like you've shown:
my_data <- geoknife(data.frame('point1'=c(-89, 46), 'point2'=c(-88.6, 45.2)), fabric = 'prism', wait = TRUE) %>% download(destination = tempfile()) %>% readLines(n = 5)
my_data
[1] "# ppt" ",point1,point2" "TIMESTEP,MEAN(mm/month),MEAN(mm/month)"
[4] "1895-01-01T00:00:00Z,55.8425,47.74" "1895-02-01T00:00:00Z,11.78,11.98"
`result` gives you something more immediately usable:
my_data <- geoknife(data.frame('point1'=c(-89, 46), 'point2'=c(-88.6, 45.2)), fabric = 'prism', wait = TRUE) %>% result()
my_data
DateTime point1 point2 variable statistic
1 1895-01-01 55.8425 47.74 ppt MEAN
2 1895-02-01 11.7800 11.98 ppt MEAN
3 1895-03-01 9.6625 10.08 ppt MEAN
4 1895-04-01 39.6325 39.17 ppt MEAN
5 1895-05-01 127.2650 118.65 ppt MEAN
6 1895-06-01 82.4375 67.78 ppt MEAN
7 1895-07-01 70.5825 63.30 ppt MEAN
8 1895-08-01 58.0800 64.37 ppt MEAN
9 1895-09-01 126.1900 86.83 ppt MEAN
10 1895-10-01 16.9300 18.05 ppt MEAN
11 1895-11-01 34.3000 34.58 ppt MEAN
12 1895-12-01 46.1300 39.02 ppt MEAN
13 1896-01-01 27.1700 25.38 ppt MEAN
☝️ `geoknife`'s parsers return wide-format data.frames, which are going to be a lot smaller than tall data.frames when there are lots of features. But you can pivot as needed.
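For example, a minimal sketch of pivoting to a tall format, using the `my_data` data.frame from the `result()` call above (`tidyr` here is just one option, not something the pipeline requires):

```r
library(tidyr)

# Pivot the wide result() output (one column per feature) into a tall
# data.frame with one row per DateTime-feature combination.
my_data_tall <- pivot_longer(
  my_data,
  cols = c(point1, point2),   # the feature columns from the example above
  names_to = "feature",
  values_to = "value"
)
```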
What to use as query polygons?
Each geoknife query requires that the geometries be polygons, not points, so we can’t use lake centroids.
I'm not sure where this limit comes from, and you'll see a lot of point-based `geoknife` examples in the documentation (e.g., `?geoknife`, `?simplegeom`) and the vignettes. Lat/lon pairs can be used, although there isn't a clean way to accept them as `sf` objects or something else right now, so you'd need to use the `data.frame` method if you want to use points.
Because the lake polygons are much more complex than the coarser grid cells (I think these climate data are somewhere around 0.25° and 0.35° lat/lon cells), we stopped trying to sample the climate data to the lake polygon, and instead simply started using the value of the grid cell that the lake centroid landed in. This is what is currently done with NLDAS. Using the centroid simplifies a lot of things and allows us to re-use the data from one cell for the multiple lakes that fall within it, as opposed to calculating each lake polygon's specific driver values in case there is a partial overlap with an adjacent cell.
For NLDAS, some of this is discussed and shown here.
With that in mind, your comment:
An alternative, simpler approach that would be akin to using lake centroids would be to map the lake centroids to our constructed GCM grid, and then use the grid polygons that contain lake centroids in our geoknife query. This would be less accurate for lakes that span multiple grid cells, but that may be an acceptable tradeoff, particularly if the area-weighting computation is expensive and/or doesn’t have much of an impact on the final computed set of drivers (not sure how much variation there is between a cell and its immediate neighbors).
I'd say "yes" to this, the tradeoff is acceptable and also matches what we do elsewhere.
For your steps above, I'd recommend
1. Construct the GCM grid (everything below this assumes we have this info). I think this is going to be trickier than NLDAS because the grid isn't a lat/lon grid already, but it should be possible; you might want to talk to David (who reformatted this original dataset for the GDP hosting) or Blodgett.
2. Construct a large "job" or "task" grid that contains many GCM cells. The "job" grid will be the size of a geoknife query, and I'd suggest creating this as a number of simple tiles across the whole US (so some tiles will be empty and have no lakes, but these tile targets will therefore not have to change in the future if the footprint expands). These job tiles will also be the unit of bulk data that gets added to the final netcdf file after the geoknife jobs are complete, so their arrangement should support bulk writing of netcdf, but we can talk about that later and you should ignore this part of the comment for the early prototype tasks.
3. ~Make bounding boxes for every single lake~ use centroids for every lake.
4. Filter the GCM grid to only cells that intersect with ~lake bboxes~ lake centroids.
5. Use geoknife to call for all centroid cells, breaking the jobs up into the "tiles" in 2 above.
6. Create a combiner target that is a single NetCDF file where all of these data get written.
☝️ The idea with the "tiles" is that even when we expand the number of lakes we're modeling, you may not be adding a centroid that covers a new GCM cell, so the final .nc file would not rebuild in that case. But, if you are expanding the footprint or hit new cells within existing tiles, you'd only rebuild the tile-specific geoknife targets, followed by the final NetCDF file (which would need to be rebuilt anytime one of the tile targets changes), avoiding excessively greedy rebuilds in this case.
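To make the fixed-tile idea concrete, here's a rough sketch with `sf`; the tile size, the extent, and the `gcm_grid_sf`/`centroids_sf` objects are all placeholders/assumptions, not the actual pipeline code:

```r
library(sf)

# Fixed job-tile layout over a constant extent (rough CONUS bounding box in WGS84),
# so adding lakes later never changes the tile geometry itself.
tile_extent <- st_as_sfc(st_bbox(c(xmin = -125, ymin = 24, xmax = -66, ymax = 50),
                                 crs = st_crs(4326)))
tile_geoms <- st_make_grid(tile_extent, cellsize = 5)   # 5-degree tiles (placeholder size)
job_tiles <- st_sf(tile_id = seq_along(tile_geoms), geometry = tile_geoms)

# Keep only GCM cells that contain at least one lake centroid
# (assumes `gcm_grid_sf` and `centroids_sf` exist and share a CRS).
cells_with_lakes <- gcm_grid_sf[lengths(st_intersects(gcm_grid_sf, centroids_sf)) > 0, ]

# Tag each retained cell with its job tile so geoknife calls can be batched per tile.
cells_by_tile <- st_join(cells_with_lakes,
                         st_transform(job_tiles, st_crs(cells_with_lakes)),
                         join = st_intersects)
```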
That issue with points vs polygons is something that I was struggling with when getting the query to work. It must have been a different part of my command. Although, I just investigated this more and I think it is more the conversion of my `sf` object to `sp` and then to `simplegeom()`. It appears that `simplegeom()` doesn't like SpatialPoints? I think maybe I need to just make it into a data.frame as in the sample `simplegeom(data.frame('point1'=c(-89, 46), 'point2'=c(-88.6, 45.2)))`.
library(sf)
library(geoknife)
centroids_sf <- structure(list(site_id = c("nhdhr_121623196", "nhdhr_143247099",
"nhdhr_151949134", "nhdhr_157359786", "nhdhr_74926939"), Shape = structure(list(
structure(c(-88.4467147238926, 42.7996962344843), class = c("XY",
"POINT", "sfg")), structure(c(-88.6432114598417, 43.6283635061587
), class = c("XY", "POINT", "sfg")), structure(c(-91.7772962678006,
45.6126497556779), class = c("XY", "POINT", "sfg")), structure(c(-90.290648760424,
44.2691685292945), class = c("XY", "POINT", "sfg")), structure(c(-91.5175899000678,
45.7450307015114), class = c("XY", "POINT", "sfg"))), class = c("sfc_POINT",
"sfc"), precision = 0, bbox = structure(c(xmin = -91.7772962678006,
ymin = 42.7996962344843, xmax = -88.4467147238926, ymax = 45.7450307015114
), class = "bbox"), crs = structure(list(input = "EPSG:4326",
wkt = "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"World.\"],\n BBOX[-90,-180,90,180]],\n ID[\"EPSG\",4326]]"), class = "crs"), n_empty = 0L)), row.names = c(NA,
-5L), sf_column = "Shape", agr = structure(c(site_id = NA_integer_), .Label = c("constant",
"aggregate", "identity"), class = "factor"), class = c("sf",
"tbl_df", "tbl", "data.frame"))
centroids_geoknifegeom <- simplegeom(as_Spatial(centroids_sf))
Error:
Error in as(.Object, "simplegeom") :
no method or default for coercing “SpatialPointsDataFrame” to “simplegeom”
Thanks for the advice on direction! I do have questions on the very last part. I am not understanding how "these tile targets will therefore not have to change in the future if the footprint expands" and "if you are expanding the footprint or hit new cells within existing tiles, you'd only rebuild the tile-specific geoknife targets" can both be true. Wouldn't the tile data already have been built? Or is it that the target is prepped for doing the job, but we don't actually try to download if there is not a centroid in the tile?
Replying separately to these two comments:
It appears that simplegeom() doesn't like SpatialPoints
Correct. `geoknife` does not have a method for SpatialPoints. You need to use the data.frame method currently. Behind the scenes, GDP doesn't actually support points as the "feature" for queries, but `geoknife` is getting around that by creating a tiny diamond-shaped polygon around the point. See here.
If we do future dev on `geoknife`, we should probably support `sf::st_sf` classes, including multi-point collections with named features. But it is unlikely that will happen soon.
For now I think you know the way around this by creating lat/lon data.frames with names.
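For example, a minimal sketch of that workaround using the `centroids_sf` reprex object above (using the `site_id` values as the feature column names is an assumption about how you'd want the features named):

```r
library(sf)
library(geoknife)

# simplegeom()'s data.frame method wants one named column per feature,
# with two rows: longitude first, then latitude.
coords <- st_coordinates(centroids_sf)              # X = lon, Y = lat
centroid_df <- as.data.frame(t(coords[, c("X", "Y")]))
names(centroid_df) <- centroids_sf$site_id

centroids_geoknifegeom <- simplegeom(centroid_df)
```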
> and "if you are expanding the footprint or hit new cells within existing tiles, you'd only rebuild the tile-specific geoknife targets" can both be true. Wouldn't the tile data already have been built? Or is it that the target is prepped for doing the job, but we don't actually try to download if there is not a centroid in the tile?
Yes, I think you probably wouldn't branch a target for a given tile if it is empty. But perhaps my underlying point is that you don't want the tiles to adjust size or domain when you add a new lake. If they did rely on the domain (e.g., you laid out the tiles based on the bounding box of all lakes used, instead of fixed according to the extent of CONUS), you'd rebuild all tiles every time the footprint of all lakes changed, even if by a tiny amount or because the `sf` object's metadata itself changed.
@jread-usgs, re: laying out the tiles according to the extent of CONUS, the Dynamical Downscaling for the Midwest and Great Lakes Basin GCM data covers only the northeastern U.S. and a portion of the midwest. Here's our current working reconstructed grid of the GCM data:
Do we still want our 'tile' layout to cover all of CONUS, in case we switch driver data sources at some point? Or should we only be sure to cover the full extent of this GCM dataset?
Nice. Yes, I see that we really wouldn't need to have the tiles extend beyond the footprint of the dataset. I think I had a more general tile concept in mind that could apply to different underlying datasets while being able to use the target branching concepts. But that is probably unnecessary in this case.
For this grid, was it simpler than I thought to create the grid? I thought there was some funkiness because there is both a `jx` dimension to the data and a longitude value. It made me think the lat/lon spacing wasn't uniform.
@jread-usgs - no, there was/is some funkiness to puzzle out. The `iy` and `jx` dimensions (both 1D arrays) represent the projected y and x coordinates, respectively, in kilometers. Those values (if multiplied by 1000 to get to units of meters) map out a square 25km x 25km cell-size grid in a custom lambert conformal conic projection (per the projection parameters included in the netcdf metadata). This matches the documentation, which indicates the grid has a 25km cell size. Because the grid was created to be square in this projected coordinate system (a square grid in a conical projection), the latitude and longitude vary along the lines of the drawn grid:
This means that if you reproject and map this same grid in an unprojected geographic coordinate system (e.g. WGS84), the center of the grid reaches higher latitude than the edges, and the grid appears curved:
With this sleuthing, I'm pretty sure that the `xlat` and `xlon` variables in the netcdf (both 2D arrays, with dimensions equal to the lengths of `iy` and `jx`: 86 x 111) are meant to be the unprojected decimal degree equivalent of the cross points in projected coordinate space, since they are in decimal degrees and do vary across a given row/column. The only remaining odd point is that if I build the grid cells using the `xlat` and `xlon` values (NOT a square grid, but curved) and define the coordinate system as WGS84, the resulting grid (in green) matches the shape and curvature of the online preview given for the dataset (the default for OpenLayers is web mercator) but not the grid I built with the `iy` and `jx` values and the projection information, reprojected to WGS84 (in blue):
And if I reproject that WGS84 grid built with the `xlat` and `xlon` values to the custom lambert conformal conic projection defined in the metadata, it (in green) doesn't line up with the square grid constructed using the `iy` and `jx` values (in blue):
Therefore, I'm a bit unsure as to how they pulled those `xlat` and `xlon` values, and will see if David has any insight.
So for now, to reconstruct the grid, I am using the `iy` and `jx` values to build cell polygons, with the crs defined per the metadata in the netcdf. I think this is the most robust approach, and it results in a square grid with the correct cell size. I've got this in a jupyter notebook for now, and will run it by David to confirm it all looks consistent with his understanding of how the data is stored. If it does, I'll put it in script format, and we can call it from our targets pipeline with reticulate. If not, I'll rework it with David's advice.
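For anyone following along, a rough sketch of that same logic in R with `sf` (the file path, variable names, and the projection parameters in the proj4 string are placeholders; the real values come from the netcdf metadata):

```r
library(ncdf4)
library(sf)

# Read the 1D projected coordinate vectors (in km) from the netcdf
# (file name and variable names are placeholders).
nc <- nc_open("gcm_driver_subset.nc")
iy <- ncvar_get(nc, "iy")
jx <- ncvar_get(nc, "jx")
nc_close(nc)

# Lambert conformal conic CRS; the parameter values below are placeholders,
# and the real ones come from the grid mapping attributes in the netcdf metadata.
lcc_proj4 <- "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=40 +lon_0=-97 +x_0=0 +y_0=0 +datum=WGS84 +units=m"

cell_size <- 25000           # 25 km cells, in meters
x_centers <- jx * 1000       # km -> m
y_centers <- iy * 1000

# Build a square grid spanning the cell centers, one 25 km x 25 km cell each.
grid_extent <- st_as_sfc(st_bbox(c(
  xmin = min(x_centers) - cell_size / 2, xmax = max(x_centers) + cell_size / 2,
  ymin = min(y_centers) - cell_size / 2, ymax = max(y_centers) + cell_size / 2),
  crs = st_crs(lcc_proj4)))
gcm_grid_sf <- st_sf(geometry = st_make_grid(grid_extent, cellsize = cell_size))

# Reprojecting to WGS84 shows the curvature described above.
gcm_grid_wgs84 <- st_transform(gcm_grid_sf, 4326)
```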
oh, wow. Thanks @hcorson-dosch - I figured there was something a little different about this dataset. Seems having the grid in the native dataset CRS is best, and then projecting the lake centroids into that to calculate the match-ups. Verifying with David/Dave on this is :+1:
Should we set this up to munge to 6 NetCDF files: one for each GCM that includes the 3 time periods (1980-1999, 2040-2059, 2080-2099)? Curious what sort of organization would be most helpful for https://github.com/USGS-R/lake-temperature-process-models
Lindsay and I chatted, and we decided to go with 1 netCDF per GCM, for now. Lindsay's plan is to save the timeseries data associated with each time period as 'chunks' within the netCDF, which will make it easier to pull out the full timeseries for each variable for each time period in lake-temperature-process-models
Moving further NetCDF formatting discussions over to #252
Related issues that stemmed from this one: #236, #237, #244, #252, #254, #258, #264, #265, #273. Will close this one once we join the current branch `gcm_driver_data_munge_pipeline` with `main`.
Just revisiting this since we've decided to store only the WGS84 decimal degree coordinates as lat/lon values in the intermediary GCM netCDF files, rather than including the projected coordinates and a grid mapping variable (see #252).
I wanted to check back in on some of the earlier funkiness I'd noticed. In later iterations of this work (before we switched to rebuilding the grid in R), I realized that the package I had been using to define the projection of the grid in Python (`pyproj`) generated a proj4 string that wasn't recognized correctly by R when I read in the exported shapefile, which led me to switch to using the `cartopy` package's proj4 string instead. When I made that switch, it also fixed the two issues I'd seen above re: alignment of the grids built either from the projected coordinates or the DD coordinates.
Previously:
> The only remaining odd point is that if I build the grid cells using the `xlat` and `xlon` values (NOT a square grid, but curved) and define the coordinate system as WGS84, the resulting grid (in green) matches the shape and curvature of the online preview given for the dataset (the default for OpenLayers is web mercator) but not the grid I built with the `iy` and `jx` values and the projection information, reprojected to WGS84 (in blue):
But with the use of the `cartopy` proj4 string when defining the grid built with the projected coordinates, that grid, if reprojected (in blue), does align (not perfectly, but well) with the grid built using the WGS84 `xlat` and `xlon` values (in green):
And again, previously:
> And if I reproject that WGS84 grid built with the `xlat` and `xlon` values to the custom lambert conformal conic projection defined in the metadata, it (in green) doesn't line up with the square grid constructed using the `iy` and `jx` values (in blue):
But now, when I used the same `cartopy` proj4 string to define the projection of the grid built from the projected `iy` and `jx` coordinates (in blue), the grid built using the DD `xlat` and `xlon` values, if reprojected from WGS84 to the grid crs (in green), aligns well, if not perfectly:
So I think we can now be confident that the `xlat` and `xlon` values are the unprojected decimal degree equivalents of the cross points in projected coordinate space, `iy` and `jx`.
Create targets pipeline to munge GCM driver data from GDP using `geoknife` to a single (or few) NetCDF file(s). It should work with `2_crosswalk_munge/out/centroid_lakes_sf.rds.ind` as the input for the spatial geom & download data for all 6 of the GCMs (map over these using a data.frame of URLs & time periods). We will create a wrapper function that builds this `targets` pipeline and creates the single NetCDF file. The function will be called from the big `scipiper` pipeline. Plan is to pilot this workflow with 5 lakes, then scale to all of MN, then to the whole footprint.
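A very rough skeleton of how that wrapper-built pipeline could be laid out with `targets` (the GCM table, target names, and the `get_gcm_data()`/`write_gcm_nc()` helpers are hypothetical placeholders, not the actual pipeline code):

```r
library(targets)

# Hypothetical table of the 6 GCMs; names and URLs here are placeholders.
gcm_info <- data.frame(
  gcm_name = c("GCM_A", "GCM_B"),
  gcm_url = c("placeholder_url_a", "placeholder_url_b"),
  stringsAsFactors = FALSE
)

list(
  # Lake centroids produced upstream by the scipiper pipeline.
  tar_target(lake_centroids_sf,
             readRDS("2_crosswalk_munge/out/centroid_lakes_sf.rds")),
  tar_target(gcm_table, gcm_info),
  # One dynamic branch per GCM (and, per the discussion above, per job tile within a GCM).
  tar_target(gcm_raw_data,
             get_gcm_data(gcm_table, lake_centroids_sf),   # hypothetical geoknife wrapper
             pattern = map(gcm_table)),
  # Combiner target that writes the NetCDF file(s).
  tar_target(gcm_nc_file,
             write_gcm_nc(gcm_raw_data),                   # hypothetical NetCDF writer
             format = "file")
)
```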
Considerations: