datacarpentry / semester-biology

Forkable teaching materials for course on working with data in R
http://datacarpentry.org/semester-biology

Get data for spatial analysis lesson #438

Closed brymz closed 8 years ago

brymz commented 8 years ago

We are working to develop a spatial analysis lesson (#313). We have been involved in developing reading materials with NEON (http://neondataskills.org/tutorial-series/raster-data-series/ & http://neondataskills.org/tutorial-series/vector-data-series/), which serve as a starting point for our lesson. The goal is to go a step beyond the exercises in the NEON material, which requires a bit of extra data, as described in the lesson plan and data requirements for the proposed spatial analysis exercise.

Lesson plan

ethanwhite commented 8 years ago

@sdtaylor or @MarconiS: As part of your role helping out with the lab's training efforts, could one of you work with @brymz (a recent graduate from Morgan's lab who is working with me on the Data Carpentry semester-long course) to get the MODIS data he needs for this exercise, in the form he needs it?

MarconiS commented 8 years ago

I would be more than happy to do it, of course! If Shawn is interested too, we can coordinate and show different ways of downloading and using MODIS data in R.

sdtaylor commented 8 years ago

Yeah, it's a piece of cake to get this. I'd recommend converting the MODIS data to GeoTIFF and a common projection; wrangling the layers out of the original HDF files can be a hassle. I've got the GDAL commands to do it written down somewhere.
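
Not Shawn's actual commands, but a rough sketch of the same idea done from R with the raster package. It assumes a GDAL build with HDF4 support; the subdataset string is illustrative (the NDVI layer of a MOD13Q1 file), and the target projection is just an example.

library(raster)

# Point raster() at the NDVI subdataset inside the hdf
# (list the real subdataset names with gdalinfo on the file first)
hdf <- "MOD13Q1.A2016001.h12v04.hdf"   # illustrative filename
ndvi_sds <- paste0('HDF4_EOS:EOS_GRID:"', hdf,
                   '":MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days NDVI"')
ndvi <- raster(ndvi_sds)

# Reproject from the MODIS sinusoidal grid to a common CRS (example: UTM zone 18N)
ndvi_utm <- projectRaster(ndvi, crs = "+proj=utm +zone=18 +datum=WGS84")

# Write out a plain GeoTIFF
writeRaster(ndvi_utm, "HARV_NDVI_2016001.tif", format = "GTiff")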


brymz commented 8 years ago

Great, @sdtaylor & @MarconiS! I'm ready to draw up the lesson anytime. I just need the data to get started. Let me know if you need any clarification or run into a snag. Thanks for the help!

sdtaylor commented 8 years ago

The resolution of the NEON rasters is 1 meter, so I think it'd be better to use Landsat (30 m) instead of MODIS (250 m). But 250 m MODIS pixels would still work if Landsat is too cloudy. I'll check the Landsat images to see whether there is a cloud-free year at these sites.

MarconiS commented 8 years ago

Great idea indeed, Shawn! To correct myself: it shouldn't even be a big deal. I am working on scripting it; if you want, Shawn, we can coordinate to avoid doing the same thing twice.

brymz commented 8 years ago

use Landsat (30 m) instead of MODIS (250 m)

The vector/shapefile with the site plots will be set up arbitrarily (by you guys, I hope?!) to match the resolution of the satellite images. You could even pick around clouds if that became an issue. I'm thinking something like 5 plots per site would work well.

Do you need the raster, or should the students download the data directly during the lesson?

I would like to have a direct link to a zip of the data for this lesson. In previous lessons with 'complex' data download operations (e.g., http://www.datacarpentry.org/semester-biology/exercises/Loops-multiple-files-R/), we've also included links to the source database for students who are interested in checking it out. So both could be cool, but the zip of the whole data package for sure.

Thanks!!

sdtaylor commented 8 years ago

I think I have everything needed using the MODIS data. Sergio is looking at the Landsat data, so if it's not too cloudy it may be a better option.

zip file

contents:
./harvardNDVI/ - a year's worth of bimonthly NDVI tifs for the Harvard site
./sanJoaquinNDVI/ - a year's worth of bimonthly NDVI tifs for the San Joaquin site
./create_dc_ndvi_data.R - script that makes all the tifs from the downloaded MODIS hdf's, if you're curious
./plotLocations/ - plot locations for the 2 sites, made randomly in QGIS

brymz commented 8 years ago

This is great, @sdtaylor!

I'm working to set up the lesson solution and have run into a problem with the plot locations. It doesn't seem like they line up with the right units/range. I used readOGR() to pull in the shp.

> extent(chm_harv)
class       : Extent 
xmin        : 731453 
xmax        : 733150 
ymin        : 4712471 
ymax        : 4713838

> plots_harv@coords
     coords.x1 coords.x2
[1,] -72.17442  42.54048
[2,] -72.16620  42.53807
[3,] -72.16648  42.53402
[4,] -72.17259  42.53412
[5,] -72.17808  42.53421
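
For reference, the mismatch above looks like a CRS issue: the raster extent is in UTM meters while the plot coordinates are longitude/latitude degrees. A minimal sketch of one way to reconcile them, assuming the raster and rgdal packages and the object names from the snippets above:

library(raster)   # crs(), extent()
library(rgdal)    # readOGR() and spTransform() methods for sp objects

# Compare the two coordinate reference systems
crs(chm_harv)
crs(plots_harv)

# Reproject the plot points into the raster's CRS so the extents line up
plots_harv_utm <- spTransform(plots_harv, crs(chm_harv))
extent(plots_harv_utm)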
brymz commented 8 years ago

Also, a minor thing. I would like to set all of the naming to the NEON site codes. harvard... -> HARV_ sanJoaquin... -> SJER_

brymz commented 8 years ago

It doesn't seem like they line up.

Brilliant, @sdtaylor! The reprojected data works great!! Novice mistake on my part.

brymz commented 8 years ago

./harvardNDVI/ - a year's worth of bimonthly NDVI tifs for the Harvard site ./sanJoaquinNDVI/ - a year's worth of bimonthly NDVI tifs for the San Joaquin site

Could you tell me more about the two xml files in the NDVI directories? We might consider removing them, if they are not important, to save having to exclude them from the loop that extracts NDVI values at the plots.

sdtaylor commented 8 years ago

You can delete the xml files. GDAL seems to create them on its own, but nothing happens if you delete them.
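
For what it's worth, the loop can also be made immune to stray sidecar files by matching only the tifs. A small sketch, using the directory and layer names from the zip contents above and assuming the plots and rasters are already in compatible CRSs:

library(raster)
library(rgdal)

plots_harv <- readOGR("plotLocations", "harvardPlots")

# Only pick up the GeoTIFFs, so any .xml files never enter the loop
ndvi_files <- list.files("harvardNDVI", pattern = "\\.tif$", full.names = TRUE)

ndvi_at_plots <- data.frame()
for (f in ndvi_files) {
  ndvi <- raster(f)
  vals <- extract(ndvi, plots_harv)
  ndvi_at_plots <- rbind(ndvi_at_plots,
                         data.frame(file = basename(f),
                                    plot = seq_along(vals),
                                    ndvi = vals))
}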

Want me to change all the prefixes to the NEON codes and send another zip?

brymz commented 8 years ago

Want me to change all the prefixes to the NEON codes and send another zip?

That would be great!

sdtaylor commented 8 years ago

An interesting note on the extents. Sometimes the packages in R will take care of different CRSs for you. For example, if you use raster::extract() with a raster and a point file in different CRSs, it will do the extraction and print a note saying that the CRS of one of them was changed. But raster::crop() will not do this automatically; you have to explicitly make the CRSs of both inputs to crop() the same.
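
A rough sketch of that difference, reusing the objects from earlier in the thread (behavior of the raster package versions current at the time; exact messages may vary):

library(raster)
library(rgdal)

# extract() tolerates mismatched CRSs: it transforms the points on the fly
# and prints a note that it did so
chm_vals <- extract(chm_harv, plots_harv)

# crop() does not: reproject one input first so both share the same CRS
plots_harv_utm <- spTransform(plots_harv, crs(chm_harv))
chm_crop <- crop(chm_harv, plots_harv_utm)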

brymz commented 8 years ago

Any idea why I am having trouble importing harvardPlots.shp if I move it into the root ('working') directory?

> plots_harv <- readOGR(".", "harvardPlots")
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv,  : 
  Cannot open layer
brymz commented 8 years ago

Sometimes the packages in R will take care of different...

R package developers often try to make sure those kinds of sensible defaults are in place. We teach students to set those 'optional default' arguments explicitly to make sure there are no surprises behind the scenes. Thanks for the info.

sdtaylor commented 8 years ago

Moving it into the working directory works for me. Did you move all of the harvardPlots files or just the .shp file?

brymz commented 8 years ago

Ok. I only moved the shp. That was my other question. Are all of those files necessary? The zip that we provide the students should not have any extra files.

sdtaylor commented 8 years ago

Yep, all of the files are necessary; a shapefile is really a bundle of files (.shp, .shx, .dbf, and usually .prj) that have to travel together. Any shapefile you get from anywhere will have multiple files like this.

An alternative would be to provide a csv of the lat/longs and import them that way, which would be a single file. Ecology people are likely to see either scenario.
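
A minimal sketch of that csv route, with hypothetical column names (lat and long) and the lat/long CRS declared explicitly on import:

library(sp)

# One file instead of a shapefile bundle; the filename and column names here are hypothetical
plots <- read.csv("harvardPlots.csv")

# Promote the data frame to spatial points so it behaves like the shapefile did
plots_harv <- SpatialPointsDataFrame(
  coords = plots[, c("long", "lat")],
  data = plots,
  proj4string = CRS("+proj=longlat +datum=WGS84")
)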

brymz commented 8 years ago

Let's keep the shapefile set for now, then. We can hold onto the csv option to simplify the exercise later on if need be.

sdtaylor commented 8 years ago

Here are the materials with the fixed naming scheme.

sdtaylor commented 8 years ago

I looked at the NDVI time series, and the HARV site looks pretty funky in the spring. That's due to snow. The SJER site has a very nice NDVI curve.

brymz commented 8 years ago

It looks like real data! It will be a good talking point for students. Thanks for putting it together.

I'll put out a pull request shortly with a draft of the lesson. Feel free to comment on exercise content and the solution.

brymz commented 8 years ago

Can you give me a brief statement of where / how you got the MODIS data? I'm thinking something along the lines of: "The data from MODIS was downloaded from an online repository from _some_govtagency data portal (_link_tosite)."

sdtaylor commented 8 years ago

Short: The data from MODIS was downloaded from an online repository maintained by the USGS and NASA. It's freely available to anyone. https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table

Long: MODIS Products Table - this lists all the different MODIS products you can download; they are at a range of temporal and spatial scales. This data came from MOD13Q1.

Click on a product and it will take you to a page with some metadata about it. Everything is downloaded via a giant file server at http://e4ftl01.cr.usgs.gov/MOLT/. The products are listed again on the file server, and inside the folder for each one are the dates available for that product. Something that is only released once a year, like land cover, has one folder per year. MOD13Q1 is released every 16 days, so it has 23 dates for every year.

Inside each date folder are the tiles for the entire Earth. The hdf files are the actual data files, and there are also jpg files for a quick preview of them. I've never downloaded the xml files. As of July 2016, downloading from this server requires signing up for a free account.

Which tile should you download for your particular spot? MODIS data is split into a grid system over the entire Earth; check out the map here. I can usually eyeball where things are, but sometimes I'm off and download the wrong ones. For this data, HARV is in tile h12v04 and SJER is in h08v05.

The raw hdf files actually contain several layers in them, which are described on the respective product page under the Layers tab.
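
To make the path structure concrete, here is a sketch of pulling one MOD13Q1 tile from that file server. The collection directory and hdf filename below are hypothetical (the real filename ends in a production timestamp; browse the date folder on the server to find it), and, as noted above, the server now requires a free account, so a plain download.file() call may need the login handled separately.

base_url <- "http://e4ftl01.cr.usgs.gov/MOLT/MOD13Q1.006"    # product + collection folder (version may differ)
date_dir <- "2016.01.01"                                     # one of the 23 dates per year
hdf_file <- "MOD13Q1.A2016001.h12v04.006.XXXXXXXXXXXXX.hdf"  # hypothetical filename

download.file(url = paste(base_url, date_dir, hdf_file, sep = "/"),
              destfile = hdf_file, mode = "wb")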