Closed jamesdamillington closed 2 years ago
@jamesdamillington thanks for logging the notebook idea. The outline and suggested datasets look great to me. Some comments as follows:
Overview of raster data characteristics
What do you mean by raster characteristics? Size, number of bands?
Reading raster data
Try rioxarray
Plotting categorical raster maps
Analysing aggregate change (through bar charts and similar visualisation)
Try Holoviz visualisation toolkits. Most of the existing notebooks in the EnvDS book use them for interactive plotting.
Analysing zonal change (using ancillary vector data)
Use zonal_crosstab from Xarray-Spatial. See here an example in Microsoft's Planetary Computer.
Analysing pixel-by-pixel change (including use of sankey diagrams)
Not sure which library is best, but holoviews seems to support interactive sankey diagrams.
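As a rough illustration of the aggregate and pixel-by-pixel steps in the outline above, here is a minimal sketch using plain NumPy rather than any of the suggested libraries. The two small rasters and their class codes are made up, and the helper names `class_counts` and `transition_matrix` are hypothetical. The transition matrix is the kind of table a sankey diagram would be drawn from.

```python
import numpy as np


def class_counts(raster, classes):
    """Return the number of pixels of each class in a categorical raster."""
    return {c: int(np.count_nonzero(raster == c)) for c in classes}


def transition_matrix(before, after, classes):
    """Count pixels moving from each class (rows) to each class (columns).

    Assumes `classes` is sorted and contains every value found in both rasters.
    """
    class_codes = np.asarray(classes)
    rows = np.searchsorted(class_codes, before.ravel())
    cols = np.searchsorted(class_codes, after.ravel())
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    # np.add.at accumulates correctly when the same (row, col) pair repeats
    np.add.at(matrix, (rows, cols), 1)
    return matrix


# Hypothetical 3x3 rasters for two years; class codes 1, 2 and 3 are made up.
lc_2017 = np.array([[1, 1, 2], [1, 2, 2], [3, 3, 3]])
lc_2018 = np.array([[1, 2, 2], [1, 2, 2], [3, 3, 1]])
classes = [1, 2, 3]

counts_2017 = class_counts(lc_2017, classes)  # input for a bar chart
counts_2018 = class_counts(lc_2018, classes)
transitions = transition_matrix(lc_2017, lc_2018, classes)
print(transitions)
```

Each row of `transitions` sums to the pixel count of that class in the first year, which is a quick sanity check before plotting.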
All datasets are great and considerations are very relevant. My suggestion is to consider coarser datasets such as MODIS MCD12Q1. The MODIS tiles h17v03 and h18v03 cover the whole UK. The average size of each tile is 6MB per year.
Following the MODIS wildfire notebook which fetches the dataset from NASA's Earth Data site, find below how I suggest downloading the MODIS land cover dataset for a single year and tile. We'll need to list file names for the remaining years and tiles. You can merge tiles using the merge function in rioxarray (see here).
import os
from pathlib import Path

import aiohttp
import fsspec
import matplotlib.pyplot as plt
import rioxarray as rioxr

notebook_folder = './general-exploration-landcover_modis'
if not os.path.exists(notebook_folder):
    os.makedirs(notebook_folder)

fnames = ['MCD12Q1.A2017001.h17v03.006.2019196134714.hdf', 'MCD12Q1.A2018001.h17v03.006.2019199221720.hdf']
for fname in fnames:
    # download only if the file is missing or empty
    if not os.path.isfile(os.path.join(notebook_folder, fname)) or os.path.getsize(os.path.join(notebook_folder, fname)) == 0:
        username = 'XXX'  # replace with your EarthData username if running locally or in Binder
        password = 'XXX'  # replace with your EarthData password if running locally or in Binder
        fsspec.config.conf['https'] = dict(client_kwargs={'auth': aiohttp.BasicAuth(username, password)})
        url = f'https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MCD12Q1/{fname[9:13]}/001/{fname}'
        filename = url.split('/')[-1]
        with fsspec.open(url) as f:
            with Path(os.path.join(notebook_folder, filename)).open('wb') as handle:
                data = f.read()
                try:
                    # a text response (e.g. a login page) decodes cleanly; a binary HDF file does not
                    data.decode('utf-8')
                    raise RuntimeError('Could not download MODIS data! Have you authorized LAADS Web in your Earthdata account above?')
                except UnicodeDecodeError:
                    handle.write(data)

# open a single file
modis_hdf = rioxr.open_rasterio(os.path.join(notebook_folder, fnames[0]))
modis_hdf.LC_Type1.plot()
plt.show()
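Since the file names for the remaining years and tiles still have to be listed by hand, a small helper can at least show which year and tile combinations are missing. This is a minimal sketch using only the standard library. The helper names `parse_modis_name`, `granule_url` and `missing_granules` are hypothetical, and the URL layout is simply copied from the snippet above.

```python
from itertools import product

# Base URL copied from the download snippet above.
BASE_URL = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MCD12Q1"


def parse_modis_name(fname):
    """Split a granule name such as
    'MCD12Q1.A2017001.h17v03.006.2019196134714.hdf' into labelled parts."""
    short_name, acquired, tile, collection, produced, _ext = fname.split(".")
    return {
        "product": short_name,
        "year": acquired[1:5],   # 'A2017001' -> '2017'
        "day": acquired[5:8],    # 'A2017001' -> '001'
        "tile": tile,
        "collection": collection,
        "produced": produced,    # differs per granule, so it cannot be guessed
    }


def granule_url(fname):
    """Build the download URL with the same layout as the snippet above."""
    parts = parse_modis_name(fname)
    return f"{BASE_URL}/{parts['year']}/{parts['day']}/{fname}"


def missing_granules(fnames, years, tiles):
    """List the (year, tile) pairs that have no file name yet."""
    have = {(p["year"], p["tile"]) for p in map(parse_modis_name, fnames)}
    return [pair for pair in product(years, tiles) if pair not in have]


fnames = [
    "MCD12Q1.A2017001.h17v03.006.2019196134714.hdf",
    "MCD12Q1.A2018001.h17v03.006.2019199221720.hdf",
]
todo = missing_granules(fnames, ["2017", "2018"], ["h17v03", "h18v03"])
print(todo)
print(granule_url(fnames[0]))
```

With the two file names from the snippet, the helper reports that tile h18v03 is still missing for both years.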
An alternative to fetch datasets is via STAC (see this example). I've explored the catalog of the Planetary computer, but they don't have the MODIS MCD12Q1 product.
My suggestion is to take the route of MODIS or another product you feel comfortable with and follow the steps in the submission guidelines section.
As part of the submission process, once you use the notebook template in your personal repository, please indicate in the comment box the URL to the repo. Thank you!
ps. apologies for the late reply, but I was participating in some project meetings last week. FYI, I've updated notebook templates and submission guidelines. Contributions to improve them would be welcome. I hope you enjoy the submission process and in general the community-driven aspects of the project (:
@jamesdamillington I've also found how to fetch lc/lu datasets from STAC. Find below a code snippet to fetch ~Esri 10-Meter Land Cover (10-class)~ 10m Annual Land Use Land Cover (9-class) over London using a target resolution of 500 m (it takes longer to fetch the native resolution of 10 m). The example is a combination of two existing notebooks, ODC.stac and Microsoft Planetary. The multi-temporal ~Esri 10-Meter Land Cover (10-class)~ 10m Annual Land Use Land Cover (9-class) seems to suit your notebook idea. Feel free to reuse the code for your contribution. Note you should install pystac_client and odc-stac.
# Example: multitemporal land use (London)
from pystac_client import Client
import geopandas as gpd
import matplotlib.pyplot as plt
from odc.stac import stac_load
import rasterio
from pystac.extensions.item_assets import ItemAssetsExtension
import numpy as np
from matplotlib.colors import ListedColormap
import pandas as pd

km2deg = 1.0 / 111
x, y = (-0.118092, 51.509865)  # Center point of a query
r = 100 * km2deg
bbox = (x - r, y - r, x + r, y + r)

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
query = catalog.search(
    collections=["io-lulc-9-class"],
    limit=100,
    bbox=bbox
)
items = list(query.get_items())
print(f"Found: {len(items):d} datasets")

# Convert STAC items into a GeoJSON FeatureCollection
stac_json = query.get_all_items_as_dict()
gdf = gpd.GeoDataFrame.from_features(stac_json, "epsg:4326")
fig = gdf.plot(
    "io:tile_id",
    edgecolor="black",
    categorical=True,
    aspect="equal",
    alpha=0.5,
    figsize=(6, 12),
    legend=True,
    legend_kwds={"loc": "upper left", "frameon": False, "ncol": 1},
)
plt.show()

# Load with bounding box
r = 40 * km2deg
small_bbox = (x - r, y - r, x + r, y + r)
crs = "epsg:3857"
yy = stac_load(
    items,
    bands=["data"],
    crs=crs,
    resolution=500,
    chunks={},  # <-- use Dask
    groupby="start_datetime",
    bbox=small_bbox,
)
merged = yy.compute()

# Plot the first time step
_ = (
    merged.isel(time=0)
    .to_array("band")
    .plot.imshow(
        col="band",
        size=4,
    )
)
plt.show()

# Plot every time step
g = merged['data'].plot(col="time")
plt.show()

# Read class names and values from the collection metadata
collection = catalog.get_collection("io-lulc-9-class")
ia = ItemAssetsExtension.ext(collection)
asset = ia.item_assets["data"]
class_names = {v["summary"]: v["values"][0] for v in asset.properties["file:values"]}
values_to_classes = {v: k for k, v in class_names.items()}
class_count = len(class_names)

with rasterio.open(items[0].assets["data"].href) as src:
    colormap_def = src.colormap(1)  # get metadata colormap for band 1
    colormap = [
        np.array(colormap_def[i]) / 255 for i in range(max(class_names.values()))
    ]  # transform to matplotlib color format
cmap = ListedColormap(colormap)

vmin = 0
vmax = max(class_names.values())
p = merged.data.plot(
    col="time",
    cmap=cmap,
    vmin=vmin,
    vmax=vmax,
    figsize=(16, 6),
)
ticks = np.linspace(0.5, 10.5, 11)
labels = [values_to_classes.get(i, "") for i in range(cmap.N)]
p.cbar.set_ticks(ticks, labels=labels)
p.cbar.set_label("Class")
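One note on the bounding boxes in the snippet above: the same offset in degrees is applied to longitude and latitude, so at London's latitude the box is narrower east-west (a degree of longitude there spans roughly 69 km rather than 111 km). That is fine for a quick query. The sketch below shows how the box could be built with an optional cosine correction. The function name `bbox_around` and its `correct_longitude` flag are hypothetical, and the 111 km per degree figure is the same rule of thumb the snippet uses.

```python
import math

KM_PER_DEGREE = 111.0  # same rule of thumb as km2deg = 1.0 / 111 in the snippet


def bbox_around(lon, lat, radius_km, correct_longitude=False):
    """Return (min_lon, min_lat, max_lon, max_lat) around a centre point.

    With correct_longitude=True the longitude offset is widened by
    1 / cos(latitude) so the box is roughly square on the ground.
    """
    dlat = radius_km / KM_PER_DEGREE
    dlon = dlat
    if correct_longitude:
        dlon = dlat / math.cos(math.radians(lat))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


x, y = (-0.118092, 51.509865)  # centre point used in the snippet
small_bbox = bbox_around(x, y, 40)  # same values as the snippet's small_bbox
square_bbox = bbox_around(x, y, 40, correct_longitude=True)
print(small_bbox)
print(square_bbox)
```

Either tuple can be passed as `bbox` to the search or to `stac_load` in the same way as in the snippet.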
Thanks for all these ideas @acocac - I'll explore these packages soon. Using these rather than ones I've used previously may slow completion of the chapter, but it's always good to learn new packages and to be consistent with other content in the book. I'll create a repo with the template soon and share the URL once that's done.
Chapter repo is here: https://github.com/jamesdamillington/landcover-exploration-nlcd
I am currently planning to use the C-CAP NLCD data: https://planetarycomputer.microsoft.com/dataset/noaa-c-cap The ESRI data for London is not appropriate as it is for a single year and some of my analyses require a time series.
@jamesdamillington no rush, thanks for sharing the repo too. Feel free to complete the notebook sections using the dependencies of the original notebook of landcover exploration. The initial ideas are only potential workarounds to fetch input data. We can have a look at the other suggested packages e.g. Xarray-Spatial during the revision process.
Please add the dependencies in the environment.yml file, then the GitHub action will assist in assessing how reproducible the notebook is (at least on Linux).
I am currently planning to use the C-CAP NLCD data: https://planetarycomputer.microsoft.com/dataset/noaa-c-cap The ESRI data for London is not appropriate as it is for a single year and some of my analyses require a time series.
Sorry for the confusion, I didn't mean the ESRI data, instead the code snippet uses the multitemporal 10m Annual Land Use Land Cover (9-class) generated by Impact Observatory.
Okay, great. I'll check it out. But also, I'm now thinking that I'd like to work with MapBiomas data which should be accessible via Google Earth Engine API (hopefully can reduce resolution on server side)
Right, so I think the authentication needed to use the Google Earth Engine API is going to be overly restrictive. So I have used the code you suggested above and added to the initial notebook, plus updated environment.yml to include pystac_client and odc.stac (had some trouble installing these on Linux Mint OS due to cross-channel conflict from C libraries (https://stackoverflow.com/q/66914685) - need to force strict use of conda-forge).
Initial code seems to work!
Will the environment be automatically checked?
Once environment is checked I'll continue working with these data to integrate my existing code (can update to other suggested packages e.g. Xarray-Spatial later).
Finally, should the repo/notebook name change if I use the ESA Sentinel data instead of the NOAA C-CAP data?
@jamesdamillington thanks for the update. Find below some comments.
Right, so I think the authentication needed to use the Google Earth Engine API is going to be overly restrictive.
Agreed, authentication to certain platforms such as GEE might be restrictive. Have you tried to access GEE STAC items using the odc.stac library?
So I have used the code you suggested above and added to the initial notebook, plus updated environment.yml to include pystac_client and odc.stac (had some trouble installing these on Linux Mint OS due to cross-channel conflict from C libraries (https://stackoverflow.com/q/66914685) - need to force strict use of conda-forge).
The current template only checks if the notebook works on Ubuntu Linux. If you find it useful, we can report the conflict and workaround with Linux Mint OS in the README of the notebook repo.
Will the environment be automatically checked?
The environment will be checked automatically at every push. You can monitor it in the Actions tab.
Once environment is checked I'll continue working with these data to integrate my existing code (can update to other suggested packages e.g. Xarray-Spatial later).
Great! Feel free to add the suggested packages later.
Finally, should the repo/notebook name change if I use the ESA Sentinel data instead of the NOAA C-CAP data?
We suggest following the file name conventions indicated in the submission guidelines: config.json file.
Repo/Notebook updated: https://github.com/jamesdamillington/general-exploration-landcover
Linux Mint is built on Ubuntu so the two should align. The environment issue is easily fixed if all packages are sourced from conda-forge (so need to set this default in the environment.yml, which I think I have now done).
So now, onwards with writing the notebook!
Hi @acocac - first draft is now complete for you to have a look at and feedback on!
@jamesdamillington the notebook looks great and well-structured! The GitHub Actions run confirms the proposed executable content is reproducible too, at least on Linux (:
For the reviewing process, may I ask you to transfer the repo to the Environmental-DS-Book organization? This will make it easier for reviewers to preview the rendered version, in particular cell outputs with interactive plotting.
fyi, the main stages after the transfer are:
I hope the above stages are clear. I look forward to finding additional reviewers of your great notebook and starting the collaborative reviewing process.
@acocac Sure, how do we transfer the repo to the EDS Book organization?
@jamesdamillington find some instructions here. In the organization name field you should type Environmental-DS-Book. Let me know if you have any questions!
Hi @acocac - I tried following instructions at that link to transfer the repo. I got the message:
You don't have the permission to create public repositories on Environmental-DS-Book
@jamesdamillington apologies I forgot the key step of adding you to the organization. Can you accept the invite, and then try the transfer? Cheers
Right! Transfer in progress...
Great! I confirm the transfer is completed. I'll prepare the notebook for the reviewing process. I have some potential reviewers in mind and will let you know the final ones when we start the review round(s). Thanks for your contribution!
@jamesdamillington I'm delighted to mention we have started the reviewing process of your NBI. @annefou and @aedebus will kindly support the revision of the proposed technical and conceptual content of the contribution.
Hope you all have a great collaborative reviewing experience towards a common goal, Open Environmental Science for All.
Thanks @acocac Do I need to do anything more right now, or just wait for reviews? For example, the submission guidelines state
A maintainer of the EnvDS book will assist you to add the notebook to a new branch in the main repo. After, a pull request will be created. In the PR, you will have to fill a form with a series of questions related to the contribution. Please complete them.
I can see this PR - do I need to merge it now or is that for you/reviewers to do?
@jamesdamillington thanks for the quick reply. Apologies for any confusion with the current submission documentation.
Thanks @acocac Do I need to do anything more right now, or just wait for reviews? For example, the submission guidelines state
You should wait for reviews. I'll update that particular section in the submission guidelines as it is no longer required to fill in such a form (you already provided some context in the NBI). The form is instead filled in by one of the EDS maintainers, see PR #110. Once we finish the post-print stage, I (as current maintainer) will merge the PR into the main branch of the EDS book repo.
I can see this PR - do I need to merge it now or is that for you/reviewers to do?
You don't need to merge. It is one of the roles of the editor (again myself, hopefully with more volunteers joining in the future) to merge the PR into the main branch of the notebook repository. We will do it once you and the reviewers agree that the review round has been completed satisfactorily.
Great, that's how I essentially understood things, but the text in the submission guidelines was a little confusing. :)
Nw (: the EDS is a work-in-progress community-driven project, we haven't reached version 1.0.0. We then really appreciate any feedback from contributors. You're welcome to suggest changes in the guidelines. We'll provide credits in the contribution types when I add you to the list of contributors.
Hi @acocac Yes, happy to help develop the submission guidelines, mainly by asking questions to help my understanding of the review process! (and then I can suggest text later).
For example, the Reviewing guidelines state that
the interaction of the authors [and reviewers?] is facilitated through ReviewNB
This would benefit (me at least!) from providing a little more guidance. ReviewNB looks great but I'm not entirely clear on how to use it within the process of deciding on changes (and editing the notebook). For example, I see that I can write responses to comments from reviewers in the text box - should I make a reply about a suggested edit in ReviewNB before making a notebook edit? Or just go ahead and make the notebook edit (see below) as I see fit and then reply? Who clicks the Resolve Conversation button? I assume that's for the reviewer to do once they are happy the comment has been appropriately addressed? How does that link to the editor's responsibility of approving PRs?
Then, the second major issue for me currently: how should I actually make edits to the notebook in response to the reviewer comments? I saw you have made some commits (editing file paths), so I pulled the repo - that has brought in the change that you made to the notebook file (now named general-exploration-landcover_io.ipyn) but not your suggested edits in the notebook itself. I then realised that's because I pulled from the main branch, but your edits within the notebook are on the _reviewround1 branch with an outstanding PR that needs to be merged to main.
As you noted above, you are editor so have responsibility for approving PRs. Do I just checkout the _reviewround1 branch, make edits, and submit PRs that you (as editor) then deal with merging into main? (What if a reviewer doesn't like my edit - which links back to my first set of questions).
Thanks!
Hi @acocac Yes, happy to help develop the submission guidelines, mainly by asking questions to help my understanding of the review process! (and then I can suggest text later).
Thanks for your great feedback and for raising concerns about the current documentation. Let me address your questions below. I just opened the issue #115. You've flagged very important issues in the current documentation. I'll address them in a new PR, and then ask for your review or thoughts according to your availability.
the interaction of the authors [and reviewers?] is facilitated through ReviewNB
Good catch. I have to change it to authors and reviewers.
This would benefit (me at least!) from providing a little more guidance. ReviewNB looks great but I'm not entirely clear on how to use it within the process of deciding on changes (and editing the notebook). For example, I see that I can write responses to comments from reviewers in the text box - should I make a reply about a suggested edit in ReviewNB before making a notebook edit? Or just go ahead and make the notebook edit (see below) as I see fit and then reply? Who clicks the Resolve Conversation button? I assume that's for the reviewer to do once they are happy the comment has been appropriately addressed? How does that link to the editor's responsibility of approving PRs?
It's true the current instructions are vague and confusing. There is plenty of room for improvement in the proposed guidelines! A dedicated webpage in the EDS book with some visuals or a demo might help both authors and reviewers who aren't familiar with tools such as ReviewNB. For authors, how to proceed will depend on the type of comment. For instance, if it's just a general question, you can reply directly. If it's a change which could improve the code or text, my suggestion is to make the notebook edit and then reply. We haven't set a policy for the Resolve Conversation button, but I think it's more the editor's responsibility. Again, we can add it to the improved version of the guidelines.
Then, the second major issue for me currently: how should I actually make edits to the notebook in response to the reviewer comments? I saw you have made some commits (editing file paths), so I pulled the repo - that has brought in the change that you made to the notebook file (now named general-exploration-landcover_io.ipyn) but not your suggested edits in the notebook itself. I then realised that's because I pulled from the main branch, but your edits within the notebook are on the _reviewround1 branch with an outstanding PR that needs to be merged to main.
You should make edits in the _reviewround1 branch; this means you have to pull the PR branch and not main. When we get substantial changes and approval from authors/reviewers, I as editor will close the PR and merge into main.
As you noted above, you are editor so have responsibility for approving PRs. Do I just checkout the _reviewround1 branch, make edits, and submit PRs that you (as editor) then deal with merging into main? (What if a reviewer doesn't like my edit - which links back to my first set of questions).
You must only make edits in the _reviewround1 branch. If the reviewer doesn't like the edit, authors can revert the commit. If the authors aren't sure how to do it, the editor can assist.
@jamesdamillington please find the proof of the notebook and general actions in #110.
According to your response we expect to release it next Monday or the week after.
What is the notebook about?
This notebook introduces raster land cover data with simple manipulation and basic exploratory analysis techniques. The notebook will be based largely on the existing notebooks I have used for teaching and will examine:
There are now many data sources of classified (categorical) land cover data that are useful for Environmental Data Science. These include:
Considerations for deciding which of these sources to use in this notebook include:
Code and packages used in this notebook will initially be those used in the original teaching notebooks, notably:
ndarrays
In time, the code can be changed to use packages from the pangeo ecosystem
Data Science Component
Checklist:
Additional information