Closed murraycutforth closed 6 months ago
Okay @ipoole, @SimonFisher92, if you guys want to run the download script (renamed to src/download/main.py) I think it's working but I can't test it any longer as I've run out of requests for the month. The requirements.txt should be up to date and the only other thing you need to do is add a small local config file called "config.json" with your credentials inside:
{
"client_id": "..",
"client_secret": "..."
}
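For anyone setting this up, here is a minimal sketch of reading that file, assuming it sits in the working directory and holds exactly the two keys shown above. The helper name `load_credentials` is hypothetical and this is not necessarily how src/download/main.py reads it:

```python
import json
from pathlib import Path

# The two keys shown in the config.json example above.
REQUIRED_KEYS = ("client_id", "client_secret")


def load_credentials(path="config.json"):
    """Read the credentials file and check that both expected keys are present.

    Hypothetical helper for illustration only.
    """
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config file {path} is missing keys: {missing}")
    return config["client_id"], config["client_secret"]
```

Failing early on a missing key gives a clearer error than a rejected request later on.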
If this works for you guys I'll start a PR to merge it into master.
I've added code to get all 13 bands, the CLS mask, the SNW and CLD probabilities, and pixel heights from a digital elevation map, which conveniently is also accessible through the Copernicus data hub!
The code stores the arrays as .npy files, and also outputs pngs for convenient browsing. Here is example data (beinn a bhuird, 2nd July 2018):
RGB image:
digital elevation map:
I've uploaded all data for beinn a bhuird from 2018-2022 here: https://www.dropbox.com/scl/fo/1a5axf6an3yh8g2npbml8/h?rlkey=7lokur69l91t3brkd4pt8jaqg&dl=0
However, now that I'm looking more closely it appears that the L2A data (the masks) doesn't line up with the L1C data (the raw band values), and I don't know why. Seems like this needs a bit more investigation.
I've looked again, and I must have been mistaken, as it does look like all the masks correspond to the band data. Very occasionally, though, there is no L2A mask for an L1C image. Maybe their L2A processing fails occasionally?
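One way to list the gaps is to compare acquisition dates between the two products. This is a minimal sketch, assuming the dates have already been extracted from the downloads; the function name and the example dates other than 2nd July 2018 are hypothetical:

```python
from datetime import date


def missing_masks(l1c_dates, l2a_dates):
    """Return the L1C acquisition dates that have no matching L2A product, sorted."""
    return sorted(set(l1c_dates) - set(l2a_dates))


# Illustrative dates only, not taken from the real downloads.
l1c = [date(2018, 7, 2), date(2018, 7, 7), date(2018, 7, 12)]
l2a = [date(2018, 7, 2), date(2018, 7, 12)]
gaps = missing_masks(l1c, l2a)
print(gaps)
```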
Anyway all the data I've downloaded (all patches, 2018-2022) I've made available on dropbox here: https://www.dropbox.com/scl/fo/m31xb8eaz1su3ts77nfi5/h?rlkey=dttzxerq0bqoq4vk7isf776z4&dl=0
Thanks Murray, I'm downloading the data from Dropbox now, one patch at a time... I'll use this data to make a start on a sklearn pipeline for pixel level classification. Let me know when you have a PR for me to look at, to get your code onto master. Ian.
Superhuman effort Murray, that is simply wonderful, amazing progress all round from the team.
Ian, I will make you some ground truth (not much, just something to get you started). Is a ground truth mask like the one I showed you before ok? I can do a decent number of images later today (my PhD friends are graduating so it will be after that).
I know you're doing pixel level classification, so I'm just checking you could put a mask like the one I showed you to good use?
Once we're further on we can get the full dataset labelled properly by our volunteers.
I'm struggling to understand the downloaded data. How is the band index encoded in the filename? I'm aware of Sentinel file naming conventions (https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention) but that doesn't help. It suggests there can be a "Bxx" component to indicate the band, but I don't see that in these downloads.
I see that all the .npy files contain arrays of shape (146, 155, 3). Why 3? I was expecting either all 13 bands in the one array, or one array for each band. I'm clearly being very dumb, sorry!
Ok I just spoke to Simon. I was looking at only the L2A data. I see that in the L1C data the .npy files each contain 13 channels. So all good, sorry for being dumb.
The DEM data is excellent, but I think slope direction is just as important as altitude, so I'm going to add a function to the branch to incorporate directional data. All the snow patches sit in north-east facing corries.
Pixel intensity of this is based on Murray's altitude DEMs, but colour is now based on slope direction. I would put money on long-lasting patches being in bright purple/blue/red.
Small push to the branch (10-new_copernicus_api_trial) to add this functionality.
Very nice Simon! This will be very useful in a per-pixel classifier. Of course in a CNN setting it would (should) discover the gradient for itself. Just thinking how to encode the slope: it could be the gradient in each of the x and y directions, or angle and magnitude, i.e. Cartesian or polar coordinates. Btw, I take it all our patches are compass oriented, with the y axis running N-S?
Yes, agreed re CNN, I was thinking of this purely for pixel level. Good question about encoding. You could have 4 features, one for each of N, E, S and W; presumably each pixel would be a combination of two of the four? Altitude could be another feature. If that were the approach, then I would definitely want to be sure the model picks up the relationships between these features. There might be another, more refined way to do it.
I don't see the need for 4 features, 2 should be sufficient. In the Cartesian scheme you have gradient components in N-S and E-W, both of which can be +ve or -ve.
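A minimal sketch of the two-feature Cartesian scheme, assuming the DEM is a 2D numpy array with rows running north to south and columns west to east. That orientation, the `pixel_size` parameter and the function name are assumptions, not taken from the branch:

```python
import numpy as np


def slope_components(dem, pixel_size=1.0):
    """Return (d_north, d_east): the elevation gradient towards north and east.

    Assumes row index increases southwards and column index increases eastwards.
    """
    d_row, d_col = np.gradient(dem.astype(float), pixel_size)
    # Rows increase southwards under the assumed orientation, so flip the sign.
    return -d_row, d_col


# Synthetic plane that rises to the east and to the south.
rows, cols = np.mgrid[0:4, 0:5]
dem = 100.0 + 2.0 * cols + 3.0 * rows
d_north, d_east = slope_components(dem)
```

Both components can be positive or negative, so two features cover all four compass directions. The gradient points uphill, so under this convention a north-east facing slope has negative values in both components.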
@SimonFisher92 this plot is a work of art! Agree with all the rest of what you guys have said about slope features.
I would speculate that a NN would be better at interpreting the gradient given in Cartesian form (dx, dy) rather than polar form (phi, r) because the discontinuity in phi between 0 and 2pi could be difficult for a NN to learn?
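To illustrate the discontinuity, here is a small sketch comparing two almost identical gradient vectors in both encodings. The vectors and the function name are made up for illustration:

```python
import math


def to_polar(dx, dy):
    """Convert gradient components to (phi, r) with phi in [0, 2*pi)."""
    phi = math.atan2(dy, dx) % (2 * math.pi)
    return phi, math.hypot(dx, dy)


# Two nearly identical gradients, just either side of the phi = 0 axis.
a = (1.0, 0.01)
b = (1.0, -0.01)
phi_a, r_a = to_polar(*a)
phi_b, r_b = to_polar(*b)

cartesian_gap = math.dist(a, b)   # small: the vectors are almost the same
angle_gap = abs(phi_a - phi_b)    # close to 2*pi despite that
print(cartesian_gap, angle_gap)
```

In Cartesian form the two inputs are close together, while in polar form the angle jumps by almost a full turn, which is the kind of jump a network would have to learn around.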
@SimonFisher92 @ipoole
Just an update on the download branch: I worked on a big refactor yesterday to use xarray objects, and have successfully tested this. I think it's a big improvement. We now have a single netCDF file storing an xarray dataset which contains everything (all bands, all classification info, plus elevation/slope) for a single year and a single patch. The best thing about xarray is that each dataset object is completely standalone, because the x/y/time coordinates of each grid point are given, as well as the names of each band. It's super easy to convert a band back to numpy arrays as well. I think xarray and netCDF are pretty much industry standard for geospatial stuff. I know that climate modellers all share their results as netCDF files.
Here's an example of loading all the Creag Meagaidh data for 2020:
Getting the raw numpy array for band two is as easy as: ds["B02"].values
Or you can get the RGB bands for the first time point like this:
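A minimal sketch of this kind of access, using a small synthetic dataset in place of a real file. The dimension names (time, y, x), the grid size and the random values are assumptions for illustration; a real netCDF file would be opened with `xr.open_dataset` instead:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one patch/year dataset (assumed layout: time, y, x).
times = np.array(["2020-05-01", "2020-05-06"], dtype="datetime64[ns]")
shape = (len(times), 4, 5)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {band: (("time", "y", "x"), rng.random(shape)) for band in ("B02", "B03", "B04")},
    coords={"time": times, "y": np.arange(4), "x": np.arange(5)},
)

# Raw numpy array for a single band, as in the post.
b02 = ds["B02"].values  # shape (time, y, x)

# RGB for the first time point: for Sentinel-2, red is B04, green B03, blue B02.
first = ds.isel(time=0)
rgb = np.stack([first[band].values for band in ("B04", "B03", "B02")], axis=-1)
```

Stacking on the last axis gives a (y, x, 3) array, which is the layout most image plotting functions expect.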
Here's a little animation of An Riabhachan this year from those visualisations:
It's a subject for a fresh thread, but I've noticed lots of false positives in the snow probabilities. I need to double check that the snow probability band is consistent with the snow mask in the SCL band though.
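One possible way to run that check is sketched below, assuming the SNW band holds percentages from 0 to 100 and that snow is class 11 in the SCL band. The threshold of 50 and the function name are arbitrary, hypothetical choices:

```python
import numpy as np

SCL_SNOW = 11  # Sentinel-2 L2A scene classification value for snow/ice


def snow_disagreement(snw_prob, scl, threshold=50):
    """Boolean map of pixels where thresholded SNW and the SCL snow class differ."""
    return (snw_prob >= threshold) != (scl == SCL_SNOW)


# Tiny made-up example: one pixel has high snow probability but a non-snow class.
snw = np.array([[90, 80], [10, 70]])
scl = np.array([[11, 4], [5, 11]])
diff = snow_disagreement(snw, scl)
print(int(diff.sum()), "of", diff.size, "pixels disagree")
```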
Issue is now merged