SimonFisher92 / Scottish_Snow


Additional features for new download code #26

Closed: murraycutforth closed this issue 6 months ago

murraycutforth commented 7 months ago

Okay @ipoole, @SimonFisher92, if you guys want to run the download script (renamed to src/download/main.py): I think it's working, but I can't test it any further as I've run out of requests for the month. The requirements.txt should be up to date, and the only other thing you need to do is add a small local config file called "config.json" with your credentials inside:

{
  "client_id": "..",
  "client_secret": "..."
}
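
For what it's worth, a minimal sketch of how a script might read that file (the actual loading logic in src/download/main.py may differ):

```python
import json
from pathlib import Path

# Read the local credentials file described above (keep it out of version control).
config = json.loads(Path("config.json").read_text())
client_id = config["client_id"]
client_secret = config["client_secret"]
```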
murraycutforth commented 7 months ago

If this works for you guys I'll start a PR to merge it into master.

murraycutforth commented 7 months ago

I've added code to get all 13 bands, the SCL mask, the SNW and CLD probabilities, and pixel heights from a digital elevation map, which conveniently is also accessible through the Copernicus data hub!

The code stores the arrays as .npy files, and also outputs PNGs for convenient browsing. Here is some example data (Beinn a' Bhuird, 2nd July 2018):

RGB image: S2A_MSIL1C_20180702T113321_N0206_R080_T30VVJ_20180702T132802.SAFE

digital elevation map: Beinn_a_Bhuird_dem
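
For reference, the storage pattern described above (arrays as .npy plus a quick-look PNG) could look roughly like this. This is only an illustrative sketch, not the actual code in src/download/main.py, and the band ordering is an assumption:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative L1C patch: (height, width, 13) reflectances, assuming bands ordered B01..B12 (with B8A).
bands = np.random.rand(146, 155, 13).astype(np.float32)

out_stem = "beinn_a_bhuird_20180702"  # hypothetical output name
np.save(f"{out_stem}.npy", bands)     # raw arrays kept for later processing

# Quick-look PNG from the visible bands (B04, B03, B02 -> indices 3, 2, 1 under that ordering).
rgb = np.clip(bands[:, :, [3, 2, 1]] / np.percentile(bands, 99), 0.0, 1.0)
plt.imsave(f"{out_stem}.png", rgb)
```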

I've uploaded all the data for Beinn a' Bhuird from 2018-2022 here: https://www.dropbox.com/scl/fo/1a5axf6an3yh8g2npbml8/h?rlkey=7lokur69l91t3brkd4pt8jaqg&dl=0

However, now that I'm looking more closely, it appears that the L2A data (the masks) doesn't line up with the L1C data (the raw band values), and I don't know why. Seems like this needs a bit more investigation...

murraycutforth commented 7 months ago

I've looked again, and I must have been mistaken: it does look like all the masks correspond to the band data, but very occasionally there is no L2A mask for an L1C image. Maybe their L2A processing fails occasionally?

Anyway, all the data I've downloaded (all patches, 2018-2022) is available on Dropbox here: https://www.dropbox.com/scl/fo/m31xb8eaz1su3ts77nfi5/h?rlkey=dttzxerq0bqoq4vk7isf776z4&dl=0

ipoole commented 7 months ago

Thanks Murray, I'm downloading the data from Dropbox now, one patch at a time... I'll use this data to make a start on a sklearn pipeline for pixel level classification. Let me know when you have a PR for me to look at, to get your code onto master. Ian.

SimonFisher92 commented 7 months ago

Super human effort Murray, that is simply wonderful, amazing progress all round from the team.

Ian, I will make you some ground truth (not much, just something to get you started). Is a ground truth mask like the one I showed you before OK? I can do a decent number of images later today (my PhD friends are graduating, so it will be after that).

I know you're doing pixel-level classification, so just checking that you could put a mask like the one I showed you to good use?

Once we're further on, we can get the full dataset labelled properly by our volunteers.

ipoole commented 7 months ago

I'm struggling to understand the downloaded data. How is the band index encoded in the filename? I'm aware of Sentinel file naming conventions (https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention) but that doesn't help. It suggests there can be a "Bxx" component to indicate the band, but I don't see that in these downloads.

I see that all the .npy files contain arrays of shape (146, 155, 3). Why 3? I was expecting either all 13 bands in one array, or one array per band. I'm clearly being very dumb, sorry!

ipoole commented 7 months ago

Ok, I just spoke to Simon. I was looking only at the L2A data; in the L1C data the .npy files each contain 13 channels. So all good, sorry for being dumb.
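
For anyone else hitting the same confusion, a quick sanity check of the array shapes (paths are illustrative):

```python
import numpy as np
from pathlib import Path

# Adjust the directory to wherever the Dropbox data was unpacked.
for path in sorted(Path("data/beinn_a_bhuird").rglob("*.npy")):
    arr = np.load(path)
    # Expect (H, W, 13) for the L1C band stacks and (H, W, 3) for the L2A products.
    print(path.name, arr.shape)
```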

SimonFisher92 commented 7 months ago

The DEM data is excellent, but I think slope direction is just as important as altitude, so I'm going to add a function to the branch to incorporate directional data. All the snow patches sit in north-east-facing corries.
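
A rough sketch of the kind of aspect calculation this could use (not necessarily what ended up on the branch; it assumes a north-up DEM with square pixels):

```python
import numpy as np

def slope_aspect(dem: np.ndarray, pixel_size: float = 10.0) -> np.ndarray:
    """Aspect (downslope direction) in degrees clockwise from north, per DEM pixel."""
    # np.gradient along axis 0 is the derivative as the row index increases (towards the
    # south for a north-up raster); along axis 1 it is towards the east.
    dz_dsouth, dz_deast = np.gradient(dem, pixel_size)
    # The downslope vector is minus the gradient; express it as (east, north) components.
    east, north = -dz_deast, dz_dsouth
    return np.degrees(np.arctan2(east, north)) % 360.0

# Aspects between roughly 0 and 90 degrees then correspond to the north- to east-facing
# slopes where the long-lasting patches tend to sit.
```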

SimonFisher92 commented 7 months ago

[image: DEM visualisation, intensity from altitude and colour from slope direction]

Pixel intensity of this is based on Murray's altitude DEMs, but colour is now based on slope direction. I would put money on the long-lasting patches being in bright purple/blue/red.

SimonFisher92 commented 7 months ago

Small push to the branch (10-new_copernicus_api_trial) to add this functionality

ipoole commented 7 months ago

Very nice Simon! This will be very useful in a per-pixel classifier. Of course, in a CNN setting it would (should) discover the gradient for itself. Just thinking about how to encode the slope: it could be the gradient in each of the x and y directions, or angle and magnitude, i.e. Cartesian or polar coordinates. Btw, I take it all our patches are compass-oriented, with the y axis running N-S?

SimonFisher92 commented 7 months ago

Yes, agreed re CNN; I was thinking of this purely for the pixel-level case. Good question about encoding. You could have four features, one for each of N/E/S/W; presumably each pixel would be a combination of two of those four values? Altitude could be another feature. If that were the approach, then I would definitely want to be sure the model picks up the relationships between these features. There might be another, more refined way to do it.

ipoole commented 7 months ago

I don't see the need for four features; two should be sufficient. In the Cartesian scheme you have gradient components in N-S and E-W, both of which can be +ve or -ve.
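
A minimal sketch of that two-feature scheme, under the same north-up DEM assumption as the aspect sketch above:

```python
import numpy as np

def gradient_features(dem: np.ndarray, pixel_size: float = 10.0) -> np.ndarray:
    """Per-pixel (N-S, E-W) elevation gradients as two signed features, shape (H, W, 2)."""
    dz_dsouth, dz_deast = np.gradient(dem, pixel_size)  # derivatives towards south and east
    dz_dnorth = -dz_dsouth                              # flip so +ve means rising to the north
    return np.stack([dz_dnorth, dz_deast], axis=-1)
```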

murraycutforth commented 7 months ago

> [image: DEM visualisation, intensity from altitude and colour from slope direction]
>
> Pixel intensity of this is based on Murray's altitude DEMs, but colour is now based on slope direction. I would put money on the long-lasting patches being in bright purple/blue/red.

@SimonFisher92 this plot is a work of art! Agree with all the rest of what you guys have said about slope features.

murraycutforth commented 7 months ago

I would speculate that a NN would be better at interpreting the gradient given in Cartesian form (dx, dy) rather than polar form (phi, r) because the discontinuity in phi between 0 and 2pi could be difficult for a NN to learn?

murraycutforth commented 7 months ago

@SimonFisher92 @ipoole

Just an update on the download branch: I worked on a big refactor yesterday to use xarray objects, and have successfully tested it. I think it's a big improvement. We now have a single netCDF file storing an xarray dataset which contains everything (all bands, all classification info, plus elevation/slope) for a single year and a single patch. The best thing about xarray is that each dataset object is completely standalone, because the x/y/time coordinates of each grid point are stored, as well as the names of each band. It's also super easy to convert a band back to a numpy array. xarray and netCDF are pretty much the industry standard for geospatial work; climate modellers all share their results as netCDF files.

Here's an example of loading all the Creag Meagaidh data for 2020:

[image: xarray Dataset summary]
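
The screenshot hasn't survived here, but the loading step is roughly the following (the file name is illustrative and the exact variable names should be checked against the real dataset):

```python
import xarray as xr

# One netCDF file per patch per year, as described above; the name is hypothetical.
ds = xr.open_dataset("creag_meagaidh_2020.nc")
print(ds)  # shows the time/y/x coordinates and the data variables (bands, masks, elevation, ...)
```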

Getting the raw numpy array for band two is as easy as: ds["B02"].values

Or you can get the RGB bands for the first time point like this:

[image: RGB plot of the first time point]
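
Again the screenshot is missing, so here is a hedged reconstruction; apart from B02, which appears above, the band and dimension names and the file name are assumptions:

```python
import matplotlib.pyplot as plt
import xarray as xr

ds = xr.open_dataset("creag_meagaidh_2020.nc")  # hypothetical file name, as above

# Select the visible bands at the first time point and arrange them as (y, x, band) for plotting.
rgb = (
    ds[["B04", "B03", "B02"]]
    .isel(time=0)
    .to_array(dim="band")
    .transpose("y", "x", "band")
)
plt.imshow(rgb / float(rgb.max()))  # crude normalisation just for display
plt.axis("off")
plt.show()
```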

murraycutforth commented 7 months ago

Here's a little animation of An Riabhachan this year from those visualisations:

[animation: An Riabhachan snow cover through the year]

It's a subject for a fresh thread, but I've noticed lots of false positives in the snow probabilities. I need to double-check that the snow probability band is consistent with the snow mask in the SCL band though.
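
A sketch of that consistency check; the variable names, the 50% threshold, and the SCL snow/ice class value of 11 are all assumptions to verify against the actual data:

```python
import xarray as xr

ds = xr.open_dataset("an_riabhachan_2023.nc")  # hypothetical file name

snow_prob = ds["SNW"]        # snow probability layer (assumed 0-100)
scl = ds["SCL"]              # scene classification layer

prob_snow = snow_prob > 50   # arbitrary threshold for the comparison
scl_snow = scl == 11         # 11 is the snow/ice class in the standard L2A SCL

agreement = (prob_snow == scl_snow).mean().item()
print(f"SNW>50 agrees with the SCL snow/ice mask on {agreement:.1%} of pixels")
```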

SimonFisher92 commented 6 months ago

Issue is now merged