SimonFisher92 / Scottish_Snow

Discussions #3

Closed EddieBoyle2019 closed 7 months ago

EddieBoyle2019 commented 10 months ago

Hello all,

Using the Issues feature for this GitHub repository is great for individual technical tasks, but I think it would also be useful to enable the Discussions feature for this GitHub repository so we can discuss and share ideas for development and collaboration more generally within our little ad-hoc group in a proper forum:

https://docs.github.com/en/discussions

I'll kick things off. Simon's README.md notes are a great technical overview of the areas we could work on.

I myself am interested (at least initially) in 1) Data Harvesting and 5) Communication.

I'm looking at using the new Copernicus Data Space Ecosystem for harvesting Sentinel data, along with the cloud-based JupyterLab service they are offering:

https://dataspace.copernicus.eu/news/2023-8-2-first-example-notebooks-available-jupyter-lab

I have also had similar thoughts to Simon about developing a web-based application to let others see any functionality we build, and potentially to make it interactive. I was looking at building a GeoDjango application hosted on my own personal AWS Ubuntu/Apache virtual machine, but it would obviously be great to have a publicly accessible web hosting platform that didn't cost money and wasn't tied to any one individual (and that also allowed server-side executables, scripting and data storage, unlike GitHub Pages). I have found this very difficult to achieve in previous projects! Simon suggests Streamlit for this, which is new to me but I will certainly look at it.

Eddie Boyle

SimonFisher92 commented 10 months ago

Hi Eddie

Thanks for the comments, I agree that this is a good place to discuss issues and thoughts.

I have been thinking about the problem that the scene classification is at 20m, while snow patch analysis ultimately needs to be at 10m resolution.

We have three options in my opinion:

1) build our own segmentation approach, naively, on the 10m. This will involve annotation, heavy lifting by GPUs and another chunk of work.

2) use image analysis (not ML) to segment. I did this previously by using the difference in reflectance between snow and clouds at particular wavelengths to tell them apart accurately. There are no other white elements in Scottish mountains at that elevation (only clouds and snow). Image analysis is actually enough, though a bit less refined.

These bands allow snow to be distinguished from clouds (found after a huge amount of trial and error):

preset=CUSTOM&layers=B01,B04,B11&maxcc=90&gain=1.0&gamma=1.0

snow exists in this range (clouds do not):

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lower_range = np.array([20, 30, 0], np.uint8)
upper_range = np.array([90, 255, 255], np.uint8)
mask = cv2.inRange(hsv, lower_range, upper_range)

(https://github.com/SimonF92/Scottish_Snow/blob/master/Bluebird.py)

3) Use the 20m scene classifications with the 10m data, by interpolating (upsampling) the 20m segmentation masks to the 10m grid. I'm not sure about this, I don't think it will work, but it's still a thought.
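For what it's worth, option 3 can be sketched in a few lines. This is only a minimal illustration, assuming nearest-neighbour upsampling by a factor of 2 and a toy mask; the function name `upsample_mask_nearest` is made up for the example and is not from the repo.

```python
import numpy as np


def upsample_mask_nearest(mask_20m: np.ndarray, factor: int = 2) -> np.ndarray:
    """Repeat each 20 m pixel `factor` times along both axes (nearest neighbour)."""
    return np.repeat(np.repeat(mask_20m, factor, axis=0), factor, axis=1)


# Toy 2x2 mask at 20 m resolution: 1 = snow, 0 = not snow (made-up values)
mask_20m = np.array([[1, 0],
                     [0, 1]], dtype=np.uint8)

mask_10m = upsample_mask_nearest(mask_20m)
print(mask_10m)
```

Each 20m pixel just becomes a 2x2 block of identical 10m pixels, so the mask lines up with the 10m bands but its edges stay as coarse as before. No detail is gained, which is the main reason to be sceptical of this option.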

murraycutforth commented 10 months ago

Hi @EddieBoyle2019! Nice to e-meet you. Don't know if Simon mentioned it, but we used to work together, and I'm now based in the US, with a month or so before I start a new job. These satellite images are the closest I can get to the Highlands now.

Thanks for setting all this up @SimonFisher92, the readme notes are a great outline.

I've been having a play with the Sentinel data over the last few weeks. In summary, I've downloaded the scene classification (SCL) masks for a 100 km x 100 km tile centred over the Cairngorms, and can measure the snow extent using these masks, which are provided automatically at 20m resolution. I'll start a separate issue to discuss these masks and show some examples, since I have my doubts about the quality: at first glance they might not be good enough.
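As a rough sketch of what "measuring the snow extent" from an SCL mask amounts to: count the pixels labelled snow/ice and multiply by the pixel area. The class values used here (11 for snow/ice; 8, 9 and 10 for the cloud classes) are the standard Sentinel-2 L2A scene classification values; the function names and the toy array are assumptions for illustration only, not the code in the repo.

```python
import numpy as np

SCL_SNOW_ICE = 11            # Sentinel-2 L2A scene classification value for snow/ice
SCL_CLOUD = (8, 9, 10)       # cloud medium probability, cloud high probability, thin cirrus
PIXEL_AREA_M2 = 20 * 20      # each SCL pixel covers 20 m x 20 m


def snow_area_km2(scl: np.ndarray) -> float:
    """Snow/ice area in km^2 for one SCL mask."""
    snow_pixels = np.count_nonzero(scl == SCL_SNOW_ICE)
    return snow_pixels * PIXEL_AREA_M2 / 1e6


def cloud_fraction(scl: np.ndarray) -> float:
    """Fraction of pixels in the mask labelled as any cloud class."""
    return float(np.isin(scl, SCL_CLOUD).mean())


# Toy 3x3 mask (made-up values): three snow pixels and one cloud pixel
scl = np.array([[11, 11, 4],
                [9, 11, 5],
                [4, 4, 4]], dtype=np.uint8)

area = snow_area_km2(scl)
clouds = cloud_fraction(scl)
print(area, clouds)
```

Reporting the cloud fraction alongside the area seems useful, since a low snow area on a cloudy day may just mean the snow was hidden.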

  1. Data Harvesting

I'll get my existing code merged into this repo. See what you think, Eddie, but it might already do a good enough job for the data harvesting step. I've used the sentinelsat package to call the Copernicus Sentinel API to download images. Most of the older data products are "offline": you first have to request a particular data product and then wait roughly a few hours before it is available to download. I've set up my code so that it requests the data products and then checks repeatedly until they're available to download. It took me about 24 hours to get all the SCL masks for one tile covering the Cairngorms with cloud cover < 50%, which totalled about 250 images. I only downloaded the SCL masks, and not any other data, because I'm working on an ancient laptop with no disk space.
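The request-then-poll pattern described above looks roughly like the sketch below. It is deliberately library-agnostic: `request_retrieval`, `is_online` and `download` are hypothetical callables standing in for whatever the API client provides, and the polling interval is an assumption. It is not the actual code that will be merged.

```python
import time
from typing import Callable, Iterable, List


def fetch_when_online(
    product_ids: Iterable[str],
    request_retrieval: Callable[[str], None],  # hypothetical: ask the archive to restore a product
    is_online: Callable[[str], bool],          # hypothetical: True once the product can be downloaded
    download: Callable[[str], None],           # hypothetical: download one product
    poll_seconds: float = 1800.0,              # assumed wait between checks
    max_polls: int = 100,
) -> List[str]:
    """Request offline products, then poll until each one can be downloaded."""
    pending = list(product_ids)

    # Trigger retrieval for everything that is not already online
    for pid in pending:
        if not is_online(pid):
            request_retrieval(pid)

    downloaded: List[str] = []
    for _ in range(max_polls):
        still_pending = []
        for pid in pending:
            if is_online(pid):
                download(pid)
                downloaded.append(pid)
            else:
                still_pending.append(pid)
        pending = still_pending
        if not pending:
            break
        time.sleep(poll_seconds)
    return downloaded


# Dry run with fake callables: "a" is online immediately, "b" comes online on its third check
checks = {"a": 0, "b": 0}


def fake_is_online(pid: str) -> bool:
    checks[pid] += 1
    return pid == "a" or checks[pid] >= 3


requested: List[str] = []
got = fetch_when_online(["a", "b"], requested.append, fake_is_online,
                        lambda pid: None, poll_seconds=0.0)
print(got, requested)
```

Passing the three operations in as functions keeps the loop testable without touching the network, and means it would not need rewriting if we switch API client later.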

  2. Coordinates

It would be great to get a list of useful coordinates on here to work with. Would these correspond to individual patches? (Edit: I've just seen the list of coords in the bluebird_coords file. Would we want to create a fixed-size polygon around each coordinate to measure on?) With satellite data we can analyse data at any scale from regional down to hillslope level. I've created some very arbitrary polygons defining large regions like the northern Cairngorms for my first pass. Ultimately, what results do we want to show from all this? Time series plots showing snow area for various individual patches / hills / regions?
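One way to build a fixed-size polygon around each coordinate is sketched below, under some stated assumptions: a spherical Earth, a square box aligned to lat/lon, and GeoJSON output. The 500 m side length and the example point are illustrative only (the point is not taken from the bluebird_coords file), and `square_polygon` is a made-up name.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def square_polygon(lat: float, lon: float, side_m: float) -> dict:
    """GeoJSON Polygon for a square of side `side_m` metres centred on (lat, lon)."""
    half = side_m / 2.0
    # Convert the half-side from metres to degrees of latitude and longitude
    dlat = math.degrees(half / EARTH_RADIUS_M)
    dlon = math.degrees(half / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    west, east = lon - dlon, lon + dlon
    south, north = lat - dlat, lat + dlat
    # GeoJSON uses [lon, lat] order; the ring is closed and runs counter-clockwise
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


# Illustrative point in the Cairngorms with an assumed 500 m box
patch = square_polygon(57.07, -3.67, side_m=500.0)
print(patch)
```

At this latitude a degree of longitude is only a bit over half the length of a degree of latitude, which is why the longitude offset is scaled by cos(lat). For boxes of a few hundred metres the spherical approximation should be good enough.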

  3. Segmentation

So I've been looking at the scene classification provided with the data. The snow classification seems to be based on something pretty similar to your Bluebird method, Simon: it's based on different band thresholds (described in fig 2 here: https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview). I'm not convinced their masks are all that great though; I'll show examples in a new issue. I've come across the s2cloudless package (https://github.com/sentinel-hub/sentinel2-cloud-detector), which seems to be a more effective cloud segmentation method that we could use. After segmenting out the clouds, it should be pretty trivial to create our own snow segmentation on the remaining pixels using some thresholds?
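To make the "thresholds on the remaining pixels" idea concrete, here is a minimal sketch. It assumes the Normalised Difference Snow Index (NDSI, computed from the green and shortwave-infrared bands, B03 and B11 on Sentinel-2) with a commonly used cut-off of 0.4. Both are assumptions for illustration: this is not the exact rule set in the L2A algorithm, nor the Bluebird method. The cloud mask is simply a boolean array here, standing in for whatever a cloud detector returns.

```python
import numpy as np


def snow_mask_ndsi(green: np.ndarray, swir: np.ndarray,
                   cloud_mask: np.ndarray, ndsi_min: float = 0.4) -> np.ndarray:
    """Boolean snow mask: NDSI above `ndsi_min` on pixels not flagged as cloud."""
    green = green.astype(np.float64)
    swir = swir.astype(np.float64)
    denom = green + swir
    # NDSI = (green - swir) / (green + swir); left at 0 where the denominator is 0
    ndsi = np.divide(green - swir, denom, out=np.zeros_like(denom), where=denom != 0)
    return (ndsi > ndsi_min) & ~cloud_mask


# Toy 2x2 reflectances (made-up values) and a cloud mask flagging the bottom-right pixel
green = np.array([[0.8, 0.7],
                  [0.1, 0.8]])
swir = np.array([[0.1, 0.6],
                 [0.1, 0.1]])
cloud = np.array([[False, False],
                  [False, True]])

snow = snow_mask_ndsi(green, swir, cloud)
print(snow)
```

Note that B11 is a 20m band, so it would need resampling to the 10m grid before being combined with B03 at full resolution.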

I agree that labelling and training our own DL classifier is probably overkill.

Something else which will be relevant to segmentation and analysis is a digital elevation model, which we can probably get from something like this: https://pypi.org/project/elevation/. We can get the gradient of the land from this, which will probably be a really important feature in the classification of snow (since the Sentinel images are always acquired at about the same time (11:30AM), the shadowing effects will always be the same for a given surface gradient).
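Getting the gradient from a DEM could look something like this. It is a sketch only, assuming the DEM is already loaded as a 2D array of heights in metres on a regular grid with square pixels; the function name and the toy terrain are made up.

```python
import numpy as np


def slope_degrees(dem: np.ndarray, pixel_size_m: float) -> np.ndarray:
    """Slope angle in degrees at every DEM pixel, using finite differences."""
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64), pixel_size_m)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# Toy DEM: a plane rising 10 m per 10 m pixel from west to east (made-up values)
dem = np.tile(np.arange(4) * 10.0, (3, 1))

slope = slope_degrees(dem, pixel_size_m=10.0)
print(slope)
```

The two gradient components (`dz_dx`, `dz_dy`) could also be kept as separate features, since the direction a slope faces matters for shadowing as well as its steepness.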

  4. Data imputation

It's wildly optimistic, but my dream is to incorporate climate reanalysis data (such as ERA-5) into a physics-based snow melt model which is constrained by the Sentinel observations, in order to impute all the values that are missing due to cloud cover. Are there any simpler statistical models which we can use for imputation? I would like to experiment with some other approaches like a hidden Markov model or Gaussian processes, mainly just for my own interest. I doubt they would end up telling us anything useful without coupling to weather data.
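As a point of comparison for anything fancier, the simplest baseline is probably linear interpolation in time between the cloud-free observations. The sketch below assumes one snow-area value per acquisition date, with NaN where cloud hid the patch; the numbers are made up and `impute_linear` is a hypothetical name. It ignores weather entirely, so it is only a yardstick.

```python
import numpy as np


def impute_linear(days: np.ndarray, snow_area: np.ndarray) -> np.ndarray:
    """Fill NaN gaps by linear interpolation in time between observed values."""
    days = np.asarray(days, dtype=np.float64)
    area = np.asarray(snow_area, dtype=np.float64)
    observed = ~np.isnan(area)
    # np.interp holds the first/last observed value constant outside the observed range
    return np.interp(days, days[observed], area[observed])


# Made-up series: day offsets and snow area (m^2), with two cloudy acquisitions
days = np.array([0, 5, 10, 15, 20])
area = np.array([100.0, np.nan, 60.0, np.nan, 20.0])

filled = impute_linear(days, area)
print(filled)
```

A hidden Markov model or Gaussian process would replace the straight lines with a model that also gives uncertainty on the imputed values.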

ipoole commented 10 months ago

Hi guys.

Just to introduce myself, I'm Ian Poole, I used to work with Simon and Murray at Canon Medical. Now retired :-). I'm hoping to get involved and eventually to contribute. I cloned the repository a while ago but have only just now started to get it running - downloading Copernicus data now. It's going to take me a while longer to understand the codebase, hope y'all can be patient.

Cheers, Ian

EddieBoyle2019 commented 10 months ago

Hello Simon, Murray, Ian,

It looks like I'm slightly the odd one out in our little group, I haven't worked with any of you before :-)

My background is in software engineering and web development in a few places over the years, but mostly at the University of Edinburgh. I've been an admin of the Scottish snowpatches Facebook group with Iain Cameron for many years, and have been hunting Scottish snowpatches for even longer!

Murray, can you enable the Discussions feature on this repo so we can use that, as I mentioned in my previous post? I think we might benefit from using this GitHub feature for future interactions.

I have been concentrating on the new Copernicus Data Space Ecosystem and the Sentinel Hub API (https://dataspace.copernicus.eu/analyse).

Prompted by Murray, I have also been using the Streamlit platform. I am new to Streamlit but I'm really impressed with it, and it's clearly a platform we can use for public communication of our work.

I have developed a first Streamlit web app prototype which can be seen at: https://hhwfdg8awm9xekixpvadsy.streamlit.app/.

The code for this is complementary to the image analysis/segmentation/masking approach that Murray and Simon have been working on, and exists in my own GitHub repo at the moment (https://github.com/EddieBoyle2019/GeoML/).

This prototype is quite basic. At the moment it retrieves a single Sentinel 2 image with the location and date hard-coded, but you can see the potential for turning this into something more interactive and generic that anyone can use, and integrated in some fashion with the existing analysis code.

murraycutforth commented 10 months ago

Great prototype @EddieBoyle2019, just had a quick look.

I haven't yet used the SentinelHub API but it has the great advantage of allowing the download of sub-parts of images, whereas the sentinelsat package I have used up to now downloads the entire tile (10k x 10k image for 10m resolution) for a particular time and band.

I think only @SimonFisher92 can enable the discussions feature as the repo owner? There should be an option in the settings tab.

EddieBoyle2019 commented 10 months ago

Yes I meant Simon when I mentioned you in my previous message Murray, apologies :-)

SimonFisher92 commented 10 months ago

Hi everyone

Sorry for the delay, that should be discussions open now.

On a personal note, I am doing condensed hours from next week, so I am hoping to have more time to be active here. I am going to start working on a way to "segment" snow patches out from images (just using image analysis). There is no reason why we shouldn't try both approaches though.

The geojsons should now be snowpatch specific, after Murray approved my PR. I'm not sure about the bounding box size though, if anyone wants to check it out.

The Streamlit app looks great, Eddie (I had to wake it up, which is a neat feature of theirs). I once hosted a Streamlit web app on AWS, which might be a useful approach if we want to have historic snow patch segmentations stored in the back end to make it snappy (no Sentinel API use).