cropmapteam / Scotland-crop-map

This is the repository for the Scottish Government collaboration with EDINA and JNCC to produce a crop map for Scotland by developing machine learning algorithms applied to Sentinel satellite data.

Write Neural Net code for crops mapping #39

Open quantoidb opened 5 years ago

quantoidb commented 5 years ago

Opening a new issue so it doesn't get lost as part of issue #24.

Here's an image :) of what I am looking for: instead of a mean/variance/range for each field, I need the pixel values of each field's image.

A field image is made of pixels in 3 channels (red, green, blue). So we need to generate a CSV file with one row per field, each row containing the following variables: FID_ID, LCTYPE, LCGROUP, AREA, pixels 1 to 768 (assuming each image is 64x64 pixels).
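
A minimal sketch of the CSV layout described above, in Python, assuming each image is available as nested lists of (red, green, blue) tuples. The function names and the `pixel_N` column naming are hypothetical. The number of pixel columns is height x width x channels: 64 x 64 x 3 gives 12288 values, whereas 768 corresponds to 16 x 16 x 3, so the image size is left as a parameter here.

```python
import csv
import io

META_COLUMNS = ["FID_ID", "LCTYPE", "LCGROUP", "AREA"]


def pixel_column_names(height, width, channels=3):
    """Return pixel_1 .. pixel_N, where N = height * width * channels."""
    return [f"pixel_{i}" for i in range(1, height * width * channels + 1)]


def field_to_row(fid_id, lctype, lcgroup, area, image):
    """Flatten image[row][col] = (r, g, b) into one CSV row, row by row."""
    flat = [value for row in image for pixel in row for value in pixel]
    return [fid_id, lctype, lcgroup, area, *flat]


def write_fields_csv(handle, fields, height, width, channels=3):
    """Write a header plus one row per field.

    `fields` is an iterable of (fid_id, lctype, lcgroup, area, image) tuples.
    """
    header = META_COLUMNS + pixel_column_names(height, width, channels)
    writer = csv.writer(handle)
    writer.writerow(header)
    for fid_id, lctype, lcgroup, area, image in fields:
        row = field_to_row(fid_id, lctype, lcgroup, area, image)
        if len(row) != len(header):
            raise ValueError(f"field {fid_id}: image is not {height}x{width}x{channels}")
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny 2x2 RGB image with made-up values, just to show the row layout.
    demo = [[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]]
    out = io.StringIO()
    write_fields_csv(out, [(9, "WW", "example_group", 1.5, demo)], 2, 2)
    print(out.getvalue())
```

The length check matters because every row must have the same number of columns, which is why the clipped images need a common size before they are flattened.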

[Image: from_IMAGE_to_CSV_file]

geojamesc commented 5 years ago

Copying this comment from (now closed) #24.

Wrote a Python script https://github.com/cropmapteam/Scotland-crop-map/commit/83e5490aed0486b4ecc799a3235e36169bf4b62d that calls the GDAL gdalwarp command-line tool to crop a raster image to a shapefile containing the field geometry, giving an output image something like this:

[Image: example of a cropped field image]

The cropped image has the same properties as the input image, i.e. has the 2 bands plus the georeferencing.
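
The clipping step can be sketched as follows. This is not the script from the linked commit, only an illustration of how `gdalwarp` can be driven from Python using its `-cutline`, `-crop_to_cutline` and `-cwhere` options. The file names and the `gid = 9` attribute filter are hypothetical.

```python
import subprocess


def build_gdalwarp_clip_command(src_tif, cutline_shp, dst_tif, where=None):
    """Build a gdalwarp command that crops src_tif to the polygons in cutline_shp.

    `where` is an optional attribute filter (for example "gid = 9") that
    restricts the cutline to a single field polygon.
    """
    cmd = ["gdalwarp", "-cutline", cutline_shp, "-crop_to_cutline"]
    if where is not None:
        cmd += ["-cwhere", where]
    cmd += [src_tif, dst_tif]
    return cmd


def clip_raster(src_tif, cutline_shp, dst_tif, where=None):
    """Run gdalwarp; raises CalledProcessError if the tool reports a failure."""
    cmd = build_gdalwarp_clip_command(src_tif, cutline_shp, dst_tif, where)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; only prints the command so nothing is executed here.
    print(build_gdalwarp_clip_command("s1_scene.tif", "fields.shp", "s1_scene_9.tif", "gid = 9"))
```

Passing the arguments as a list avoids shell quoting problems with paths or filter expressions that contain spaces.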

A zipfile with S1 data clipped to the Kelso Ground Truth field boundaries is available as S1_data_clipped_to_GT_Polys.zip from here:

https://uoe-my.sharepoint.com/:u:/g/personal/jcrone_ed_ac_uk/EWPZ7267rAxFkTd8rhcwfqwBnYHpvkaQYSWScE-C-PF8ww?e=mDIUrj

The S1_data_clipped_to_GT_Polys.zip has the following contents:

A GTFieldPolys subfolder containing:

Valid sub-folder contains 23201 S1 image clips. Images have filenames like:

S1B_20180922_30_asc_175817_175842_DV_Gamma-0_GB_OSGB_RCTK_SpkRL_9.tif

This is the original S1 image name plus a _GID suffix (in this case _9), which identifies the GID of the field polygon in ground_truth_v5_2018_inspection_kelso_250619_c.shp that the S1 image has been clipped to.

NotValid sub-folder contains 11795 S1 image clips that validation deemed invalid because all S1 radar pixels were null, probably because the field boundary intersected an area of NoData in the image.

LUT.csv is a lookup table to map gid to ground truth lcgroup/lctype labels.
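
As an illustration of how the pieces fit together, the sketch below extracts the GID from a clip filename and looks up its labels. The column names `gid`, `lcgroup` and `lctype` are assumptions about LUT.csv, and the example rows in the usage section are made up.

```python
import csv
import io
import re

# The GID is the trailing integer between the last underscore and ".tif".
_GID_SUFFIX = re.compile(r"_(\d+)\.tif$")


def gid_from_clip_name(filename):
    """Return the field polygon GID encoded in a clipped S1 image filename."""
    match = _GID_SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"no _GID suffix found in {filename!r}")
    return int(match.group(1))


def load_lut(handle):
    """Read a lookup table into {gid: (lcgroup, lctype)}.

    Assumes header columns named gid, lcgroup and lctype.
    """
    reader = csv.DictReader(handle)
    return {int(row["gid"]): (row["lcgroup"], row["lctype"]) for row in reader}


if __name__ == "__main__":
    name = "S1B_20180922_30_asc_175817_175842_DV_Gamma-0_GB_OSGB_RCTK_SpkRL_9.tif"
    # Made-up lookup rows, for illustration only.
    lut = load_lut(io.StringIO("gid,lcgroup,lctype\n9,example_group,WW\n"))
    print(gid_from_clip_name(name), lut[gid_from_clip_name(name)])
```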

geojamesc commented 5 years ago

Having clipped the S1 data to a field, it sounds like the resulting image then needs to be resized to a standard size and then dumped to a CSV.
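
A rough sketch of the resize step, using nearest-neighbour sampling on a single band held as nested lists. This is only one possible resampling choice, written in plain Python for illustration; in practice GDAL or an image library would do this, and nodata pixels would need separate handling.

```python
def resize_nearest(image, out_height, out_width):
    """Resize a 2D grid (list of rows) using nearest-neighbour sampling."""
    in_height = len(image)
    in_width = len(image[0])
    return [
        [
            # Map each output cell back to its source cell by integer scaling.
            image[r * in_height // out_height][c * in_width // out_width]
            for c in range(out_width)
        ]
        for r in range(out_height)
    ]


def flatten(image):
    """Flatten a 2D grid row by row, ready to be written as CSV columns."""
    return [value for row in image for value in row]


if __name__ == "__main__":
    band = [[1, 2], [3, 4]]
    resized = resize_nearest(band, 4, 4)
    for row in resized:
        print(row)
    print(flatten(resized))
```

Nearest-neighbour sampling keeps the original pixel values rather than blending them, which avoids mixing real backscatter values with nodata at field edges.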

quantoidb commented 5 years ago

R code written. I am able to feed grayscale images into the NN model. I need the full Kelso dataset in grayscale, in separate folders by crop type. Then we can compare the NN results to the Random Forest model.

quantoidb commented 5 years ago

NeuralNet model is working! :) Next step: write code to optimise the model.

[Image: Kelso_NeuralNet_v5]

quantoidb commented 5 years ago

Got the Kelso greyscale image dataset from James yesterday.

Neural Net (CNN) code is working fine. Overall classifier accuracy on Band-1 data is 40%, and on Band-2 data it is 42%, meaning we'd be better off just making a guess. I will try other parameter settings, but they are not likely to help much, if at all. The reality is that the ~400 Kelso fields are just not enough to feed a Neural Net model.

Recommendations:
1 - Get data for all of Scotland.
2 - Create a set of JPG images; greyscale TIFs are not good enough.
3 - Crop the images so that only crop data is visible; most images have a lot of white space, and it's not adding anything to the feature selection.

At this late stage we can forget about trying LSTM; there is probably not enough time to generate the needed dataset, and still design a model for it.

geojamesc commented 5 years ago

The images should already have been cropped so that only crop data is visible. The white space is nodata pixels that fell outside the extent of the field (as described in the field polygon dataset). Since the images are regular squares or rectangles, whereas the field extents can take any shape and orientation, there will always be these regions of nodata.
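
To quantify how much of a clip is nodata, something like the following could be used. It is a sketch only: it assumes a single band held as nested lists, with nodata stored either as NaN or as a sentinel value passed in by the caller. The all-nodata case corresponds to the NotValid criterion described earlier in the thread.

```python
import math


def is_nodata(value, nodata_value=None):
    """True if value is nodata: NaN by default, or equal to nodata_value if given."""
    if nodata_value is None:
        return isinstance(value, float) and math.isnan(value)
    return value == nodata_value


def nodata_fraction(image, nodata_value=None):
    """Fraction of pixels in a 2D grid that are nodata (0.0 to 1.0)."""
    values = [value for row in image for value in row]
    if not values:
        raise ValueError("empty image")
    return sum(is_nodata(v, nodata_value) for v in values) / len(values)


def is_valid_clip(image, nodata_value=None):
    """A clip is treated as valid if at least one pixel carries data."""
    return nodata_fraction(image, nodata_value) < 1.0


if __name__ == "__main__":
    nan = float("nan")
    clip = [[nan, 0.5], [nan, nan]]
    print(nodata_fraction(clip), is_valid_clip(clip))
```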

quantoidb commented 5 years ago

I think we could minimise the no-data areas if we took the center of the field and only used 20-30% of the field around it. It's just an idea. Probably no time to fiddle with different types of extractions now. For project phase 2.
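
A sketch of that idea, assuming "20-30% of the field" is read as a fraction of each side of the clipped image and that the centre of the image is a reasonable stand-in for the centre of the field. Both are assumptions: for irregular or concave fields the image centre can itself fall on nodata.

```python
def center_crop(image, fraction):
    """Return the central window of a 2D grid.

    `fraction` is the share of each side to keep (0 < fraction <= 1),
    so fraction=0.3 keeps a window 30% as tall and 30% as wide.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    height = len(image)
    width = len(image[0])
    crop_h = max(1, round(height * fraction))
    crop_w = max(1, round(width * fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


if __name__ == "__main__":
    # 10x10 grid whose values encode their own position (row * 10 + col).
    grid = [[r * 10 + c for c in range(10)] for r in range(10)]
    for row in center_crop(grid, 0.3):
        print(row)
```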

quantoidb commented 5 years ago

Terrible results all around. We might as well just take a guess.

These results are based on BAND 1 data (VV).

NN - Models Summary Results-Band-1.pdf

quantoidb commented 5 years ago

No better with BAND 2 data (VH).

NN - Models Summary Results-Band-2.pdf

quantoidb commented 5 years ago

@geojamesc Could you give me a count of the images in each of the Train and Test crop category folders, please? I just want to include them in the paper for accuracy. Thank you!

geojamesc commented 5 years ago

These are the band2 counts by label category. Not sure what's up with RGR. I'll do the same for band1.

{'Test': {'FALW': 246, 'PC': 163, 'PGRS': 2927, 'RASP_OPEN': 123, 'RGR': 122, 'SB': 1206, 'SO': 286, 'SPOT': 82, 'TGRS1': 82, 'TGRS2': 245, 'TGRS3': 183, 'TGRS4': 82, 'TGRS5': 184, 'UCAA': 82, 'WB': 430, 'WBS': 102, 'WDG': 143, 'WO': 388, 'WOSR': 492, 'WPOT': 123, 'WW': 1064}, 'Train': {'FALW': 428, 'PC': 204, 'PGRS': 4662, 'RASP_OPEN': 205, 'RGR': 122, 'SB': 1918, 'SO': 348, 'SPOT': 123, 'TGRS1': 163, 'TGRS2': 448, 'TGRS3': 265, 'TGRS4': 163, 'TGRS5': 225, 'UCAA': 102, 'WB': 634, 'WBS': 184, 'WDG': 265, 'WO': 551, 'WOSR': 737, 'WPOT': 224, 'WW': 1575}}
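
For reference when reading the accuracy figures in this thread, the counts above can be summarised as totals plus the share of the largest class, which is the accuracy a classifier would get by always predicting that class. This is a sketch added for context, not part of the original analysis; the function name is hypothetical.

```python
COUNTS = {
    "Test": {
        "FALW": 246, "PC": 163, "PGRS": 2927, "RASP_OPEN": 123, "RGR": 122,
        "SB": 1206, "SO": 286, "SPOT": 82, "TGRS1": 82, "TGRS2": 245,
        "TGRS3": 183, "TGRS4": 82, "TGRS5": 184, "UCAA": 82, "WB": 430,
        "WBS": 102, "WDG": 143, "WO": 388, "WOSR": 492, "WPOT": 123, "WW": 1064,
    },
    "Train": {
        "FALW": 428, "PC": 204, "PGRS": 4662, "RASP_OPEN": 205, "RGR": 122,
        "SB": 1918, "SO": 348, "SPOT": 123, "TGRS1": 163, "TGRS2": 448,
        "TGRS3": 265, "TGRS4": 163, "TGRS5": 225, "UCAA": 102, "WB": 634,
        "WBS": 184, "WDG": 265, "WO": 551, "WOSR": 737, "WPOT": 224, "WW": 1575,
    },
}


def summarise(counts):
    """Return (total images, largest class label, largest class share)."""
    total = sum(counts.values())
    label, n = max(counts.items(), key=lambda item: item[1])
    return total, label, n / total


if __name__ == "__main__":
    for split, counts in COUNTS.items():
        total, label, share = summarise(counts)
        print(f"{split}: {total} images, {len(counts)} classes, "
              f"largest class {label} = {share:.1%}")
```

On these counts the Test split holds 8755 images and the Train split 13546, across 21 classes, with PGRS the largest class at roughly 33% of Test and 34% of Train.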

geojamesc commented 5 years ago

Actually, of course, band1 is the same as band2 in terms of image counts by category.