Develop roadmap - GitHub issues

ebgoldstein commented 1 year ago

Basic s2m workflows have previously been discussed via emails and GH discussions (e.g., https://github.com/Doodleverse/segmentation_gym/discussions/62#discussioncomment-3658122)

We need to develop a basic roadmap, and ideally break the roadmap into concrete issues (actionable steps).

jmdelvecchio commented 1 year ago

Hello and thank you for looping me into this!

Re: "Pre-doodler" Script to read in GeoTiff, split into appropriate size files, and save coordinate info

I currently have a script that uses gdal and pathlib to take Planet Visual Clips as they are delivered in the Planet Explorer and (1) merge them into a single large geotiff with gdalwarp (with a step to convert them all to the same CRS, because Planet will deliver you swaths with different CRSs if your AOI spans more than one UTM zone 🙄 ), (2) retile them with gdal_retile.py to given pixel dimensions, creating a tileIndex shapefile and a .csv with filenames and corner coordinates at the same time, and (3) make jpegs from those geotiffs and send them to the Doodler folder.
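A minimal sketch of steps (1) to (3) above, assuming the GDAL command-line tools are installed. It only builds the argument lists (for example to hand to `subprocess.run`) and runs nothing itself. The function names, the output names `merged.tif`, `tileIndex.shp` and `tileIndex.csv`, and the tile size are placeholders for illustration, not the actual script.

```python
from pathlib import Path


def build_merge_and_retile(clips, work_dir, target_crs, tile_px=1024):
    """Return (gdalwarp_cmd, gdal_retile_cmd) as lists of strings."""
    work_dir = Path(work_dir)
    merged = work_dir / "merged.tif"   # hypothetical output name
    tiles_dir = work_dir / "tiles"     # gdal_retile.py expects this directory to exist

    # (1) Reproject every clip to one CRS and merge into a single GeoTIFF.
    warp = ["gdalwarp", "-t_srs", target_crs, *[str(c) for c in clips], str(merged)]

    # (2) Cut the mosaic into fixed-size tiles and write a tile index
    #     shapefile plus a CSV of file names and corner coordinates.
    retile = [
        "gdal_retile.py",
        "-ps", str(tile_px), str(tile_px),
        "-targetDir", str(tiles_dir),
        "-tileIndex", "tileIndex.shp",
        "-csv", "tileIndex.csv",
        str(merged),
    ]
    return warp, retile


def build_jpeg_command(tile_path, jpeg_dir):
    """(3) Convert one GeoTIFF tile to a JPEG in the folder Doodler reads from."""
    out = Path(jpeg_dir) / (Path(tile_path).stem + ".jpg")
    return ["gdal_translate", "-of", "JPEG", str(tile_path), str(out)]
```

Each returned list can be run with `subprocess.run(cmd, check=True)`. Keeping the command construction separate from execution makes it easy to print and inspect the calls before touching any imagery.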

Both Planet and Google Earth Engine deliver large geotiffs as broken-up files, so I imagine the merging step is good to keep built in; it just needs generalizing depending on the scenario. (The workflow is virtually the same for both, though if you're in GEE you can specify the tiling size yourself. I can provide/work on those scripts too, for the JavaScript code editor and the Python API.)

Please direct me w.r.t. discussions vs pull requests etc. as I am new to collaborative repos, I usually just code for myself 😆

ebgoldstein commented 1 year ago

ok, I see a commit from @dbuscombe-usgs w/ a NAIP workflow... nice!

dbuscombe-usgs commented 1 year ago

This is just to download NAIP imagery using Google Earth Engine, so application of Gym models on that imagery is possible. Right now, it uses geemap functionality

My next commit relates to post-Gym workflows, and will be a specific GDAL-based workflow that applies to situations where the jpegs have associated "worldfiles" (wld format). I typically use GDAL to create small tiles (using gdal_retile.py) from larger geoTIFF images, then I create jpegs and associated wld files that contain the positional info (using gdal_translate). I also generate an xml file that contains CRS info.

So, the workflow I developed and will soon share is along these lines ...

- take the "predseg" jpeg label outputs from application of a Gym model, and copy them to a directory along with the wld and xml files
- use gdal_translate to make geotiffs of the labels, then mosaic those geotiffs into a label image that has the same extent as the original orthomosaic. This step is helped by the creation of a VRT
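For anyone unfamiliar with world files, here is a small sketch of the positional info a wld file carries and how it maps a pixel to map coordinates. A world file holds six numbers, one per line, in the order A, D, B, E, C, F, where (C, F) is the centre of the upper-left pixel. The function names are made up for illustration; in the workflow itself GDAL reads these files directly.

```python
def parse_world_file(text):
    """Parse the six world-file values, which are stored in the order A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(v) for v in text.split())
    return {"A": a, "D": d, "B": b, "E": e, "C": c, "F": f}


def pixel_to_world(col, row, wld):
    """Map a pixel (col, row) to map coordinates of that pixel's centre."""
    x = wld["A"] * col + wld["B"] * row + wld["C"]
    y = wld["D"] * col + wld["E"] * row + wld["F"]
    return x, y


# Example: 3 m pixels, north-up image (no rotation, negative y pixel size).
example = "3.0\n0.0\n0.0\n-3.0\n500000.0\n4100000.0\n"
wld = parse_world_file(example)
print(pixel_to_world(10, 20, wld))
```

The world file has no CRS in it, which is why the separate xml file with CRS info is needed alongside each jpeg.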

I often use gdal_retile to create imagery with overlap. It helps to oversample so we have facility to ensemble model outputs. In this case, I have found it best to work with the softmax scores rather than the argmax labels
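To make the overlap idea concrete, here is a generic sketch of computing overlapping tile windows over an image. It is not a reimplementation of gdal_retile (which handles edge tiles its own way); the function name and the choice to shift the last tile back so that every tile is full-sized are assumptions for illustration.

```python
def tile_windows(width, height, tile, overlap):
    """Return (x_off, y_off, x_size, y_size) windows covering the image.

    Neighbouring windows share `overlap` pixels. The last window in each
    direction is shifted back so it ends exactly at the image edge.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")
    step = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        offsets = list(range(0, size - tile, step))
        offsets.append(size - tile)  # final window flush with the edge
        return offsets

    return [
        (x, y, min(tile, width), min(tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


for window in tile_windows(10, 4, tile=4, overlap=2):
    print(window)
```

With 50% overlap like this, every interior pixel is predicted more than once, which is what makes ensembling the outputs possible.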

So, I also have a workflow that:

- takes the softmax scores and writes them as tifs, then copies the wld and xml files from the jpegs
- uses a trick: I rename the wld and xml files, replacing 'jpeg' with 'tif', then use gdal_translate to stitch the softmax scores directly into tifs, create a VRT, and finally use the resampling options provided by gdal_translate to deal with averaging in the overlapping regions. 'gauss' and 'lanczos' resampling seem to work well, but other options include 'average' (arithmetic mean)
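As an illustration of why softmax scores are easier to ensemble than argmax labels, here is a NumPy sketch that averages overlapping score tiles and only then takes the argmax. It swaps in a plain arithmetic mean in array space and is not the GDAL/VRT resampling route described above. The function name and the `(row, col, scores)` tile layout are assumptions.

```python
import numpy as np


def blend_softmax_tiles(tiles, height, width, n_classes):
    """Average overlapping softmax tiles, then label each pixel.

    tiles: iterable of (row_off, col_off, scores), where scores has
           shape (tile_h, tile_w, n_classes).
    Returns (mean_scores, labels).
    """
    total = np.zeros((height, width, n_classes), dtype=np.float64)
    count = np.zeros((height, width, 1), dtype=np.float64)

    for row, col, scores in tiles:
        h, w, _ = scores.shape
        total[row:row + h, col:col + w] += scores
        count[row:row + h, col:col + w] += 1

    # Pixels covered by no tile keep a score of zero instead of dividing by zero.
    mean = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return mean, mean.argmax(axis=-1)


# Two 1x2 tiles that overlap on the middle pixel of a 1x3 strip.
tile_a = np.array([[[0.8, 0.2], [0.6, 0.4]]])
tile_b = np.array([[[0.2, 0.8], [0.1, 0.9]]])
mean, labels = blend_softmax_tiles([(0, 0, tile_a), (0, 1, tile_b)], 1, 3, 2)
print(mean[0, 1], labels)
```

In the overlap pixel, tile A alone would vote class 0 and tile B class 1. Averaging the scores lets the more confident prediction win, which a vote over argmax labels cannot do with only two tiles.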

Just need to tidy and document my code, and I will upload it with a minimal working example..... very soon!

jmdelvecchio commented 1 year ago

Awesome! @dbuscombe-usgs I haven't yet used geemap to download any imagery or raster data from GEE, only tables - does it use a download URL to download locally, and does it tile up large orthomosaics automatically when you download them, as it does in the code editor?

I am furiously working on a Sentinel-2 workflow right now and I'm cheating and using the code editor Export.image.toDrive to tile up my GeoTIFFs. I was about to write a script that matches the original tiffs to the gen_images_and_labels.py of Doodler for the grayscale jpegs I've segmented (I want to be able to put my Doodles in space too - currently they are a much more accurate representation of where water tracks are compared to the model 😭 )

dbuscombe-usgs commented 1 year ago

I use the geemap package, which is a wrapper around the ee package (from Google). I've never used the GEE code editor, but I'm aware of what it is. My only use of GEE (for now) is to access imagery, and I only know Python. With geemap, it is a simple matter of pointing to the collection you want, providing a datetime and an ROI, and then it downloads the imagery. The complicated part is getting around the filesize limit, so I adopt a workflow of downloading small tiles, then I use GDAL to stitch them together again
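The "download small tiles" part boils down to splitting the ROI into a grid of smaller boxes and requesting each one separately. A minimal sketch of that split, in plain Python, is below. The function name and the (west, south, east, north) ordering are assumptions, and the actual geemap download call is deliberately left out rather than guessed at.

```python
def split_bbox(west, south, east, north, nx, ny):
    """Split a bounding box into an nx-by-ny grid of smaller boxes.

    Returns a list of (west, south, east, north) tuples, ordered row by
    row starting from the south-west corner.
    """
    if nx < 1 or ny < 1:
        raise ValueError("nx and ny must be at least 1")
    dx = (east - west) / nx
    dy = (north - south) / ny
    return [
        (west + i * dx, south + j * dy, west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(ny)
        for i in range(nx)
    ]


# A 2 x 1 degree ROI split into two 1 x 1 degree download requests.
for box in split_bbox(0, 0, 2, 1, nx=2, ny=1):
    print(box)
```

Each small box becomes one download that stays under the size limit, and the resulting GeoTIFFs are then mosaicked back together with GDAL.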

I need to update the script that I shared this morning to do the stitching part. I've been migrating away from the GDAL cmd utilities to the osgeo.gdal python API

@jmdelvecchio you are at the bleeding edge!! so stoked you are giving this a try and you are helping lots. Next year I'll be working on a new thing that will allow doodling directly on maps and geospatial imagery. For now, we should think about how to get the doodles into real-world coordinates.... I'm willing to put some time in - let's come up with a plan!

ebgoldstein commented 1 year ago

I think we should have just a little zoom about all of this... brainstorming... and then break the tasks into issues and assign them and make it all happen...

dbuscombe-usgs commented 1 year ago

I'm game

jmdelvecchio commented 1 year ago

I would be very happy to chat after November 15 (v busy before then)! Just keep me in the loop.

dbuscombe-usgs commented 1 year ago

I made a simple script that works well for segmenting orthomosaics using existing Zoo models, fyi: https://github.com/Doodleverse/segmentation_zoo/blob/main/scripts/segment_orthomosaic.py

dbuscombe-usgs commented 1 year ago

@jmdelvecchio and @ebgoldstein it would be cool to meet up again within the next few weeks to discuss progress and directions. I feel like we have a few alternate workflows now for mapping, and wonder to what extent we can compare/consolidate/improve

@jmdelvecchio are you going to CSDMS?

jmdelvecchio commented 1 year ago

Sounds good. I'm pausing new development on my end as I (1) teach until the end of May (which is also why I won't be able to go to CSDMS, damn quarter system) and (2) wait to hear from NSF whether my work will be funded 😆 so perfect time to coordinate.

jmdelvecchio commented 1 year ago

Ok, done teaching until January so now to catch up on coding - there weren't any major developments in automated image segmentation while I was gone, were there?.................

Doodleverse / s2m_engine

Develop roadmap #3