SatelliteShorelines / CoastSeg

An interactive toolbox for downloading satellite imagery, applying image segmentation models, mapping shoreline positions and more. The mapping extension for CoastSat and Zoo.
https://satelliteshorelines.github.io/CoastSeg/
GNU General Public License v3.0

New transects and references for the README #139

Closed dbuscombe-usgs closed 1 year ago

dbuscombe-usgs commented 1 year ago

I have cleaned and consolidated the CoastSeg transects, and added some for Australia. The resulting file is here:

CoastSeg_transects_clean.zip

It is only 13 MB. Would you like me to split it up by region?

Please add the following references as transect sources:

2320sharon commented 1 year ago

I'm testing this new file now, including how long it takes to load a subset of transects from it.

2320sharon commented 1 year ago

Here are the results of loading the file with and without using the geopandas.read_file bbox argument:

Time to load the file without bbox:  33.06240653991699 seconds
Memory usage without bbox:  150.27734375 MB
Time to load the file with bbox:  37.673070192337036 seconds
Memory usage with bbox:  16.36328125 MB
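
A comparison like this can be reproduced with a small timing harness. The sketch below is an assumption about how such a measurement could be made, not the script behind the numbers above: `measure`, `load_transects` and `compare` are hypothetical helpers, and the path and bounding box are supplied by the caller. Note that `tracemalloc` only tracks memory allocated through Python, so it will likely under-report memory held by the native libraries geopandas reads files with.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed_seconds, peak_python_mb)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / (1024 * 1024)


def load_transects(path, bbox=None):
    """Read a vector file, optionally keeping only features that intersect bbox.

    bbox is (minx, miny, maxx, maxy) in the file's coordinate reference system.
    """
    import geopandas as gpd  # imported lazily so measure() works without it

    return gpd.read_file(path, bbox=bbox)


def compare(path, bbox):
    """Print load time and peak Python memory with and without bbox."""
    for label, box in (("without bbox", None), ("with bbox", bbox)):
        gdf, seconds, peak_mb = measure(load_transects, path, bbox=box)
        print(f"{label}: {len(gdf)} rows, {seconds:.2f} s, {peak_mb:.2f} MB")
```

Calling `compare("some_transects.geojson", (minx, miny, maxx, maxy))` with a real file and extent would print one line per case.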

I'm going to try the same experiment with one of our current transect files.

2320sharon commented 1 year ago

I performed the same test with our largest transect file, Gulf_USA_transects.geojson, which is 9.13 MB. I got the following results:

Time without bbox:  4.226081609725952 seconds
Memory usage without bbox:  31.15234375 MB
Time with bbox:  3.266524076461792 seconds
Memory usage with bbox:  1.62109375 MB

dbuscombe-usgs commented 1 year ago

Ok, seems like we should run with the single consolidated file?

2320sharon commented 1 year ago

I think we might want to split the geojson into a few smaller files of less than 3MB to make the loading times shorter.

I also tried a solution where I switched the file to Parquet format. With all the conversion included it took about 7-8 seconds to load, but it came at the cost of adding pyarrow as a dependency.

2320sharon commented 1 year ago

My concern is that if the transects take too long to load on the map, the user is going to think the map is broken.

dbuscombe-usgs commented 1 year ago

I have split the transects into a 20x20 deg world grid. Those grid cells that have transect data are zipped up here. It's very hard to get uniform file size, because transects are of variable spatial density. The resulting files are therefore sometimes larger than 3MB. The only way around this is to make a smaller grid, like we did for shorelines.
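
To make the gridding concrete, here is a minimal sketch of how a lon/lat point could be assigned to a 20x20 degree cell. The row-major numbering starting at the south-west corner is an assumption for illustration only; it is not necessarily the numbering used for the attached files.

```python
import math

CELL_DEG = 20  # cell size in degrees, matching the 20x20 deg grid
N_COLS = 360 // CELL_DEG  # 18 columns of longitude
N_ROWS = 180 // CELL_DEG  # 9 rows of latitude


def grid_cell(lon, lat):
    """Return a hypothetical row-major cell index (0..161) for a lon/lat point."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("lon/lat out of range")
    # Clamp so that lon == 180 or lat == 90 falls in the last column/row.
    col = min(int(math.floor((lon + 180) / CELL_DEG)), N_COLS - 1)
    row = min(int(math.floor((lat + 90) / CELL_DEG)), N_ROWS - 1)
    return row * N_COLS + col


def cell_bounds(index):
    """Return (minx, miny, maxx, maxy) in degrees for a cell index."""
    row, col = divmod(index, N_COLS)
    minx = -180 + col * CELL_DEG
    miny = -90 + row * CELL_DEG
    return (minx, miny, minx + CELL_DEG, miny + CELL_DEG)
```

Halving `CELL_DEG` quadruples the number of cells, which is the trade-off behind using a smaller grid to cap file size.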

We could adopt a similar strategy to shorelines here. We could store these files on Zenodo or locally, and look up which transects to load based on bounding box extent.
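
Such a lookup could be as simple as an intersection test against a table of per-file extents. The sketch below assumes a CSV with hypothetical columns `filename, minx, miny, maxx, maxy` and made-up extents; the actual layout of CoastSeg's bounding-box table may differ.

```python
import csv
import io


def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def files_for_bbox(csv_text, bbox):
    """Return the file names whose extent intersects bbox."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        extent = tuple(float(row[k]) for k in ("minx", "miny", "maxx", "maxy"))
        if boxes_intersect(extent, bbox):
            matches.append(row["filename"])
    return matches


# Made-up extents, for illustration only.
EXAMPLE_TABLE = """filename,minx,miny,maxx,maxy
region_a.geojson,-140,20,-120,40
region_b.geojson,-120,20,-100,40
region_c.geojson,140,-40,160,-20
"""
```

Only the matching files would then need to be read, and the same bounding box could be passed on to `geopandas.read_file` to trim each file further.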

Transects_grid_regions.zip

dbuscombe-usgs commented 1 year ago

image

each colour a different file

2320sharon commented 1 year ago

Thank you for making these new transect files. They load far faster than the consolidated dataset. The largest file, Transects_grid_region_23.geojson, takes about 8 seconds to load, while an average file such as Transects_grid_region_15.geojson takes roughly 3 seconds.

The current way we have the transects implemented is that they are distributed within coastseg. I think this is alright to keep for now, but later we should switch to the method we use to distribute shorelines. Being able to download transects from a URL will be helpful when users are contributing their own transects to the coastseg project.

Here is what I'll do

Thank you again for all your help and feedback.

dbuscombe-usgs commented 1 year ago

Sounds good. We'll leave the transects inside coastseg for now, but let me know if you change your mind (it would be easy).

2320sharon commented 1 year ago

Will do. I'll close this issue when I finish those tasks, then I'll make a new issue for hosting the transects.

2320sharon commented 1 year ago

The new transects have been added in pip install coastseg==0.0.67.dev2.

  1. Replaced the old transects with the new ones
  2. Updated the transects_bounding_boxes.csv file as well
  3. Updated transects.py to load the transects from a geojson file with the bbox argument, so that it's faster and takes less memory