Closed dbuscombe-usgs closed 1 year ago
I'm testing this new file now and timing how long it takes to load a subset of transects from it.
Here are the results of loading the file with and without using the geopandas.read_file bbox argument:
Time to load the file without bbox: 33.06240653991699 seconds
Memory usage without bbox: 150.27734375 MB
Time to load the file with bbox: 37.673070192337036 seconds
Memory usage with bbox: 16.36328125 MB
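For anyone wanting to repeat this comparison, here is a minimal sketch of a timing harness. It is not the script used for the numbers above. The names benchmark, load_transects and compare are made up for illustration, and tracemalloc only counts allocations made through Python's allocator, so memory used inside native libraries such as GDAL is not included and the figures may not match the ones reported here.

```python
import time
import tracemalloc


def benchmark(loader):
    """Call loader() once and return (result, elapsed_seconds, peak_traced_mb)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = loader()
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 2**20


def load_transects(path, bbox=None):
    """Read a vector file, optionally keeping only features that intersect bbox.

    bbox is (minx, miny, maxx, maxy) in the coordinate system of the file.
    """
    import geopandas as gpd  # imported lazily so benchmark() works without it

    return gpd.read_file(path, bbox=bbox)


def compare(path, bbox):
    """Print load time and peak traced memory with and without the bbox filter."""
    for label, box in (("without bbox", None), ("with bbox", bbox)):
        gdf, seconds, peak_mb = benchmark(lambda: load_transects(path, box))
        print(f"{label}: {len(gdf)} features, {seconds:.2f} s, {peak_mb:.2f} MB peak")
```

Calling compare("Gulf_USA_transects.geojson", (-98.0, 25.0, -80.0, 31.0)) would print one line per case. The extent in that call is a placeholder, not the one used in the test above.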
I'm going to try the same experiment with one of our current transect files.
I performed the same test with our largest transect file, Gulf_USA_transects.geojson, which is 9.13 MB.
I got the following results:
Time without bbox: 4.226081609725952 seconds
Memory usage without bbox: 31.15234375 MB
Time with bbox: 3.266524076461792 seconds
Memory usage with bbox: 1.62109375 MB
Ok, seems like we should run with the single consolidated file?
I think we might want to split the geojson into a few smaller files of less than 3MB to make the loading times shorter.
I also tried switching the file to Parquet format. With all the conversion included it took about 7-8 seconds to load, but it came at the cost of adding pyarrow as a dependency.
My concern is that if the time to load transects on the map is too slow the user is going to think the map is broken.
I have split the transects into a 20x20 deg world grid. Those grid cells that have transect data are zipped up here. It's very hard to get uniform file size, because transects are of variable spatial density. The resulting files are therefore sometimes larger than 3MB. The only way around this is to make a smaller grid, like we did for shorelines.
We could adopt a similar strategy to shorelines here. We could store these files on Zenodo or locally, and look up which transects to load based on bounding box extent.
each colour a different file
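The lookup by bounding box could be sketched roughly as below. This is only an illustration under stated assumptions: the grid is taken to start at longitude -180 and latitude -90 with 20 degree cells, the bbox is in lon/lat with min <= max and does not cross the antimeridian, and the function name cells_for_bbox is made up. How a cell maps to a file name such as Transects_grid_region_23.geojson is not known from this thread, so the sketch returns cell bounds and leaves that mapping to a separate table.

```python
import math

CELL_DEG = 20               # cell size of the 20x20 degree grid
N_COLS = 360 // CELL_DEG    # 18 columns of longitude
N_ROWS = 180 // CELL_DEG    # 9 rows of latitude


def _index_range(lo, hi, origin, n_cells):
    """Indices of the cells along one axis that the interval [lo, hi] overlaps."""
    lo = max(lo, origin)
    hi = min(hi, origin + n_cells * CELL_DEG)
    first = min(math.floor((lo - origin) / CELL_DEG), n_cells - 1)
    # A cell that only touches hi on its lower edge is not counted,
    # but a zero-width interval still selects the cell it falls in.
    last = min(max(math.ceil((hi - origin) / CELL_DEG) - 1, first), n_cells - 1)
    return range(first, last + 1)


def cells_for_bbox(minx, miny, maxx, maxy):
    """Return (minx, miny, maxx, maxy) bounds of every grid cell the bbox overlaps."""
    cells = []
    for row in _index_range(miny, maxy, -90, N_ROWS):
        for col in _index_range(minx, maxx, -180, N_COLS):
            x0 = -180 + col * CELL_DEG
            y0 = -90 + row * CELL_DEG
            cells.append((x0, y0, x0 + CELL_DEG, y0 + CELL_DEG))
    return cells
```

For example, cells_for_bbox(-98, 25, -80, 31) returns the two cells (-100, 10, -80, 30) and (-100, 30, -80, 50), so only the files covering those cells would need to be read.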
Thank you for making these new transect files. They load far faster than the other dataset. The largest file, Transects_grid_region_23.geojson, takes about 8 seconds to load, while an average file such as Transects_grid_region_15.geojson takes roughly 3 seconds to load.
The current way we have the transects implemented is that they are distributed within coastseg. I think this is alright to keep for now, but later we should switch to the method we use to distribute shorelines. Being able to download transects from a URL will be helpful when users are contributing their own transects to the coastseg project.
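If we do move to downloading transects from a URL later, a download-once-then-cache helper might look something like this. It is a sketch using only the standard library, not how coastseg distributes shorelines today; fetch_transects and the cache directory layout are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_transects(url, cache_dir):
    """Download url into cache_dir unless a copy already exists; return the local path."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Use the last URL segment as the local file name.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download to a temporary name first so an interrupted transfer
        # never leaves a partial file under the final name.
        partial = target.with_suffix(target.suffix + ".part")
        urlretrieve(url, partial)
        partial.replace(target)
    return target
```

The returned path can then be passed to geopandas.read_file together with the bbox argument, so the file is downloaded once and filtered on every load.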
Here is what I'll do
Thank you again for all your help and feedback.
Sounds good. We'll leave the transects inside coastseg for now, but let me know if you change your mind (it would be easy).
Will do. I'll close this issue when I finish those tasks, then I'll make a new issue for hosting the transects.
The shorelines have been added in pip install coastseg==0.0.67.dev2.
transects.py: load the transects from a geojson file with the bbox argument, so that it's faster and takes less memory
I have cleaned and consolidated the CoastSeg transects, and added some for Australia. The resulting file is here:
CoastSeg_transects_clean.zip
It is only 13 MB. Would you like me to split it up by region?
Please add the following references as transect sources:
Kilian Vos. (2023). Time-series of shoreline change along the Pacific Rim (v1.4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7758183
Vos, Kilian, Wen, Deng, Harley, Mitchell D., Turner, Ian L., & Splinter, Kristen D. (2022). Beach-face slope dataset for Australia (Version 2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7272538