Closed fnattino closed 4 years ago
@fnattino Hi Francesco, I am thinking maybe you need some help with this? Shall we break this down into some small steps and split up the task?
Hi @rogerkuou , indeed I wanted to discuss this with you. I think we can brainstorm and work on this together.
Hi Francesco, I have done some searching for the following three satellites; below is my summary:
Data available in principle for commercial usage. According to this page, one can submit a proposal to ESA to acquire data for a certain region for research usage.
In principle, data are available for download via a web GUI, but ESA user registration is required. I did not find an existing solution for downloading from a script, but I think in principle we can create something to request the data.
The access is open. Here is an example of how to get data from the command line.
Rough glacier locations can be retrieved from the GLIMS Glacier database. The full database can be downloaded as a zip file of approx. 1 GB in size. For each glacier, shapefiles are provided with polygons representing the boundaries (with internal rocks). Online maps and search tools are also available for test queries.
Satellite data from popular missions such as Landsat and Sentinel can be accessed from multiple platforms. For instance, Sentinel images can be downloaded from the ESA Copernicus Open Access Hub but also from AWS and Google Earth Engine. In addition, multiple services are built around image repositories, offering GUI-based portals to search, visualise and download data from one or multiple missions, e.g.:
Each of these services has its own tools to automatise the search and the download of data, which makes them not very generic. An interesting initiative in this field is the SpatioTemporal Asset Catalog (STAC), which is a set of common standards for exposing geospatial (meta-)data. Many of the services above expose STAC-compliant catalogs, meaning that users can query them using common APIs. Some catalogs require registration (e.g. the Sentinel Hub one), and the catalogs differ in data availability and update frequency, so this information should be checked in order to select the catalog to query. The common API, however, allows the same tool to be used for all catalogs (see sat-search below). Available open-access catalogs include Landsat, Sentinel and CBERS collections, mostly pointing to data on AWS. Unfortunately, while meta-data for all these missions are publicly available, not all assets can be freely accessed from AWS: Sentinel-2 data are in requester-pays buckets, so users pay to download the full images (costs are based on HTTP requests and transfer size), even though the fees seem rather limited.
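As a rough illustration of what "querying with a common API" means, here is a minimal sketch that builds the JSON body of a STAC API search request using only the Python standard library. The `bbox`, `datetime`, `collections` and `limit` fields follow the STAC API item-search specification; the cloud-cover filter assumes that the catalog implements the optional query extension and exposes the `eo:cloud_cover` property. The function name, the collection id and the coordinates are assumptions made for the example and do not refer to a specific catalog.

```python
import json


def build_stac_search(bbox, start, end, collections, max_cloud=None, limit=100):
    """Build the JSON body for a POST request to a STAC API `/search` endpoint.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS84 degrees;
    start and end are RFC 3339 datetime strings.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have four elements")
    body = {
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",  # closed interval in STAC syntax
        "collections": list(collections),
        "limit": limit,
    }
    if max_cloud is not None:
        # Only honoured by catalogs that implement the query extension.
        body["query"] = {"eo:cloud_cover": {"lt": max_cloud}}
    return body


# Hypothetical area of interest and collection id, for illustration only.
payload = build_stac_search(
    bbox=(7.9, 46.3, 8.2, 46.6),
    start="2020-06-01T00:00:00Z",
    end="2020-09-30T23:59:59Z",
    collections=["sentinel-s2-l1c"],
    max_cloud=20,
)
print(json.dumps(payload, indent=2))
```

The same body could in principle be sent to any STAC-compliant catalog that supports these fields, which is what makes a single query tool reusable across repositories.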
sat-search is a tool that includes a CLI and a Python library to search and query meta-data (date, time, area of interest, cloud cover) from any STAC-compliant catalog. Results of the queries can be saved as JSON files, and the full or partial data (e.g. a few selected bands) can be downloaded to disk from the asset links provided in the catalog. A similar tool from the same developers is sat-fetch, which allows clipping all bands and downloading only a specific area of interest from cloud-optimized GeoTIFFs. The advantage of using sat-search and sat-fetch is that they can be used with any of the catalogs implementing the STAC specifications. Thus, the same tools can be used to investigate different repositories and to search for data from different missions. In addition, sat-search results can be automatically downloaded and ingested in memory as Xarray objects with NumPy or Dask arrays as the underlying structure (see here). Currently, a disadvantage is the lack of open Sentinel-2 data on AWS (where most of the STAC catalogs point to): Sentinel-2 images are only available there as requester-pays objects. If no repository with public data is available, Sentinel images could still be downloaded by passing the results of the sat-search queries to other tools and services: either from the Copernicus Open Access Hub using sentinelsat (see below) or from the Google cloud, either via the Google Earth Engine client or directly from a public bucket using the google-cloud-storage tool.
sentinelsat consists of a CLI tool and a Python library to query and download Sentinel data from the ESA Copernicus Open Access Hub. Results of the search can be downloaded and ingested in memory as GeoPandas GeoDataFrames (easily convertible to Xarray objects). While many of the functionalities are similar to sat-search, this tool is less generic since it only allows browsing Data Hub portals and retrieving data from the Sentinel missions. A severe limitation in retrieving data from the Copernicus Open Hub is that data is kept online only for a limited amount of time (one year for Sentinel-2, processed at level 1C). Older data are put offline in a long-term archive (LTA). Trying to download such data with sentinelsat triggers a request to bring the data back online, which takes up to 24 hours, after which the data can be downloaded for 3 days. Only one request to bring LTA data online can be submitted every 30 min.
google-cloud-storage tool and images downloaded to disk.
A tool to retrieve satellite images for a given glacier could involve:
- sat-search to query catalogs of multiple missions for the tiles that include the AOI;
- sat-search or via other tools like sentinelsat and gcs.
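The steps above can be sketched roughly as follows, using only the standard library. This is a sketch under assumptions, not a working retrieval tool: the glacier outline is assumed to be available as a list of (lon, lat) vertices (e.g. read from a GLIMS shapefile), and the catalog results are assumed to be STAC items already parsed from JSON, with `eo:cloud_cover` in their properties. The function names, band keys and example URLs are hypothetical.

```python
def polygon_bbox(vertices):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) vertices."""
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    return (min(lons), min(lats), max(lons), max(lats))


def select_asset_links(items, bands, max_cloud):
    """Map item id to {band: href} for items within the cloud-cover limit.

    Items without a cloud-cover value are skipped, since they cannot be
    checked against the limit.
    """
    links = {}
    for item in items:
        cloud = item.get("properties", {}).get("eo:cloud_cover")
        if cloud is None or cloud > max_cloud:
            continue
        assets = item.get("assets", {})
        links[item["id"]] = {b: assets[b]["href"] for b in bands if b in assets}
    return links


# Hypothetical glacier outline, used to derive the AOI for the catalog query.
outline = [(8.0, 46.4), (8.1, 46.5), (8.05, 46.45)]
aoi = polygon_bbox(outline)

# Hypothetical query results, shaped like STAC items.
items = [
    {
        "id": "scene-a",
        "properties": {"eo:cloud_cover": 5.0},
        "assets": {
            "B04": {"href": "https://example.com/scene-a/B04.tif"},
            "B08": {"href": "https://example.com/scene-a/B08.tif"},
        },
    },
    {
        "id": "scene-b",
        "properties": {"eo:cloud_cover": 80.0},
        "assets": {"B04": {"href": "https://example.com/scene-b/B04.tif"}},
    },
]
links = select_asset_links(items, bands=["B04", "B08"], max_cloud=20)
print(aoi, links)
```

Keeping the query step separate from the download step means the links (or item ids) can be handed to whichever download tool turns out to hold the data.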
@fnattino Hi Francesco, I think this is a very detailed and clear summary!
I agree with using sat-search as the query tool because it gives us more opportunities for working with data other than Sentinel-2 (compared with the parallelCollGS option Bas mentioned).
About the archive, as far as I can see, if our goal is to query and download at least S2 and LANDSAT, it seems the best generic option we have now is google-cloud-storage, since AWS does not hold all the data, and SciHub only has S2 data? Maybe what will happen in the near future is that we do an initial development to pass sat-search results to google-cloud-storage. If there are better options in the future, we do not need to change the query part.
@rogerkuou thanks for the review! Yep, public buckets are available on google cloud both for Sentinel-2 and Landsat, so maybe this is the best option for now..
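A minimal sketch of how a query result could be mapped to a location in the public Sentinel-2 bucket on Google cloud. It assumes that the bucket is named `gcp-public-data-sentinel-2` and is laid out as `tiles/UTM_ZONE/LATITUDE_BAND/GRID_SQUARE/PRODUCT_ID.SAFE/`; both should be verified against the current Google Cloud documentation. The function name and the product id in the example are hypothetical.

```python
import re

# Assumed bucket name and layout; verify against the Google Cloud docs.
SENTINEL2_BUCKET = "gcp-public-data-sentinel-2"

# MGRS tile id: UTM zone (1-2 digits), latitude band, 100 km grid square.
_TILE_RE = re.compile(r"^(\d{1,2})([C-X])([A-Z]{2})$")


def sentinel2_gcs_prefix(mgrs_tile, product_id):
    """Return the object prefix of a Sentinel-2 product in the bucket.

    mgrs_tile is e.g. "32TMS" (a leading "T", as used in product names,
    is accepted and dropped).
    """
    tile = mgrs_tile.upper().lstrip("T")
    match = _TILE_RE.match(tile)
    if match is None:
        raise ValueError(f"not a valid MGRS tile id: {mgrs_tile!r}")
    zone, band, square = match.groups()
    return f"tiles/{zone.zfill(2)}/{band}/{square}/{product_id}.SAFE/"


# Hypothetical product id, for illustration only.
product = "S2A_MSIL1C_20200801T102031_N0209_R065_T32TMS_20200801T123456"
prefix = sentinel2_gcs_prefix("T32TMS", product)
print(f"gs://{SENTINEL2_BUCKET}/{prefix}")
```

With the google-cloud-storage library, such a prefix could then be passed to `Client.list_blobs(SENTINEL2_BUCKET, prefix=prefix)` on a client created with `storage.Client.create_anonymous_client()`, so that only the query-to-path mapping would need to change if a better archive becomes available.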
As a researcher who needs to process time series of optical satellite images from various sources (Sentinel-2, Landsat, etc.), I want to investigate tools for automatic retrieval of images, so that I can download scenes for selected regions, days of the year and cloud conditions.
Acceptance Criteria
Tasks
Links to info and code