Open sbrunato opened 3 years ago
Here is a snippet that searches for all the products available from all the providers in a given area of interest (around Toulouse here) between 1 and 10 August 2020:
from eodag import EODataAccessGateway
from eodag.api.search_result import SearchResult
from eodag.utils.logging import setup_logging
setup_logging(verbose=1)
dag = EODataAccessGateway()
search_criteria = dict(
    start='2020-08-01',
    end='2020-08-10',
    geom=[0, 43, 2, 45],
)
all_prods = SearchResult([])
# Loop over ALL the providers
for provider in dag.available_providers():
    # Set it as the preferred one
    dag.set_preferred_provider(provider=provider)
    # Get the product ID, i.e. the products types (e.g. S2_MSI_L1C), for this provider
    product_types = (
        p["ID"]
        for p in dag.list_product_types(provider=provider)
        if p["ID"] != "GENERIC_PRODUCT_TYPE"
    )
    # And loop over them and search all the products available
    for product_type in product_types:
        try:
            results = dag.search_all(productType=product_type, **search_criteria)
        except Exception:
            print(f"Failed to collect '{product_type}' products with '{provider}'")
            results = []
        print(f"Got {len(results)} '{product_type}' products with '{provider}'")
        all_prods.extend(results)
print(f"Got a total of {len(all_prods)} products.")
(I got 1090 products)
@geonux since you're at the origin of this issue, I would like to ask you a few questions about it if I may.
"To be able to search for all available data on all providers over a given AOI"
I think this can be interpreted in two different ways:
1. Get all the products offered by all the providers, duplicates included.
2. Get only the unique products offered by all the providers.
Indeed, there will almost certainly be duplicate products (providers A and B both offer the same product i) in a search over the same AOI and time period.
If you are interested in 1., the snippet above should get you what you want. Would it be enough to document it?
If you are interested in 2., this is trickier. We would need to remove duplicate products. We could rely on the product unique identifier, however, as shown in https://github.com/CS-SI/eodag/issues/136#issuecomment-808082569, we can't always make sure that different providers use the same id (surprisingly!). So there may still be some duplicates after an id filter. We could also rely on a combination of properties, and remove duplicates if 2 or more products share the same combination of, for instance, product_type / geometry / start date / end date.
Removing duplicates based on the id can be done as follows:
almost_unique_prods = SearchResult({p.properties["id"]: p for p in all_prods[::-1]}.values())
An attempt to remove potential duplicate products from almost_unique_prods could be done as follows:
unique_prods = SearchResult({
    (p.properties["startTimeFromAscendingNode"], p.geometry.wkt, p.product_type): p
    for p in almost_unique_prods[::-1]
}.values())
If we implement 2., internally we could add __eq__ (and __hash__?) to the EOProduct class, to specify how we define whether two products are the same or not.
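As a rough illustration of the __eq__ / __hash__ idea, here is a minimal sketch. `FakeProduct` is a hypothetical stand-in, not the real `EOProduct`, and the identity key (product type, start time, geometry WKT) is only an assumption borrowed from the combination of properties suggested earlier, not a decided design.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class FakeProduct:
    """Hypothetical stand-in for EOProduct, for illustration only."""

    provider: str
    product_type: str
    start: str
    geometry_wkt: str

    def _identity(self):
        # Assumption: two products are "the same" when these fields match,
        # whichever provider they come from.
        return (self.product_type, self.start, self.geometry_wkt)

    def __eq__(self, other):
        if not isinstance(other, FakeProduct):
            return NotImplemented
        return self._identity() == other._identity()

    def __hash__(self):
        # Must be consistent with __eq__: equal products share a hash.
        return hash(self._identity())


products = [
    FakeProduct("peps", "S2_MSI_L1C", "2020-08-01T10:00:00", "POINT (1 44)"),
    FakeProduct("sobloo", "S2_MSI_L1C", "2020-08-01T10:00:00", "POINT (1 44)"),
    FakeProduct("sobloo", "S2_MSI_L1C", "2020-08-03T10:00:00", "POINT (1 44)"),
]

# dict.fromkeys keeps the first occurrence of each distinct key, in order.
unique = list(dict.fromkeys(products))
print([p.provider for p in unique])  # ['peps', 'sobloo']
```

With hashable products, deduplication no longer needs a hand-written key in each dict comprehension, and the first occurrence is kept without reversing the list.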
Note the reverse order on all_prods and almost_unique_prods in the dict comprehensions above. Its aim is to ensure that, if there are duplicate products, the one that ends up in unique_prods is the one obtained from the first provider (the first one that offered this product). If we implement 2., we should ensure this priority is preserved, e.g. search_all(geom=..., start=..., end=..., providers=["peps", "sobloo"]) should return products from peps in priority.
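To see why the reversed iteration keeps the first provider's product, here is a small self-contained sketch in which plain dictionaries stand in for products (the ids and the records are made up). It also shows a forward-order alternative based on `dict.setdefault`, which might be worth considering because it keeps the original ordering of the results as well.

```python
# Made-up records standing in for products, listed in provider priority order.
all_prods = [
    {"id": "A", "provider": "peps"},
    {"id": "B", "provider": "peps"},
    {"id": "A", "provider": "sobloo"},  # duplicate of the first record
    {"id": "C", "provider": "sobloo"},
]

# Reversed iteration: the last assignment wins in a dict comprehension, so the
# record seen last in reversed order (first in the original order) is kept.
by_id_reversed = {p["id"]: p for p in all_prods[::-1]}

# Forward alternative: setdefault only stores the first record for each id.
by_id_forward = {}
for p in all_prods:
    by_id_forward.setdefault(p["id"], p)

print(by_id_reversed["A"]["provider"])  # peps
print(by_id_forward["A"]["provider"])   # peps
print(list(by_id_reversed))  # ['C', 'A', 'B']
print(list(by_id_forward))   # ['A', 'B', 'C']
```

Both variants keep the peps record for the duplicated id, but the reversed comprehension returns the products in a different order than they were collected.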
If we decide to implement 1. or 2., where should we add this?
- search_all: it seems like an obvious candidate => Yes?
- search: since it returns a given page of a search, I believe it assumes the search is done on a single provider => No?
- search_iter_page: I see it being used by advanced users, who may want to implement this (search all products from all providers) in a different way than what we'll do => No?

Having differently formatted ids for the same products, depending on the providers must be fixed.
But this is also related to the fact that some providers are not (yet) configured to return Sentinel products in SAFE format. See #216 and #171
Original request made by @geonux :
We decided that the best way to approach this was to add a providers list parameter to the search method. A list of all the providers can be retrieved with dag.available_providers(). But a user could also provide a subset of the available providers:
Note that there is already a provider kwarg that the user can pass to search and that is used by _search_by_id (for performance reasons, if the user already knows if a given provider has this product available).
Note that the search command doesn't accept any --provider option.
For retrieving all the product types, we have to deal with the already used productType parameter. Options would be:
- productType as a list;
- whoosh to find matching product types using other criteria (level, platform, ...)
_Note here that dag.list_product_types(provider) could come in handy if productType accepts a list of product types._
TODOs before working on a MR:
- Check that dag.available_providers returns a correct list, and also whether the providers are ordered by priority (if we want to preserve the provider priority here)
- Check how whoosh currently influences the search with productType being defined or not
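Coming back to the providers list parameter proposed above, the following is only a sketch of how such a parameter might behave, not eodag's implementation. `FakeGateway` is a hypothetical stand-in for `EODataAccessGateway` so that the example runs without any provider credentials, and `search_all_providers` is a made-up helper name. Products are reduced to `(id, provider)` tuples, and duplicates are resolved by id only.

```python
class FakeGateway:
    """Hypothetical stand-in for EODataAccessGateway, for illustration only."""

    def __init__(self, catalog):
        # catalog maps a provider name to the list of product ids it offers
        self._catalog = catalog
        self._preferred = None

    def available_providers(self):
        return list(self._catalog)

    def set_preferred_provider(self, provider):
        self._preferred = provider

    def search_all(self, **criteria):
        # A real search would use the criteria; this fake ignores them.
        return [(pid, self._preferred) for pid in self._catalog[self._preferred]]


def search_all_providers(dag, providers=None, **criteria):
    """Search each provider in turn, keeping the first hit for each product id.

    `providers` is the hypothetical list parameter; None means all providers.
    """
    if providers is None:
        providers = dag.available_providers()
    found = {}
    for provider in providers:
        dag.set_preferred_provider(provider=provider)
        for product_id, source in dag.search_all(**criteria):
            # Providers earlier in the list take priority over later ones.
            found.setdefault(product_id, (product_id, source))
    return list(found.values())


dag = FakeGateway({"peps": ["A", "B"], "sobloo": ["A", "C"]})
print(search_all_providers(dag, providers=["peps", "sobloo"]))
# [('A', 'peps'), ('B', 'peps'), ('C', 'sobloo')]
print(search_all_providers(dag, providers=["sobloo"]))
# [('A', 'sobloo'), ('C', 'sobloo')]
```

The order of the providers list doubles as the priority order here, which is why the first TODO (whether dag.available_providers returns providers ordered by priority) matters when the list is omitted.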