This PR adds the API endpoints and most of the backend processing code for the downloads functionality.
REST API
HEV-E files must be prepared before they can be downloaded. The client makes a request for an order containing multiple items. The server acknowledges the order request and then prepares the ordered items by selecting the relevant data and applying any customization options. The client periodically polls the server to check the order's processing status. As each order item is prepared, its status becomes Completed and a download URL becomes available. The client can then make a request to this URL to retrieve the prepared order item.
The main endpoint is:
/gfdrr_det/api/v1/order/
The server replies to a GET request to this endpoint with a list of existing order resources. Each order is represented as a JSON object with a structure similar to:
Note the following interesting information regarding orders:
An order has a status. As it is being prepared, the order progresses through a series of statuses. When all items are processed, the order reaches its final state of either Completed or Failed.
An order has an id, which is the URL of the order's details. The client can poll this URL to check when the order, or each of its items, is ready.
An order is composed of order items
An order item also has a status
Each order item has the following properties:
layer name
format
bbox (optional)
taxonomic_categories (optional)
These properties are used to prepare the downloadable files and are the ones that a client must provide when requesting the creation of a new order.
An order item has a download_url property, which only contains a value once the item's status is Completed. The client can use this URL to obtain the ordered item.
An order item expires when the date given in its expires_on field is reached. When an item expires, its download_url field is returned as null.
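As a rough illustration of the polling flow described above, the following Python sketch inspects a decoded order and collects the download URLs of the items that are ready. The `order_items` key and the overall shape of the dictionary are assumptions made for illustration; only `status`, `download_url` and `expires_on` are named in this description.

```python
def ready_download_urls(order):
    """Return the download URLs of every item whose status is Completed.

    `order` is a dict decoded from the JSON returned by the order detail
    URL. The "order_items" key is an assumed name, not confirmed here.
    """
    urls = []
    for item in order.get("order_items", []):
        # Expired items report download_url as null (None once decoded),
        # so they are skipped even if their status is Completed.
        if item.get("status") == "Completed" and item.get("download_url"):
            urls.append(item["download_url"])
    return urls


def is_finished(order):
    """An order is final once its status is Completed or Failed."""
    return order.get("status") in ("Completed", "Failed")
```

A client would call `is_finished` after each poll of the order's id URL and stop polling once it returns True.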
Creating new orders
Make a POST request to the order list endpoint with a body like this:
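A minimal sketch of building such a request body in Python is shown here. The item properties follow the list given earlier, but the `order_items` wrapper key, the `layer` key name, the layer names, the format strings and the bbox encoding are all hypothetical placeholders rather than the API's confirmed schema.

```python
import json


def build_order_payload(items):
    """Serialize order items into a JSON request body.

    Each item needs a layer and a format; bbox and taxonomic_categories
    are optional and are left out of the body when not given.
    """
    order_items = []
    for item in items:
        entry = {"layer": item["layer"], "format": item["format"]}
        for optional in ("bbox", "taxonomic_categories"):
            if item.get(optional) is not None:
                entry[optional] = item[optional]
        order_items.append(entry)
    # "order_items" is an assumed wrapper key.
    return json.dumps({"order_items": order_items})


# Two items: one with only the required properties, one with a bbox.
payload = build_order_payload([
    {"layer": "example_layer_a", "format": "shapefile"},
    {"layer": "example_layer_b", "format": "geopackage",
     "bbox": [-10.0, 35.0, 5.0, 45.0]},
])
```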
The previous example request will create an order with two items. The server's reply will feature an order object as described above.
A rough overview of the implementation
We are using django-oseoserver as the ordering backend. It is an (incomplete) implementation of the OGC Ordering Services Framework for Earth Observation Products Interface Standard (OSEO).
django-oseoserver is a reusable Django app that receives OSEO order requests, records them in the database and dispatches them for processing in an async queue. It deals with the generic aspects of order handling and dispatching, and provides hooks for connecting to the code that does the actual processing of orders, which we define in the HEV-E code. Several settings control this and other aspects of django-oseoserver's configuration; they can be found in gfdrr_det.settings.base. The entry point for the custom order processing code is the gfdrr_det.orderprocessors.HeveOrderProcessor class.
The processing of each order item is done by Celery tasks that are called asynchronously.
As for the actual files that are downloadable:
Shapefile layers are generated by making a WFS request to GeoServer, similar to what GeoNode currently does. We do not reuse GeoNode's code directly so that we can filter by bounding box and taxonomic categories. The implementation is in gfdrr_det.exposures.download.generate_shapefile
GeoPackage files are generated by using ogr2ogr with suitable parameters. The implementation is in gfdrr_det.exposures.download.generate_geopackage
To avoid generating the same files multiple times, we compute a hash for each file and use it as the filename. The hash is computed from the requested order item options and is therefore independent of the actual file contents. This means that upon receiving a new order request we can compute the request's hashes and check whether the files already exist on disk. If so, no processing needs to be done.
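The option-based hashing described in the last point can be sketched as follows. This is only an illustration under stated assumptions: the use of SHA-256, the canonical JSON encoding, and the names `order_item_hash` and `find_cached_file` are hypothetical and are not taken from the code in this PR.

```python
import hashlib
import json
from pathlib import Path


def order_item_hash(layer, file_format, bbox=None, taxonomic_categories=None):
    """Derive a stable hash from the order item options (not file contents)."""
    options = {
        "layer": layer,
        "format": file_format,
        "bbox": list(bbox) if bbox is not None else None,
        # Sort categories so that their order in the request is irrelevant.
        "taxonomic_categories": sorted(taxonomic_categories or []),
    }
    # sort_keys gives a canonical encoding, so equal options hash equally.
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_cached_file(cache_dir, item_hash, extension):
    """Return the path of an already generated file, or None if absent."""
    candidate = Path(cache_dir) / f"{item_hash}.{extension}"
    return candidate if candidate.is_file() else None
```

Because the hash depends only on the options, two orders asking for the same layer, format, bounding box and categories map to the same filename, and the second one can skip processing.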
To translate between our API's order request and the OSEO standard's schema, there is an XML template file at gfdrr_det/templates/gfdrr_det/download_request.xml. This template is rendered with the correct parameters and then fed to django-oseoserver in one of its initial layers. This means that we benefit from the existing code's validation and settings.
The code in this PR is not a complete implementation. There are still some missing features, most notably:
Order item expiry and cleaning of the file system. django-oseoserver has a hook for this that allows running custom code when an order item needs to be expired. To cope with our file caching technique, we'll need to decouple item expiration (merely revoking the item's download_url) from the process of actually deleting files from the filesystem;
The taxonomic_categories option is not implemented yet for either the shapefile or the geopackage format. For geopackage this means creating more complex queries than the current ones, because the frontend will send the parsed taxonomy as the option while the original data only has the raw taxonomy codes. For the shapefile option it should be simpler, as this format operates on our ingested data;
The notification_email field is not implemented yet on the order requests. django-oseoserver's default implementation expects orders to be made by authenticated users, and notification emails are sent to these user accounts. On HEV-E we do not have user accounts, so we'll need another way to store e-mail addresses and send notifications. The OSEO standard's take on notifications is a lot more involved than what we currently need, so we'll probably (ab)use the order's orderRemark field and save the notification e-mail there. We still need to extend django-oseoserver so that it can use that field to get e-mail recipients.
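To illustrate why expiry and file deletion need to be decoupled (the first point above), here is one possible approach, sketched in Python. It is not part of this PR and not a django-oseoserver API: the function name and the `(file_hash, expires_on)` pair representation are assumptions. The idea is that a cached file may be shared by several order items, so it can only be deleted once every item that references it has expired.

```python
from datetime import datetime, timezone


def files_safe_to_delete(items, now):
    """Return hashes of cached files that no unexpired item still uses.

    `items` is an iterable of (file_hash, expires_on) pairs, one per
    order item. This representation is hypothetical.
    """
    items = list(items)  # allow generators, since we iterate twice
    still_needed = {h for h, expires_on in items if expires_on > now}
    expired = {h for h, expires_on in items if expires_on <= now}
    return expired - still_needed


now = datetime(2018, 1, 10, tzinfo=timezone.utc)
items = [
    ("hash_a", datetime(2018, 1, 5, tzinfo=timezone.utc)),   # expired
    ("hash_a", datetime(2018, 1, 20, tzinfo=timezone.utc)),  # valid, same file
    ("hash_b", datetime(2018, 1, 1, tzinfo=timezone.utc)),   # expired, unshared
]
deletable = files_safe_to_delete(items, now)
```

Here only the file for `hash_b` is deletable; the file for `hash_a` must stay because a second, unexpired item still points at it.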