This extension contains harvester plugins for harvesting from sources used by NextGEOSS as well as a metaharvester plugin for adding additional tags and metadata retrieved from an iTag instance.
The repository contains the following plugins:
nextgeossharvest, the base CKAN plugin
esa, a harvester plugin for harvesting Sentinel products from SciHub, NOA, and CODE-DE via their DHuS interfaces
cmems, a harvester plugin for harvesting the following types of CMEMS products:
gome2, a harvester plugin for harvesting the following types of GOME-2 coverage products:
itag, a harvester plugin for adding additional tags and metadata to datasets that have already been harvested (more on this later)
To install, run python setup.py develop in the ckanext-nextgeossharvest directory, then run pip install -r requirements.txt in the same directory.
This extension requires ckanext-harvest and ckanext-spatial.
You must configure ckanext-spatial to use solr-spatial-field for the spatial search backend. Instructions can be found here: http://docs.ckan.org/projects/ckanext-spatial/en/latest/spatial-search.html. You cannot use solr as the spatial search backend because solr only supports footprints that are effectively bounding boxes (polygons composed of five points), while the footprints of the datasets harvested by these plugins can be considerably more complex. Using postgis as the spatial search backend is strongly discouraged, as it will choke on the large numbers of datasets that these harvesters will pull down.
Activate ckanext-harvest and ckanext-spatial in your .ini file, as well as nextgeossharvest
and any of the NextGEOSS harvester plugins that you want to use.
To harvest Sentinel products, add ckanext.nextgeossharvest.nextgeoss_username= and ckanext.nextgeossharvest.nextgeoss_password= to your .ini file. The credentials are stored here rather than in the source config, partly for security reasons and partly because of the way the extension is deployed. (It may make sense to move them to the source config in the future.)
If you want to log the response times and status codes of requests to harvest sources and/or your iTag service, include ckanext.nextgeossharvest.provider_log_dir=/path/to/your/logs in your .ini file. The log entries will look like this: INFO | esa_scihub | 2018-03-08 14:17:04.474262 | 200 | 2.885231s
(the second field will always be 12 characters and will be padded if necessary).
You will also need a cron job like the one below to run the harvest jobs; each job will be marked Finished when complete:
0 * * * * paster --plugin=ckanext-harvest harvester run -c /srv/app/production.ini >> /var/log/cron.log 2>&1
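For reference, the .ini settings described above might end up looking roughly like the following (the plugin list is abbreviated and illustrative; the credentials and path are placeholders):

ckan.plugins = ... harvest nextgeossharvest esa
ckanext.nextgeossharvest.nextgeoss_username = your_username
ckanext.nextgeossharvest.nextgeoss_password = your_password
ckanext.nextgeossharvest.provider_log_dir = /path/to/your/logs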
To harvest Sentinel products, activate the esa plugin, which you will use to create a harvester that harvests from SciHub, NOA or CODE-DE. To harvest from more than one of those sources, just create more than one harvester and point each one at a different source.
Note: The configuration object is required for all of these harvesters.
Create a new harvest source and select ESA Sentinel Harvester New
. The URL does not matter—the harvester only harvests from SciHub, NOA, or CODE-DE, depending on the configuration below.
To harvest from SciHub, source
must be set to "esa_scihub"
in the configuration. See Sentinel settings (SciHub, NOA & CODE-DE) for a complete description of the settings.
Note: you must place your username and password in the .ini
file as described above.
After saving the configuration, you can click Reharvest and the job will begin (assuming you have a cron job like the one described above). Alternatively, you can use the paster command run_test described in the ckanext-harvest documentation to run the harvester without setting up the gather consumer, etc.
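For example, assuming your SciHub harvest source is named scihub-sentinel, the test run would look something like:

paster --plugin=ckanext-harvest harvester run_test scihub-sentinel -c /srv/app/production.ini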
Create a new harvest source and select ESA Sentinel Harvester New
. The URL does not matter—the harvester only harvests from SciHub, NOA, or CODE-DE, depending on the configuration below.
To harvest from NOA, source
must be set to "esa_noa"
in the configuration. See Sentinel settings (SciHub, NOA & CODE-DE) for a complete description of the settings.
Note: you must place your username and password in the .ini
file as described above.
After saving the configuration, you can click Reharvest and the job will begin (assuming you have a cron job like the one described above). Alternatively, you can use the paster command run_test described in the ckanext-harvest documentation to run the harvester without setting up the gather consumer, etc.
Create a new harvest source and select ESA Sentinel Harvester New
. The URL does not matter—the harvester only harvests from SciHub, NOA, or CODE-DE, depending on the configuration below.
To harvest from CODE-DE, source
must be set to "esa_code"
in the configuration. See Sentinel settings (SciHub, NOA & CODE-DE) for a complete description of the settings.
Note: you must place your username and password in the .ini
file as described above.
After saving the configuration, you can click Reharvest and the job will begin (assuming you have a cronjob like the one described above). Alternatively, you can use the paster command run_test
described in the ckanext-harvest
documentation to run the harvester without setting up the gather consumer, etc.
source: (required, string) determines whether the harvester harvests from SciHub, NOA, or CODE-DE. To harvest from SciHub, use "source": "esa_scihub". To harvest from NOA, use "source": "esa_noa". To harvest from CODE-DE, use "source": "esa_code".
update_all: (optional, boolean, default is false) determines whether or not the harvester updates datasets that already have metadata from this source. For example: if we have "update_all": true, and dataset Foo has already been created or updated by harvesting from SciHub, then it will be updated again when the harvester runs. If we have "update_all": false and Foo has already been created or updated by harvesting from SciHub, then the dataset will not be updated when the harvester runs. And regardless of whether update_all is true or false, if a dataset has not been created or updated with metadata from SciHub (it's new, or it was created via NOA or CODE-DE and has no SciHub metadata), then it will be updated with the additional SciHub metadata.
start_date: (optional, datetime string; the default is "any", i.e., "from the earliest date onwards", if the harvester is new, or the ingestion date of the most recently harvested product if it has been run before) determines the start of the date range for harvester queries. Example: "start_date": "2018-01-16T10:30:00.000Z". Note that the entire datetime string is required; 2018-01-01 is not valid. Using full datetimes is especially useful when testing, as it is possible to restrict the number of possible results by searching only within a small time span, like 20 minutes.
end_date: (optional, datetime string, default is "now", i.e., "to the latest possible date") determines the end of the date range for harvester queries. Example: "end_date": "2018-01-16T11:00:00.000Z". Note that the entire datetime string is required; 2018-01-01 is not valid. Using full datetimes is especially useful when testing, as it is possible to restrict the number of possible results by searching only within a small time span, like 20 minutes.
product_type: (optional, string) determines the Sentinel collection (product type) to be considered by the harvester when querying the data provider interface. The possible values are SLC, GRD, OCN, S2MSI1C, S2MSI2A, S2MSI2Ap, OL_1_EFR___, OL_1_ERR___, OL_2_LFR___, OL_2_LRR___, SR_1_SRA___, SR_1_SRA_A_, SR_1_SRA_BS, SR_2_LAN___, SL_1_RBT___, SL_2_LST___, SY_2_SYN___, SY_2_V10___, SY_2_VG1___ or SY_2_VGP___. If no product_type is provided, the harvester behaves as normal and considers all product types.
aoi: (optional, string with POLYGON) determines the Area of Interest to be considered by the harvester when querying the data provider interface. The aoi must be provided in the following format: POLYGON((-180 -90,-180 90,180 90,180 -90,-180 -90)). More points can be added to the polygon. If no aoi is provided, the harvester uses a global area of interest.
datasets_per_job: (optional, integer, defaults to 1000) determines the maximum number of products that will be harvested during each job. If a query returns 2,501 results, only the first 1000 will be harvested if you're using the default. This is useful for running the harvester via recurring jobs intended to harvest products incrementally (i.e., you want to start from the beginning and harvest all available products). The harvester will harvest products in groups of 1000, rather than attempting to harvest all x-hundred-thousand at once. You'll get feedback after each job, so you'll know if there are errors without waiting for the whole job to run. And the harvester will automatically resume from the most recently harvested dataset if you're running it via a recurring cron job.
timeout: (optional, integer, defaults to 4) determines the number of seconds to wait before timing out a request.
skip_raw: (optional, boolean, defaults to false) determines whether RAW products are skipped or included in the harvest.
make_private: (optional, boolean, defaults to false) if true, the datasets created by the harvester will be marked private. This setting is not retroactive. It only applies to datasets created by the harvester while the setting is true.
Example configuration with all variables present:
{
"source": "esa_scihub",
"update_all": false,
"start_date": "2018-01-16T10:30:00.000Z",
"end_date": "2018-01-16T11:00:00.000Z",
"datasets_per_job": 100,
"timeout": 4,
"skip_raw": true,
"make_private": false
}
A second example configuration, with an aoi and a product_type:
{
"source": "esa_scihub",
"update_all": false,
"start_date": "2019-01-01T00:00:00.000Z",
"aoi": "POLYGON((2.0524444097380456 51.60572085265915,5.184653052425238 51.67771256185287,7.138937077349725 50.43826001622307,5.612989277066222 49.25292867929642,1.9721313676178616 50.83443942461676,2.0524444097380456 51.60572085265915,2.0524444097380456 51.60572085265915))",
"product_type": "S2MSI2A",
"datasets_per_job": 100,
"timeout": 20,
"skip_raw": true,
"make_private": false
}
Note: you must place your username and password in the .ini
file as described above.
To harvest from more than one Sentinel source, just create a harvester source for each Sentinel source. For example, to harvest from all three sources:
Create a harvest source with ESA Sentinel Harvester New and make sure that the configuration contains "source": "esa_scihub".
Create a second harvest source with ESA Sentinel Harvester New and make sure that the configuration contains "source": "esa_noa".
Create a third harvest source with ESA Sentinel Harvester New and make sure that the configuration contains "source": "esa_code".
You'll probably want to specify start and end times as well as the number of datasets per job for each harvester. If you don't, don't worry—the default number of datasets per job is 1000, so you won't be flooded with datasets.
Then just run each of the harvesters. You can run them all at the same time. If a product has already been harvested by another harvester, then the other harvesters will only update the existing dataset and add additional resources and metadata. They will not overwrite the resources and metadata that already exist (e.g., the SciHub harvester won't replace resources from CODE-DE with resources from SciHub, it will just add SciHub resources to the dataset alongside the existing CODE-DE resources).
The three (really, two) Sentinel harvesters all inherit from the same base harvester classes. As mentioned above, the SciHub, NOA and CODE-DE "harvesters" are all the same harvester with different configurations. The "source"
configuration is a switch that 1) causes the harvester to use a different base URL for querying the OpenSearch service and 2) changes the labels added to the resources. In all cases, the same methods are used for creating/updating the datasets.
The workflow for all the harvesters is:
The created/updated counts for each harvester job will be accurate. The count that appears in the sidebar on each harvester's page, however, will not be accurate. Besides issues with how Solr updates the harvest_source_id
associated with each dataset, the fact that up to three harvesters may be creating or updating a single dataset means that only one harvest source can "own" a dataset at any given time. If you need to evaluate the performance of a harvester, use the job reports.
To harvest CMEMS products, activate the cmems
plugin, which you will use to create a harvester that harvests one of the following types of CMEMS product:
To harvest more than one of those types of product, just create more than one harvester and configure a different harvester_type
.
The URL you enter in the harvester GUI does not matter--the plugin determines the correct URL based on the harvester_type
.
The different products are hosted on different services, so separate harvesters are necessary for ensuring that the harvesting of one is not affected by errors or outages on the others.
harvester_type determines which type of product will be harvested. It must be one of the following seven strings: sst, sic_north, sic_south, ocn, gpaf, slv or mog.
start_date determines the start date for the harvester job. It must be the string YESTERDAY or a string describing a date in the format YYYY-MM-DD, like 2017-01-01.
end_date determines the end date for the harvester job. It must be the string TODAY or a string describing a date in the format YYYY-MM-DD, like 2017-01-01. The end_date is not mandatory; if it is not included, the harvester will keep running until it catches up to the current day.
The harvester will harvest all the products available on the start date and on every date up to but not including the end date. If the start and end dates are YESTERDAY
and TODAY
, respectively, then the harvester will harvest all the products available yesterday but not any of the products available today. If the start and end dates are 2018-01-01
and 2018-02-01
, respectively, then the harvester will harvest all the products available in the month of January (and none from the month of February).
timeout determines how long the harvester will wait for a response from a server before cancelling the attempt. It must be a positive integer. It is not mandatory.
username and password are your username and password for accessing the CMEMS products at the source for the harvester type you selected above.
make_private is optional and defaults to false. If true, the datasets created by the harvester will be marked private. This setting is not retroactive. It only applies to datasets created by the harvester while the setting is true.
Example configurations:
{
"harvester_type":"slv",
"start_date":"2017-01-01",
"username":"your_username",
"password":"your_password",
"make_private":false
}
{
"harvester_type": "sic_south",
"start_date": "2017-01-01",
"end_date": "TODAY",
"timeout": 10,
"username": "your_username",
"password": "your_password",
"make_private": false
}
You can run the harvester on a Daily update frequency with YESTERDAY
and TODAY
as the start and end dates. Since requests may time out, you can also run the harvester more than once a day using the Manual update frequency and a cron job. There's no way to recover from outages at the moment; the CMEMS harvester could be more robust.
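For example, a cron entry like the one below (assuming a harvest source named cmems-sst) would queue a new CMEMS job every six hours, to be picked up by the harvester run cron job described above:

0 */6 * * * paster --plugin=ckanext-harvest harvester job cmems-sst -c /srv/app/production.ini >> /var/log/cron.log 2>&1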
The GOME-2 harvester harvests products from the following GOME-2 coverages:
Unlike other harvesters, the GOME-2 harvester only makes requests to verify that a product exists. It programmatically creates datasets and resources for products that do exist within the specified date range.
The GOME-2 harvester has two required and one optional setting.
start_date (required) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DD or the string "YESTERDAY". If you want to harvest from the earliest product onwards, use 2007-01-04. If you will be harvesting on a daily basis, use "YESTERDAY".
end_date (required) determines the date on which the harvesting ends. It must be in the format YYYY-MM-DD or the string "TODAY". It is exclusive, i.e., if the end date is 2017-03-02, then products will be harvested up to and including 2017-03-01 and no products from 2017-03-02 will be included. For daily harvesting use "TODAY".
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date": "2017-03-01",
"end_date": "2017-03-02",
"make_private": false
}
or
{
"start_date": "YESTERDAY",
"end_date": "TODAY",
"make_private": false
}
Add gome2 to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select GOME2 from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The PROBA-V harvester harvests products from the following collections:
The products from the on-time collections are created and published on the same day. The products from the delayed collections are published with a one-month delay after being created.
The collections were also split according to resolution to avoid a huge number of datasets being harvested. L1C, L2A and S1 products are published daily. S5 products are published every 5 days. S10 products are published every 10 days. S1, S5 and S10 products are tiles covering almost the entire world. Each dataset corresponds to a single tile.
The PROBA-V harvester has the following configuration options:
start_date (required) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DD. If you want to harvest from the earliest product onwards, use 2018-01-01.
end_date (optional) determines the end date for the harvester job. It must be a string describing a date in the format YYYY-MM-DD, like 2018-01-31. The end_date is not mandatory; if it is not included, the harvester will keep running until it catches up to the current day. To limit the number of datasets per job, each job will harvest a maximum of 2 days of data.
username and password are your username and password for accessing the PROBA-V products at the source.
collection (required) defines the collection that will be collected. It can be PROBAV_P_V001, PROBAV_S1-TOA_1KM_V001, PROBAV_S1-TOC_1KM_V001, PROBAV_S10-TOC_1KM_V001, PROBAV_S10-TOC-NDVI_1KM_V001, PROBAV_S1-TOA_100M_V001, PROBAV_S1-TOC-NDVI_100M_V001, PROBAV_S5-TOC-NDVI_100M_V001, PROBAV_S5-TOA_100M_V001, PROBAV_S5-TOC_100M_V001, PROBAV_S1-TOC_100M_V001, PROBAV_S1-TOA_333M_V001, PROBAV_S1-TOC_333M_V001, PROBAV_S10-TOC_333M_V001, PROBAV_S10-TOC-NDVI_333M_V001, PROBAV_L2A_1KM_V001, PROBAV_L2A_100M_V001 or PROBAV_L2A_333M_V001.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date":"2018-08-01",
"collection":"PROBAV_S1-TOC_1KM_V001",
"username":"nextgeoss",
"password":"nextgeoss",
"make_private":false
}
{
"start_date":"2018-08-01",
"collection":"PROBAV_L2A_1KM_V001",
"username":"nextgeoss",
"password":"nextgeoss",
"make_private":false
}
{
"start_date":"2018-08-01",
"collection":"PROBAV_P_V001",
"username":"nextgeoss",
"password":"nextgeoss",
"make_private":false
}
The start_date for the delayed collections can be any date earlier than the current day minus 1 month. For the on-time collections the start_date can be any date.
Add probav to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select Proba-V Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The GLASS LAI harvester harvests products from the following collections:
The GLASS LAI harvester has the following configuration options:
sensor defines whether the harvester will collect products based on AVHRR (avhrr) or MODIS (modis).
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"sensor":"avhrr",
"make_private":false
}
{
"sensor":"modis",
"make_private":false
}
Add glass_lai to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select GLASS LAI Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options. The harvester only needs to run twice (with two different configurations).
The static EBVs harvester harvests products from the following collections:
The Static EBVs harvester has the following configuration options:
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"make_private":false
}
Add ebvs to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select EBVs from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options. The harvester only needs to run once because the datasets are static.
The Plan4All harvester harvests products from the following collections:
The Plan4All harvester has the following configuration options:
datasets_per_job (optional, integer, defaults to 100) determines the maximum number of products that will be harvested during each job. If a query returns 2,501 results, only the first 100 will be harvested if you're using the default. This is useful for running the harvester via recurring jobs intended to harvest products incrementally (i.e., you want to start from the beginning and harvest all available products). The harvester will harvest products in groups of 100, rather than attempting to harvest all x-hundred-thousand at once. You'll get feedback after each job, so you'll know if there are errors without waiting for the whole job to run. And the harvester will automatically resume from the most recently harvested dataset if you're running it via a recurring cron job.
timeout (optional, integer, defaults to 60) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"datasets_per_job": 10,
"timeout": 60,
"make_private": false
}
Add plan4all to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select Plan4All Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The MODIS harvester harvests products from the following collections, which can be divided by time resolution:
All collections, with the exception of MOD17A3H, are updated on a weekly or biweekly basis. MOD17A3H is the only static collection; its last dataset refers to 2015-01-03.
Because granule queries now require collection identifiers, each collection has to be harvested by a separate harvester.
The MODIS harvester has the following configuration options:
collection (required) defines the collection that will be collected. It can be MYD13Q1, MYD13A1, MYD13A2, MOD13Q1, MOD13A1, MOD13A2, MOD17A3H, MOD17A2H, MYD15A2H, MOD15A2H, MOD14A2 or MYD14A2.
start_date (required) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DDTHH:MM:SSZ. If you want to harvest from the earliest product onwards, use the starting dates presented in "Harvesting MODIS products".
timeout (optional, integer, defaults to 10) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"collection": "MYD13Q1",
"start_date": "2002-07-04T00:00:00Z",
"make_private": false
}
Add modis to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select MODIS Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The GDACS harvester harvests products from the following collections:
The GDACS harvester has the following configuration options:
data_type determines which collection will be harvested. It must be one of the following two strings: signal or magnitude.
request_check determines if the URL of each harvested dataset will be tested. It must be one of the following two strings: yes or no.
start_date determines the start date for the harvester job. It must be the string YESTERDAY or a string describing a date in the format YYYY-MM-DD, like 1997-12-01.
end_date determines the end date for the harvester job. It must be the string TODAY or a string describing a date in the format YYYY-MM-DD, like 1997-12-01. The end_date is not mandatory; if it is not included, the harvester will keep running until it catches up to the current day.
timeout determines how long the harvester will wait for a response from a server before cancelling the attempt. It must be a positive integer. It is not mandatory.
make_private is optional and defaults to false. If true, the datasets created by the harvester will be marked private. This setting is not retroactive. It only applies to datasets created by the harvester while the setting is true.
{
"data_type":"signal",
"request_check":"yes",
"start_date":"1997-12-01",
"make_private":false
}
or
{
"data_type":"magnitude",
"request_check":"yes",
"start_date":"1997-12-01",
"make_private":false
}
Add gdacs to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select GDACS from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The DEIMOS-2 harvester harvests products from the following collections:
The number of products is static, and thus the harvester only needs to be run once.
The DEIMOS-2 harvester has the following configuration options:
harvester_type determines the FTP domain, as well as the directories in said domain.
username and password are your username and password for accessing the DEIMOS-2 products at the source for the harvester type you selected above.
timeout (optional, integer, defaults to 60) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"harvester_type":"deimos_imaging",
"username":"your_username",
"password":"your_password",
"make_private":false
}
Add deimosimg to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select DEIMOS Imaging from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The EBAS-NILU harvester collects products from the following collections:
The EBAS-NILU harvester has the following configuration options:
start_date: (optional, datetime string; defaults to the earliest date if the harvester is new, or to the ingestion date of the most recently harvested product if it has been run before) determines the start of the date range for harvester queries. Example: "start_date": "2018-01-16T10:30:00Z". Note that the entire datetime string is required; 2018-01-01 is not valid.
end_date: (optional, datetime string, default is "NOW") determines the end of the date range for harvester queries. Example: "end_date": "2018-01-16T11:00:00Z". Note that the entire datetime string is required; 2018-01-01 is not valid.
timeout: (optional, integer, defaults to 10) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date": "2017-01-01T00:00:00Z",
"timeout": 4,
"make_private": false
}
Add ebas to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select EBAS Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The SIMOcean harvester harvests products from the following collections:
New products of these collections are created and published daily.
The SIMOcean harvester has the following configuration options:
start_date: (required, datetime string; if the harvester has been run before, it resumes from the ingestion date of the most recently harvested product) determines the start of the date range for harvester queries. Example: "start_date": "2018-01-16T10:30:00Z". Note that the entire datetime string is required; 2018-01-01 is not valid.
end_date: (optional, datetime string, default is "NOW") determines the end of the date range for harvester queries. Example: "end_date": "2018-01-16T11:00:00Z". Note that the entire datetime string is required; 2018-01-01 is not valid.
datasets_per_job: (optional, integer, defaults to 100) determines the maximum number of products that will be harvested during each job.
timeout: (optional, integer, defaults to 10) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date": "2017-01-01T00:00:00Z",
"timeout": 4,
"datasets_per_job": 100,
"make_private": false
}
Add simocean to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select SIMOcean Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The EPOS-Sat harvester harvests products from the following collections:
The number of products is low because currently only sample data are available. A large quantity of data is expected to start being injected in September 2019.
The EPOS-Sat harvester has the following configuration options:
collection (required) defines the collection that will be collected. It can be inu, inw, dts, coh, aps or cosneu.
start_date (required) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DDTHH:MM:SSZ. If you want to harvest from the earliest product onwards, use 2010-01-01T00:00:00Z.
end_date (optional) determines the date on which the harvesting ends. It must be in the format YYYY-MM-DDTHH:MM:SSZ and defaults to TODAY.
datasets_per_job (optional, integer, defaults to 100) determines the maximum number of products that will be harvested during each job.
timeout (optional, integer, defaults to 4) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"collection": "inw",
"start_date": "2010-01-16T10:30:00Z",
"timeout": 4,
"make_private": false
}
Add epos to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select EPOS Sat Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The Food Security harvester harvests the VITO pilot outputs for the following collections:
1. NextGEOSS Sentinel-2 FAPAR
2. NextGEOSS Sentinel-2 FCOVER
3. NextGEOSS Sentinel-2 LAI
4. NextGEOSS Sentinel-2 NDVI
The date of the pilot outputs can differ from the current date, since the pilot processes old Sentinel data.
The Food Security harvester has the following configuration options:
start_date (required) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DD. If you want to harvest from the earliest product onwards, use 2017-01-01.
end_date (optional) determines the end date for the harvester job. It must be a string describing a date in the format YYYY-MM-DD, like 2018-01-31. The end_date is not mandatory; if it is not included, the harvester will keep running until it catches up to the current day. To limit the number of datasets per job, each job will harvest a maximum of 2 days of data.
username and password are your username and password for accessing the PROBA-V products at the source.
collection (required) defines the collection that will be collected. It can be FAPAR, FCOVER, LAI or NDVI.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date":"2017-01-01",
"collection":"FAPAR",
"username":"nextgeoss",
"password":"nextgeoss",
}
Add foodsecurity to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select Food Security Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The VITO CGS S1 harvester collects the products of an external VITO project for the following collections:
1. VITO CGS S1
2. CGS S1 GRD L1
3. CGS S1 GRD SIGMA0 L1 (NOT AVAILABLE YET)
The VITO CGS S1 harvester has the following configuration options:
start_date (required) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DD. If you want to harvest from the earliest product onwards, use 2017-01-01.
timeout (optional, integer, defaults to 4) determines the number of seconds to wait before timing out a request.
username and password are your username and password for accessing the CGS S1 products at the source.
collection (required) defines the collection that will be collected. It can be SLC_L1 or GRD_L1.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date":"2018-01-01",
"collection":"SLC_L1",
"username":"username",
"password":"password",
"timeout":1,
"make_private":false
}
Add cgss1 to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select VITO CGS S1 Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The Cold Regions harvester harvests the NERSC pilot outputs for the following collections:
1. Sentinel-1 HH/HV based ice/water classification
2. Sea ice and water classification in the Arctic for INTAROS 2018 field experiment
3. Sea ice and water classification in the Arctic for CAATEX/INTAROS 2019 field experiment
4. Average sea ice drift in the Arctic for INTAROS 2018 field experiment
5. Average sea ice drift in the Arctic for CAATEX 2019 field experiment
The Cold Regions harvester will run once per collection, and it will collect all the Cold Regions datasets within the input collection (static data). On the command line, run:
$ python ./ckanext/nextgeossharvest/harvesters/coldregions.py <destination_ckan_URL> <destination_ckan_apikey> "nersc" <collection_id>
The following collection IDs are available:
The Landsat-8 harvester collects the Level-1 data products generated from Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS). The following collection 1 Tiers are harvested:
1. Landsat-8 Real-Time (RT)
2. Landsat-8 Tier 1 (T1)
3. Landsat-8 Tier 2 (T2)
The pre-processed products are not harvested because they are deleted after an interval of 6 months in favor of calibrated products.
The Landsat-8 harvester has the following configuration options:
path (optional) determines the WRS path where the product collection will start.
row (optional) determines the WRS row where the product collection will start.
access_key and secret_key (required) are your AWS account access key and secret key.
bucket (required) defines the AWS S3 bucket to harvest; for Landsat-8 use landsat-pds.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"path":1,
"row":1,
"access_key":"your_access_key",
"secret_key": "your_secret_key",
"bucket": "landsat-pds",
"make_private": false
}
Add landsat8 to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select Landsat-8 Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The MELOA harvester harvests products from the following collections:
New products of these collections are created and published after the campaigns.
The MELOA harvester has the following configuration options:
start_date: (required, datetime string; if the harvester has been run before, it resumes from the ingestion date of the most recently harvested product) determines the start of the date range for harvester queries. Example: "start_date": "2019-10-01T00:00:00Z". Note that the entire datetime string is required; 2019-10-01 is not valid.
end_date: (optional, datetime string, default is "NOW") determines the end of the date range for harvester queries. Example: "end_date": "2020-01-01T00:00:00Z". Note that the entire datetime string is required; 2020-01-01 is not valid.
datasets_per_job: (optional, integer, defaults to 100) determines the maximum number of products that will be harvested during each job.
timeout: (optional, integer, defaults to 10) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date": "2019-10-01T00:00:00Z",
"timeout": 4,
"datasets_per_job": 100,
"make_private": false
}
Add meloa to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select MELOA Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The SAEON harvester collects the products for the following collections:
1. Climate Systems Analysis Group (South Africa)
The SAEON harvester has the following configuration options:
datasets_per_job (optional, integer, defaults to 100) determines the maximum number of products that will be harvested during each job. If a query returns 2,501 results, only the first 100 will be harvested if you're using the default. This is useful for running the harvester via recurring jobs intended to harvest products incrementally (i.e., you want to start from the beginning and harvest all available products). The harvester will harvest products in groups of 100, rather than attempting to harvest all x-hundred-thousand at once. You'll get feedback after each job, so you'll know if there are errors without waiting for the whole job to run. And the harvester will automatically resume from the most recently harvested dataset if you're running it via a recurring cron job.
timeout (optional, integer, defaults to 60) determines the number of seconds to wait before timing out a request.
update_all (optional, boolean, default is false) determines whether or not the harvester updates datasets that already have metadata from this source. For example: if we have "update_all": true, and dataset Foo has already been created or updated by harvesting, then it will be updated again when the harvester runs. If we have "update_all": false and Foo has already been created or updated by harvesting, then the dataset will not be updated when the harvester runs. And regardless of whether update_all is true or false, if a dataset has not been collected, then it will be created in the catalogue.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
source_url determines the base URL for the data source to query.
{
"datasets_per_job": 100,
"timeout": 60,
"make_private": false,
"source_url": "https://staging.saeon.ac.za"
}
Add saeon to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select SAEON Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The NOA Groundsegment harvester collects the products for the following instruments:
1. VIIRS (Visible Infrared Imaging Radiometer Suite)
2. MODIS (Moderate Resolution Imaging Spectroradiometer)
3. AIRS (Atmospheric InfraRed Sounder)
4. MERSI (Medium Resolution Spectral Imager)
5. AVHRR/3 (Advanced Very-High-Resolution Radiometer)
The NOA Groundsegment harvester configuration contains the following options:
start_date (optional) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DDTHH:mm:ssZ.
end_date (optional) determines the date on which the harvesting ends. It must be in the format YYYY-MM-DDTHH:mm:ssZ.
username (required) Enter your NOA groundsegment username.
password (required) Enter your NOA groundsegment password.
page_timeout (optional, integer, defaults to 2) determines the maximum number of pages that will be harvested during each job. If a query returns 25 pages, only the first 2 will be harvested if you're using the default. Each page corresponds to 100 products. This is useful for running the harvester via recurring jobs intended to harvest products incrementally (i.e., you want to start from the beginning and harvest all available products). The harvester will harvest products a few pages at a time, rather than attempting to harvest all x-hundred-thousand at once. You'll get feedback after each job, so you'll know if there are errors without waiting for the whole job to run. And the harvester will automatically resume from the most recently harvested dataset if you're running it via a recurring cron job.
update_all (optional, boolean, default is false) determines whether or not the harvester updates datasets that already have metadata from this source. For example: if we have "update_all": true, and dataset Foo has already been created or updated by harvesting, then it will be updated again when the harvester runs. If we have "update_all": false and Foo has already been created or updated by harvesting, then the dataset will not be updated when the harvester runs. And regardless of whether update_all is true or false, if a dataset has not been collected, then it will be created in the catalogue.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date":"2019-01-01T00:00:00Z",
"end_date":"2020-08-01T23:59:00Z",
"username":"your_username",
"password":"your_password",
"page_timeout": "2"
}
Add noa_groundsegment to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select NOA Groundsegment Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The FSSCAT harvester collects the products for the following file types:
1. FS1_GRF_L1B_CAL
2. FS1_GRF_L1B_SCI
3. FS1_GRF_L1C_SCI
4. FS1_GRF_L2__SIE
5. FS1_GRF_L3__ICM
6. FS1_MWR_L1B_SCI
7. FS1_MWR_L1C_SCI
8. FS1_MWR_L2A_TB_
9. FS1_MWR_L2B_SIT
10. FS1_MWR_L2B_SM_
11. FS1_MWR_L3__TB_
12. FS1_MWR_L3__SIT
13. FS1_MWR_L3__SM_
14. FS1_MWR_L4__SM_
15. FS2_HPS_L1C_SCI
16. FS2_HPS_L2__RDI
17. FSS_SYN_L4__SM_
The FSSCAT harvester has the following configuration options:
file_type (required) determines the FSSCAT file type to be catalogued. It can be FS1_GRF_L1B_CAL, FS1_GRF_L1B_SCI, FS1_GRF_L1C_SCI, FS1_GRF_L2__SIE, FS1_GRF_L3__ICM, FS1_MWR_L1B_SCI, FS1_MWR_L1C_SCI, FS1_MWR_L2A_TB_, FS1_MWR_L2B_SIT, FS1_MWR_L2B_SM_, FS1_MWR_L3__TB_, FS1_MWR_L3__SIT, FS1_MWR_L3__SM_, FS1_MWR_L4__SM_, FS2_HPS_L1C_SCI, FS2_HPS_L2__RDI or FSS_SYN_L4__SM_.
start_date (mandatory) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DD.
end_date (optional) determines the date on which the harvesting ends. It must be in the format YYYY-MM-DD.
ftp_domain (required) URL of the FSSCAT FTP.
ftp_path (required) Path on the FSSCAT FTP where the list of files is published.
ftp_pass (required) Password of the FSSCAT FTP.
ftp_user (required) Username of the user allowed to access the harvesting directory on the FSSCAT FTP.
ftp_port (optional, integer, defaults to 21) Port of the FSSCAT FTP.
ftp_timeout (optional, integer, defaults to 20) determines the seconds until the timeout when accessing the FTP.
update_all (optional, boolean, default is false) determines whether or not the harvester updates datasets that already have metadata from this source. For example: if we have "update_all": true, and dataset Foo has already been created or updated by harvesting, then it will be updated again when the harvester runs. If we have "update_all": false and Foo has already been created or updated by harvesting, then the dataset will not be updated when the harvester runs. And regardless of whether update_all is true or false, if a dataset has not been collected, then it will be created in the catalogue.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
max_datasets (optional, integer, defaults to 100) determines the maximum number of datasets to be catalogued.
{
"start_date": "2020-10-30",
"end_date": "2020-11-01",
"file_type": "FS1_GRF_L1C_SCI",
"ftp_domain": "<FSSCAT_FTP_DOMAIN>",
"ftp_path": "<FSSCAT_FTP_PATH>",
"ftp_pass": "<FSSCAT_FTP_PASS>",
"ftp_user": "<FSSCAT_FTP_USER>",
"ftp_port": 21,
"make_private": false,
"max_dataset": 10
}
Add fsscat to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select FSSCAT Harvester from the list of harvesters.
Add a config as described above.
Run the harvester.
The NOA GeObservatory is activated in major geohazard events (earthquakes, volcanic activity, landslides, etc.) and automatically produces a series of Sentinel-1 based co-event interferograms (DInSAR) to map the surface deformation associated with the event. It also produces pre-event interferograms to be used as a benchmark.
This harvester collects the aforementioned interferograms.
The NOA GeObservatory harvester configuration contains the following options:
start_date (optional) determines the date on which the harvesting begins. It must be in the format YYYY-MM-DDTHH:mm:ssZ.
end_date (optional) determines the date on which the harvesting ends. It must be in the format YYYY-MM-DDTHH:mm:ssZ.
page_timeout (optional, integer, defaults to 2) determines the maximum number of pages that will be harvested during each job. If a query returns 25 pages, only the first 2 will be harvested if you're using the default. Each page corresponds to 100 products. This is useful for running the harvester via recurring jobs intended to harvest products incrementally (i.e., you want to start from the beginning and harvest all available products). The harvester will harvest products a few pages at a time, rather than attempting to harvest all x-hundred-thousand at once. You'll get feedback after each job, so you'll know if there are errors without waiting for the whole job to run. And the harvester will automatically resume from the most recently harvested dataset if you're running it via a recurring cron job.
update_all (optional, boolean, default is false) determines whether or not the harvester updates datasets that already have metadata from this source. For example: if we have "update_all": true, and dataset Foo has already been created or updated by harvesting, then it will be updated again when the harvester runs. If we have "update_all": false and Foo has already been created or updated by harvesting, then the dataset will not be updated when the harvester runs. And regardless of whether update_all is true or false, if a dataset has not been collected, then it will be created in the catalogue.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date":"2017-01-01T00:00:00Z",
"end_date":"2020-08-01T23:59:00Z",
"page_timeout": "2"
}
Add noa_geobservatory
to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select NOA GeObservatory Harvester
from the list of harvesters.
Add a config as described above.
Select Manual
from the frequency options.
Run the harvester. It will programmatically create datasets.
The Energy Data harvester harvests products from the Energy Data API.
The Energy Data harvester has the following configuration options:
start_date: (required, datetime string; if the harvester has been run before, it resumes from the ingestion date of the most recently harvested product) determines the start of the date range for harvester queries. Example: "start_date": "2019-10-01T00:00:00Z". Note that the entire datetime string is required; 2019-10-01 is not valid.
end_date: (optional, datetime string, default is "NOW") determines the end of the date range for harvester queries. Example: "end_date": "2020-01-01T00:00:00Z". Note that the entire datetime string is required; 2020-01-01 is not valid.
datasets_per_job: (optional, integer, defaults to 100) determines the maximum number of products that will be harvested during each job.
timeout: (optional, integer, defaults to 10) determines the number of seconds to wait before timing out a request.
make_private (optional) determines whether the datasets created by the harvester will be private or public. The default is false, i.e., by default, all datasets created by the harvester will be public.
{
"start_date": "2017-10-01T00:00:00Z",
"datasets_per_job": 100
}
Add energydata to the list of plugins in your .ini file.
Create a new harvester via the harvester interface.
Select Energy Data Harvester from the list of harvesters.
Add a config as described above.
Select Manual from the frequency options.
Run the harvester.
The basic harvester workflow is divided into three stages. Each stage has a related method, and each method must be included in the harvester plugin.
The three methods are:
gather_stage()
fetch_stage()
import_stage()
While the fetch_stage()
method must be included, it may be the case that the harvester does not require a fetch stage (for instance, if the source is an OpenSearch service, then the search results in the gather stage may already include the necessary content, so there's no need to fetch it again). In those cases, the fetch_stage()
method will still be implemented, but it will just return True
. The gather_stage()
and import_stage()
methods, however, will always include some amount of code, as they will always be used.
To simplify things, the gather stage is used to create a list of datasets that will be created or updated in the final import stage. That's really all it's for. It is not meant for parsing content into dictionaries for creating or updating datasets (that occurs in the import stage). It also isn't meant for acquiring or storing raw content that will be parsed later (that occurs in the fetch stage)—with certain exceptions, like OpenSearch services, where the content is already provided in the initial search results.
The gather_stage()
method returns a list
of harvest object IDs, which the harvester will use for the next two stages. The IDs are generated by creating harvest objects for each dataset that should be created or updated. If the necessary content is already provided, it can be stored in the harvest object's .content
attribute as a str
. You can also create harvest object extras--ad hoc harvest object attributes--to store information like the status of the dataset (e.g., new or change), or to keep track of other information about the harvest object or the dataset that will be created/updated. However, the harvest object extras are not intended to store things like the key/value pairs that will later be used to create the package dictionary for creating/updating the dataset. 1) The gather stage is not the time to perform such parsing and 2) since the raw content can be saved in the .content
attribute, it is easier to just skip the intermediate step and create the package dictionary in the import stage.
The gather stage may proceed quickly because it does not require querying the source for each individual dataset. The goal is not to acquire the content in this stage—just to get a list of the datasets for which content is required. If individual source queries are necessary, they will be performed in the fetch stage.
During the gather stage, the gather_stage()
method will be called once.
During the fetch stage, the fetch_stage()
method will be called for each harvest object/dataset in the list created during the gather stage.
The purpose of the fetch stage is to get the content necessary for creating or updating the dataset in the import stage. The raw content can be stored as a str
in the harvest object's .content
attribute.
As in the gather stage, the harvest object extras should only be used to store information about the harvest object.
The fetch stage is the time to make individual queries to the source. If that's not necessary (e.g., the source is an OpenSearch service), then fetch_stage()
should just return True
.
During the import stage, the import_stage()
method will be called for each harvest object/dataset in the list created during the gather stage except for those that raised exceptions during the fetch stage. In other words, the import_stage()
method is called for every harvest object/dataset that has .content
.
The purpose of the import stage is to parse the content and use it, as well as any additional context or information provided by the harvest object extras, to create or update a dataset.
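To make the division of labour concrete, below is a minimal sketch (not part of this repository) of how the three stages fit together, assuming ckanext-harvest's HarvesterBase and HarvestObject classes; the catalogue URL, JSON structure and field names are hypothetical placeholders:

import json

import requests

from ckanext.harvest.harvesters.base import HarvesterBase
from ckanext.harvest.model import HarvestObject


class ExampleHarvester(HarvesterBase):
    """Sketch only: shows how the three stages hand work to each other."""

    def info(self):
        return {'name': 'example', 'title': 'Example Harvester',
                'description': 'Minimal three-stage workflow sketch'}

    def gather_stage(self, harvest_job):
        # Build the list of datasets to create/update and return the IDs
        # of the corresponding harvest objects. The catalogue URL and the
        # JSON structure are hypothetical.
        object_ids = []
        entries = requests.get('https://example.com/catalogue.json',
                               timeout=10).json()
        for entry in entries:
            obj = HarvestObject(guid=entry['id'], job=harvest_job,
                                content=json.dumps(entry))
            obj.save()
            object_ids.append(obj.id)
        return object_ids

    def fetch_stage(self, harvest_object):
        # The gather stage already stored everything needed in .content,
        # so there is nothing to fetch (as with an OpenSearch source).
        return True

    def import_stage(self, harvest_object):
        # Parse .content and create or update the dataset.
        entry = json.loads(harvest_object.content)
        package_dict = {
            'name': entry['id'].lower(),
            'title': entry.get('title', entry['id']),
            'notes': entry.get('summary', ''),
        }
        return self._create_or_update_package(package_dict, harvest_object)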
See the OpenSearchExample harvester skeleton for an example of how to use the libraries in this repository to build an OpenSearch-based harvester. There are detailed comments in the code, which can be copied as the starting point of a new harvester. If your harvester will not use an OpenSearch source, you'll also need to modify the gather_stage
and possibly the fetch_stage
methods, but the import_stage
will remain the same.
The iTag "harvester" (ITagEnricher) is better described as a metaharvester. It uses the harvester infrastructure to add new tags and metadata to existing datasets. It is completely separate from the other harvesters, meaning: if you want to harvest Sentinel products, you'll use one of the Sentinel harvesters. If you want to enrich Sentinel datasets, you'll use an instance of ITagEnricher. But you'll use them separately, and they won't interact with each other at all.
During the gather stage, it queries the CKAN instance itself to get a list of existing datasets that 1) have the spatial
extra and 2) have not yet been updated by the ITagEnricher. Based on this list, it then creates harvest objects. This stage might be described as self-harvesting.
During the fetch stage, it queries an iTag instance using the coordinates from each dataset's spatial
extra and then stores the response from iTag as .content
, which will be used in the import stage. As long as iTag returns a valid response, the dataset moves on to the import stage—in other words, all that matters is that the query succeeded, not whether iTag was able to find tags for a particular footprint. See below for an explanation.
During the import stage, it parses the iTag response to extract any additional tags and/or metadata. Regardless of whether any additional tags or metadata are found, the extra itag: tagged
will be added to the dataset. This extra is used in the gather stage to filter out datasets for which successful iTag queries have been made.
To set it up, create a new harvester source (we'll call ours "iTag Enricher" for the sake of example). Select manual
for the update frequency. Select an organization (currently required—the metaharvester will only act on datasets that belong to that organization).
There are three configuration options:
base_url: (required, string) determines the base URL to use when querying your iTag instance.
timeout: (integer, defaults to 5) determines the number of seconds before a request times out.
datasets_per_job: (integer, defaults to 10) determines the maximum number of datasets per job.
Once you've created the harvester source, create the cron job below, using the name or ID of the source you just created:
* * * * * paster --plugin=ckanext-harvest harvester job {name or id of harvest source} -c {path to CKAN config}
The cron job will continually attempt to create a new harvest job. If there already is a running job for the source, the attempt will simply fail (this is the intended behaviour). If there is no running job, then a new job will be created, which will then be run by the harvester run
cron job that you should already have set up. The metaharvester will then make a list of all the datasets that should be enriched with iTag, but which have not yet been enriched, and then try to enrich them.
If a query to iTag fails, 1) it will be reported in the error report for the respective job and 2) the metaharvester will automatically try to enrich that dataset the next time it runs. No additional logs or tracking are required--as long as a dataset hasn't been tagged, and should be tagged, it will be added to the list each time a job is created. Once a dataset has been tagged (or it has been determined that there are no tags that can be added to it), it will no longer appear on the list of datasets that should be tagged.
Currently, ITagEnricher only creates a list of max. 1,000 datasets for each job. This limit is intended to speed up the rate at which jobs are completed (and feedback on performance is available). Since a new job will be created as soon as the current one is marked Finished
, this behaviour does not slow down the pace of tagging.
Sentinel-3 datasets have complex polygons that seem to cause iTag to time out more often than it does when processing requests related to other datasets, so Sentinel-3 datasets are currently filtered out of the list of datasets that need to be tagged.
In general, requests to iTag seem to time out rather often, so it may be necessary to experiment with rate limiting. It may also be necessary to set up a more robust infrastructure for the iTag instance.
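For reference, an iTag Enricher source configuration using the three options above might look like this (the base URL is a placeholder for your own iTag instance):

{
    "base_url": "https://itag.example.com",
    "timeout": 5,
    "datasets_per_job": 10
}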
All harvesters should have tests that actually run the harvester, from start to finish, more than once. Such tests verify that the harvester will work as intended in production. The requests_mock
library allows us to easily mock the content returned by real requests to real URLs, so we can save the XML returned by OpenSearch interfaces, etc. and re-use it when testing. We can then write tests that verify 1) that the harvester starts, runs, finishes, and runs again (e.g., there are no errors that cause it to hang), 2) that it behaves as expected (e.g., it only updates datasets when a specific flag is set, or it restarts from a specific date following a failed request), and 3) that the datasets it creates or updates have exactly the metadata that we want them to have.
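For illustration, a minimal sketch of the mocking pattern (the URL and XML below are stand-ins, not the real fixtures used by the tests):

import requests
import requests_mock

SAVED_XML = '<feed><entry><title>S1A_EXAMPLE</title></entry></feed>'  # stand-in for a saved OpenSearch response

def test_mocked_opensearch_request():
    with requests_mock.Mocker() as mock:
        # Any GET to this URL returns the saved XML instead of hitting the real service.
        mock.get('https://scihub.copernicus.eu/dhus/search', text=SAVED_XML)
        response = requests.get('https://scihub.copernicus.eu/dhus/search?q=*')
        assert response.status_code == 200
        assert 'S1A_EXAMPLE' in response.text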
See TestESAHarvester().test_harvester()
for an example of how to run a harvester in a testing environment with mocked requests that return real XML.
The test itself needs to be refined. Some of the blocks should be helper functions or fixtures. But the method itself contains all the necessary components of a full test of harvester functionality: create a harvester with a given config, run it to completion under different conditions, and verify that the results are as expected.
The same structure can be used for our other harvesters (with different mocked requests, of course, and with different expected results).
Using the same structure, we can also add tests that verify that the metadata of the datasets that are created also match the expected/intended results.
* * * * * paster --plugin=ckanext-harvest harvester run -c /srv/app/production.ini >> /var/log/cron.log 2>&1
* * * * * paster --plugin=ckanext-harvest harvester job itag-sentinel -c /srv/app/production.ini >> /var/log/cron.log 2>&1
* * * * * paster --plugin=ckanext-harvest harvester job code-de-sentinel -c /srv/app/production.ini >> /var/log/cron.log 2>&1
* * * * * paster --plugin=ckanext-harvest harvester job noa-sentinel -c /srv/app/production.ini >> /var/log/cron.log 2>&1
* * * * * paster --plugin=ckanext-harvest harvester job scihub-sentinel -c /srv/app/production.ini >> /var/log/cron.log 2>&1
Both the ESA harvester and the iTag metadata harvester can optionally log the status codes and response times of the sources or services that they query. If you want to log the response times and status codes of requests to harvest sources and/or your iTag service, you must include ckanext.nextgeossharvest.provider_log_dir=/path/to/your/logs
in your .ini
file. The log entries will look like this: INFO | esa_scihub | 2018-03-08 14:17:04.474262 | 200 | 2.885231s
(the second field will always be 12 characters and will be padded if necessary).
The data provider log file is called dataproviders_info.log
The iTag service provider log is called itag_uptime.log.