eurodatacube / eodash-catalog

Special external data in need of re-evaluation for STAC representation #30

Closed santilland closed 8 months ago

santilland commented 1 year ago

Currently some datasets are fetched, parsed and transformed when updating the data. We would need to consider if this task can be integrated into the geoDB workflow or if other possibilities should be considered. The datasets in question are:

lubojr commented 1 year ago

Agreed to keep all of the currently used indicators. The only ones still being updated are the Covid data (daily vaccinations) and the OILX data (to be updated for at least one more year); the other two are no longer updated.

The idea is to convert the static indicators to CSVs and then to GeoDB tables. @lubojr to provide more info here in the issue on the expected format (columns) in GeoDB for all 4 indicators.

lubojr commented 1 year ago

@AlessandroScremin @dmoglioni These are the output files and formats that we are generating for these special datasets and would need to be migrated to GeoDB if possible.

OILX data - actually a standard eodash data format:

https://github.com/eurodatacube/eodash/blob/staging/app/public/eodash-data/internal/100011-OX.json

Information about POI itself is in https://raw.githubusercontent.com/eurodatacube/eodash/staging/app/public/data/internal/pois_eodash.json

https://github.com/eurodatacube/eodash/blob/staging/app/public/eodash-data/internal/AE-GG.json
https://github.com/eurodatacube/eodash/blob/staging/app/public/eodash-data/internal/AE-CV.json
https://github.com/eurodatacube/eodash/blob/staging/app/public/eodash-data/internal/AE-OW.json

These are per-country entries that get created by the scripts listed in the original issue description.

Essentially all files with suffix -OW, -CV or -GG are of interest to ingest data from.

Regarding information about the POIs themselves, these get generated by the script https://github.com/eurodatacube/eodash/blob/staging/app/src/scripts/create_capitals.py but do not change over time (they are not updated).

They are saved to https://raw.githubusercontent.com/eurodatacube/eodash/staging/app/public/data/internal/pois_trilateral.json; search for entries with "indicator": "GG", "indicator": "CV", "indicator": "OW".

lubojr commented 11 months ago

@dmoglioni

We have evaluated the current structure of the indicators GG, OW and CV, and for the migration we would suggest changing the data structure to the standard eodash GeoDB table format that all other indicators use, where a single row is a single measurement (plus whatever other values we put in the "referenceValue" array) for a certain time.

So, coming back to what this means for the planned migration of these 4 datasets to GeoDB:

dmoglioni commented 11 months ago

@lubojr

I went through the material provided and would like to align with you on some aspects. Let's have a call to iterate on this faster. Thank you.

dmoglioni commented 11 months ago

@lubojr

As agreed during our call, I'll proceed with the integration into our CI/CD workflow as follows:

lubojr commented 11 months ago

@dmoglioni Regarding the European aggregated OILX data EU1-OX, let's move the data into a new table, for example OX-EU (and let's change the indicator code from OX to OX-EU for this one), so there will be one table for all POIs for OX and another table for just the European-level OX-EU.

dmoglioni commented 11 months ago

@lubojr what geometry information should be attached to OX-EU? Here an example of the .json obtained for OX-EU. EU1-OX.json

lubojr commented 11 months ago

@dmoglioni For eodash the geometry does not matter; we do not use the geometry field. You can use a single point geometry at the location of the AOI. For the AOI, let's use for example the coordinates of Munich, 48.13,11.57 (an arbitrary location I chose just now). The subAoi can also be kept empty.

dmoglioni commented 11 months ago

@lubojr I also wanted to do something like this, assigning an arbitrary AoI for the geometry field but I wanted to be sure you were not using geometry information for visualization purposes.

dmoglioni commented 11 months ago

@lubojr I'm proceeding with OX-EU integration and this is the information available for it:

Should we add something about (for example):

lubojr commented 11 months ago

@dmoglioni yes these three sound fine. Nothing else needed to be added. Thank you.

dmoglioni commented 11 months ago

@lubojr

OX/OX-EU indicators are now operational and their data can be fetched for the dashboard from the collections Crude_Oil_Storage_Index and Crude_Oil_Storage_Index-Europe respectively.

lubojr commented 11 months ago

Hi @dmoglioni, please update the aoi_id value in table Crude_Oil_Storage_Index-Europe to something other than "/". Even though this table has only a single AOI, the value cannot be blank. Please set it to, for example, Europe. Thank you.

dmoglioni commented 11 months ago

Hi @lubojr I added 'EU' as AOI_ID for Crude_Oil_Storage_Index-Europe as requested.

dmoglioni commented 11 months ago

About the Google mobility data: I found that the geometry information (extracted from pois_eodash.json) for this timeseries is available for only 35 countries out of the 135 required.

In particular, these are the countries listed in the timeseries (135): ['AE', 'AF', 'AG', 'AO', 'AR', 'AT', 'AU', 'AW', 'BA', 'BB', 'BD', 'BE', 'BF', 'BG', 'BH', 'BJ', 'BO', 'BR', 'BS', 'BW', 'BY', 'BZ', 'CA', 'CH', 'CI', 'CL', 'CM', 'CO', 'CR', 'CV', 'CZ', 'DE', 'DK', 'DO', 'EC', 'EE', 'EG', 'ES', 'FI', 'FJ', 'FR', 'GA', 'GB', 'GE', 'GH', 'GR', 'GT', 'GW', 'HK', 'HN', 'HR', 'HT', 'HU', 'ID', 'IE', 'IL', 'IN', 'IQ', 'IT', 'JM', 'JO', 'JP', 'KE', 'KG', 'KH', 'KR', 'KW', 'KZ', 'LA', 'LB', 'LI', 'LK', 'LT', 'LU', 'LV', 'LY', 'MA', 'MD', 'MK', 'ML', 'MM', 'MN', 'MT', 'MU', 'MX', 'MY', 'MZ', 'NA', 'NE', 'NG', 'NI', 'NL', 'NO', 'NP', 'NZ', 'OM', 'PA', 'PE', 'PG', 'PH', 'PK', 'PL', 'PR', 'PT', 'PY', 'QA', 'RE', 'RO', 'RS', 'RU', 'RW', 'SA', 'SE', 'SG', 'SI', 'SK', 'SN', 'SV', 'TG', 'TH', 'TJ', 'TR', 'TT', 'TW', 'TZ', 'UA', 'UG', 'US', 'UY', 'VE', 'VN', 'YE', 'ZA', 'ZM', 'ZW']

whereas these are the ones available (35) from the pois file: ('AT', '48.2,16.366667'), ('BA', '43.87,18.42'), ('BE', '50.83333333,4.3333330000000005'), ('BG', '42.68333333,23.316667000000002'), ('CH', '47.451542,8.564572'), ('CZ', '50.08333333,14.466667000000001'), ('DE', '52.51666667,13.4'), ('DK', '55.66666667,12.583333'), ('EE', '59.43333333,24.716667'), ('EG', '30.939554,32.314923'), ('ES', '40.416775,-3.70379'), ('FI', '60.16666667,24.933332999999998'), ('FR', '48.864715999999994,2.349014'), ('GB', '52.48,1.89'), ('GR', '37.98333333,23.733333'), ('HR', '45.8,16.0'), ('HU', '47.5,19.083333'), ('IE', '53.31666667,-6.233333'), ('IT', '41.902782,12.496366'), ('LT', '54.68333333,25.316667000000002'), ('LU', '49.6,6.116667'), ('LV', '56.95,24.1'), ('MK', '42,21.43'), ('MT', '35.88333333,14.5'), ('NL', '52.35,4.9166669999999995'), ('NO', '60.197552,11.100415'), ('PL', '52.25,21.0'), ('PT', '38.71666667,-9.133333'), ('RO', '44.43333333,26.1'), ('RS', '44.83,20.5'), ('RU', '54.729095, 19.823546'), ('SE', '59.33333333,18.05'), ('SI', '46.05,14.516667000000002'), ('SK', '48.15,17.116667'), ('TR', '40.982555,28.820829').

Attached for completeness the pois file I'm using. Is it the right file or is there any other source you were getting this geometry information from? Thank you.

pois_eodash.json

lubojr commented 11 months ago

Hi @dmoglioni, this is a bit tricky and I will leave it up to you to decide what you prefer. This indicator is used for both the race and trilateral dashboards. For race, the list of countries that you listed is correct; for trilateral, the countries are present in pois_trilateral.json, where a much larger set of countries is used.

It does not matter for the race dashboard whether you duplicate the data in GeoDB (split the collections) or leave it as a single collection containing all of them (from pois_trilateral.json). We already have the code to subset part of the collection for race, while having another subset for trilateral.

dmoglioni commented 11 months ago

Hi @lubojr, following up on today's call on mobility data (GG), I identified 9 countries that are not present in pois_trilateral.json:

['AG', 'AW', 'BB', 'BH', 'CV', 'HK', 'LI', 'MU', 'RE']

Could you please check it? Thank you

lubojr commented 11 months ago

@dmoglioni thank you for checking the data. We are not including these 9 in any dashboard, because they have no cross reference in pois_trilateral.json (we did not have a subAoi for them).

Let's skip them completely during the migration.
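A minimal sketch of how such a skip could look during ingestion, assuming the rows are plain dictionaries with a "country" key holding the two-letter code (the row layout and the function name are hypothetical; only the list of 9 codes comes from this thread):

```python
# Country codes without a cross reference in pois_trilateral.json (from this thread).
SKIPPED_COUNTRIES = {"AG", "AW", "BB", "BH", "CV", "HK", "LI", "MU", "RE"}


def drop_skipped(rows):
    """Return only the rows whose country code is not in the skip list."""
    return [row for row in rows if row["country"] not in SKIPPED_COUNTRIES]


# Illustrative rows, not real mobility data.
rows = [
    {"country": "AT", "value": 1.0},
    {"country": "HK", "value": 2.0},
    {"country": "TR", "value": 3.0},
]
kept = drop_skipped(rows)
print([row["country"] for row in kept])
```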

dmoglioni commented 11 months ago

@lubojr thank you for the clarification.

dmoglioni commented 11 months ago

@lubojr Mobility data collection created on geoDB (Mobility_data) and data ingested. As agreed the 'Measurement Value' column contains 'grocery' whereas 'Reference value' column contains ['retail_recreation', 'parks', 'transit_stations', 'workplaces', 'residential'].

Let me know if everything is displayed correctly when fetching the data, thank you.

lubojr commented 11 months ago

@dmoglioni I have checked the data in table "Mobility_data" and the same issue as with OILX Europe is present. None of the rows have an aoi_id filled in, which we rely upon. Could you please fill them to match the corresponding country column? (TR -> TR).
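The requested fix could be sketched roughly as below. This assumes rows are plain dictionaries with "country" and "aoi_id" keys, which is a simplification of the GeoDB table; the function name is hypothetical and this is not the actual ingestion script.

```python
def fill_aoi_id(rows):
    """Return copies of the rows where a missing, blank or '/' aoi_id
    is replaced by the value of the country column."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input is left untouched
        if row.get("aoi_id") in (None, "", "/"):
            row["aoi_id"] = row["country"]
        fixed.append(row)
    return fixed


# Illustrative rows only.
rows = [
    {"country": "TR", "aoi_id": "/"},
    {"country": "AT", "aoi_id": "AT"},
]
print(fill_aoi_id(rows))
```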

lubojr commented 11 months ago

The OILX data now work correctly after minor fixes in the eodash client.

dmoglioni commented 11 months ago

@lubojr About Mobility Data: of course I can add that information, it was just not clear to me from our call or your previous comment that the AOI_ID is in general a mandatory field on your side

dmoglioni commented 11 months ago

aoi_id added to Mobility Data.

lubojr commented 11 months ago

@dmoglioni there are still some rows where aoi_id is a /.

indicator_data = geodb.get_collection("Mobility_data",database="eodash")
indicator_data["aoi_id"].value_counts()['/']
# 974

Could you please double check?

dmoglioni commented 11 months ago

@lubojr thank you for pointing out the issue. Now the data should be correctly ingested in the geoDB.

lubojr commented 11 months ago

@dmoglioni Thank you for the update. I confirm that the Mobility_data indicator can now be fetched from GeoDB and that the STAC catalog is now created. I had a look at the integration: because we now use the "City" column for the map icon hover, it currently shows, for example, GR instead of Greece.

I think it would make sense on our side anyway (client or generator) to make the column used for the name configurable, but this information is currently missing from the table.

In the original data in pois_trilateral.json it looked like this:

"aoiID": "BE",
"city": "Belgium",
"country": "BE",

Which does not make sense in hindsight. Now it looks like this:

"country": "AE"
"city": "/"
"aoi_id": "AE"

In order for the map POI to have a correct label, could you please change the table so that the "Country" column has the original "city" value from JSON ("Belgium")? The city can then remain / and I shall adapt the generator for this collection so that the "country" column is used for the id.

dmoglioni commented 11 months ago

@lubojr Interestingly enough, I had noticed the same thing before you wrote to me and wanted to update the Country attribute with the country name instead of the ID. I'll follow the same approach for CV and OW as well.

dmoglioni commented 11 months ago

@lubojr Mobility data updated as agreed, hope everything is in line now

lubojr commented 10 months ago

@dmoglioni We have updated the config to use the country column if city is blank or /.
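The fallback rule can be sketched as follows. The function name and the dictionary row layout are hypothetical and are not taken from the eodash client or generator; only the rule itself (use country when city is blank or /) comes from this thread.

```python
def poi_label(row):
    """Return the label for a map POI: the city if it is set,
    otherwise the country column."""
    city = (row.get("city") or "").strip()
    if city in ("", "/"):
        return row["country"]
    return city


print(poi_label({"country": "Belgium", "city": "/", "aoi_id": "BE"}))  # Belgium
```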

dmoglioni commented 9 months ago

@santilland As you might be aware, the integration of the two remaining indicators (CV and OW) is unfortunately blocked due to the lack of access to our processing environment in EDC.

lubojr commented 9 months ago

A side note for once the infrastructure problems are fixed: I had not realized earlier that we (in the eodash client) cannot have a - character in the indicator code. Sorry for the complications caused by proposing the original OX-EU indicator code for the global OILX index.

@dmoglioni Could you please update your script to change the indicator code for all current rows (and future updates) inside the table Crude_Oil_Storage_Index-Europe from OX-EU to OX_EU?
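A rough sketch of such a rename step, assuming rows are dictionaries and that the indicator code lives in a column called "indicator_code" (the column name, the function name and the sample rows are assumptions, not the actual GeoDB schema or script):

```python
def rename_indicator(rows, old="OX-EU", new="OX_EU", column="indicator_code"):
    """Return copies of the rows with the old indicator code replaced by the new one."""
    return [
        {**row, column: new} if row.get(column) == old else dict(row)
        for row in rows
    ]


# Illustrative rows only.
rows = [
    {"aoi_id": "EU", "indicator_code": "OX-EU"},
    {"aoi_id": "NL1", "indicator_code": "OX"},
]
print(rename_indicator(rows))
```

Applying the same function to every future update batch before ingestion would keep new rows consistent with the renamed existing ones.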

dmoglioni commented 9 months ago

@lubojr the script has been updated following your additional requirement; the changes in the GeoDB will be effective starting next Monday (Jan 29th).

lubojr commented 8 months ago

@dmoglioni Hello, I just wanted to briefly check what is the current status of the covid data (OW and CV) ingestion to geodb?

dmoglioni commented 8 months ago

@lubojr Hi, thank you for the reminder, I'll let you know as soon as it's completed.

dmoglioni commented 8 months ago

@lubojr CV indicator is now operational on the geoDB under the data collection 'Global_COVID_data'; could you please check if you can fetch all the information correctly?

lubojr commented 8 months ago

@dmoglioni Latitude and longitude were switched during the import compared with other past collections. Could you please fix that and remove the space in between them?

"aoi": "61.210817, 35.650072" should be "aoi": "35.650072,61.210817"
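A minimal sketch of that correction as a string transformation (the function name is hypothetical; it only assumes the aoi value is two comma-separated numbers):

```python
def swap_aoi(aoi):
    """Swap the two comma-separated coordinates and drop any surrounding spaces."""
    first, second = (part.strip() for part in aoi.split(","))
    return f"{second},{first}"


print(swap_aoi("61.210817, 35.650072"))  # 35.650072,61.210817
```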

dmoglioni commented 8 months ago

@lubojr should the ones for subAOI be flipped as well?

lubojr commented 8 months ago

SubAOI is fine as is.

dmoglioni commented 8 months ago

perfect, reingestion completed

lubojr commented 8 months ago

@dmoglioni The coordinates for individual countries are usually on/near the borders instead of in the capitals of the countries as was in the original dataset and as is for mobility data. Could you please double check?

Old: https://race.esa.int/?indicator=CV&x=2177956.42178&y=6562096.62978&z=5.24697
New: https://eodash-testing.eox.at/ui-panels-cat/?catalog=cv-geodb-integrate&indicator=CV

dmoglioni commented 8 months ago

@lubojr can we have a short call about it to speed things up? Just tell me your availability and I'll send an Outlook invite, thx

dmoglioni commented 8 months ago

@lubojr By comparing the two JSON files, countries.json (with only subAOI coordinates) and pois_trilateral.json (with both AOI and subAOI coordinates), against the AOIs present in the COVID data, I found that countries.json actually contains three more countries, namely SS (South Sudan), XK (Kosovo) and AQ (Antarctica). Hence I'll extract the AOI coordinates from pois_trilateral.json and the subAOI from countries.json. For SS and XK I'll add the AOI coordinates of the corresponding capitals; for AQ, the first coordinate of the subAOI polygon.
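The lookup order described above could be sketched like this. The in-memory structures are simplified assumptions (the real files are JSON and the subAOI format may differ), the function name is hypothetical, and the SS and AQ coordinates below are illustrative placeholders, not the values actually ingested.

```python
def resolve_aoi(code, poi_aoi, capital_aoi, sub_aoi_polygons):
    """Pick the AOI string for a country code.

    Order: POIs file first, then a manually supplied capital,
    then the first vertex of the subAOI polygon.
    """
    if code in poi_aoi:
        return poi_aoi[code]
    if code in capital_aoi:
        return capital_aoi[code]
    lat, lon = sub_aoi_polygons[code][0]
    return f"{lat},{lon}"


poi_aoi = {"AT": "48.2,16.366667"}           # value as listed earlier in this thread
capital_aoi = {"SS": "4.85,31.58"}           # placeholder capital coordinates
sub_aoi_polygons = {"AQ": [(-70.0, 0.0), (-70.0, 10.0), (-80.0, 10.0)]}  # placeholder polygon

for code in ("AT", "SS", "AQ"):
    print(code, resolve_aoi(code, poi_aoi, capital_aoi, sub_aoi_polygons))
```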

dmoglioni commented 8 months ago

@lubojr CV data reprocessed and reingested, could you check it please, thx

lubojr commented 8 months ago

@dmoglioni Thank you for the update. Almost, but not there yet. The aoi_id must be set for all rows (we use it as unique identifier) and it can not be /. Currently for Namibia, there is aoi_id == '/'.
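A check like the following could catch such rows before ingestion. It is a sketch over plain dictionaries rather than the GeoDB client, and the function name is hypothetical.

```python
def invalid_aoi_ids(rows):
    """Count, per country, the rows whose aoi_id is missing, blank or '/'."""
    bad = {}
    for row in rows:
        if row.get("aoi_id") in (None, "", "/"):
            country = row.get("country", "?")
            bad[country] = bad.get(country, 0) + 1
    return bad


# Illustrative rows only.
rows = [
    {"country": "Namibia", "aoi_id": "/"},
    {"country": "Austria", "aoi_id": "AT"},
]
print(invalid_aoi_ids(rows))  # {'Namibia': 1}
```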

dmoglioni commented 8 months ago

@lubojr AOI_ID for Namibia fixed, now everything should be in line.

lubojr commented 8 months ago

Perfect. It works fine. Thank you!

dmoglioni commented 8 months ago

@lubojr OW indicator is now operational on the geoDB under the data collection 'Global_COVID_vaccination_data'; could you please check if you can fetch all the information correctly?

lubojr commented 8 months ago

@dmoglioni I have checked the data and our previous integration showed the total_vaccinations, people_fully_vaccinated and daily_vaccinations - see https://github.com/eurodatacube/eodash-catalog/issues/30#issuecomment-1812232297. The data in GeoDB have just measurement_value (daily vaccinations). Could you please add the total_vaccinations and people_fully_vaccinated into reference_value array? Thanks.
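The requested row layout could be sketched as below. The input keys, the "time" column and the sample numbers are assumptions for illustration; only measurement_value, reference_value, aoi_id and the three vaccination fields are named in this thread, and how the array is serialised in the GeoDB is not shown here.

```python
def to_row(record):
    """Map one vaccination record to a row with daily vaccinations as the
    measurement and the two cumulative figures as reference values."""
    return {
        "aoi_id": record["country_code"],
        "time": record["date"],
        "measurement_value": record["daily_vaccinations"],
        "reference_value": [
            record["total_vaccinations"],
            record["people_fully_vaccinated"],
        ],
    }


# Made-up sample record, not real data.
sample = {
    "country_code": "AT",
    "date": "2021-06-01",
    "daily_vaccinations": 100,
    "total_vaccinations": 5000,
    "people_fully_vaccinated": 2000,
}
print(to_row(sample))
```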

dmoglioni commented 8 months ago

@lubojr I updated the collection with the specified reference values