CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0

Flood data for Colombia, Nigeria, Sudan and Venezuela have an array of float 1.0 for the .date attribute #850

Open IanHopkinson opened 4 months ago

IanHopkinson commented 4 months ago

Flood hazard data for Colombia, Nigeria, Sudan and Venezuela have an array of float 1.0 for .date attribute which cannot be parsed as a date.

To replicate:

from climada.util.api_client import Client
client = Client()
flood = client.get_hazard("flood", properties={
                            "country_name": "Colombia",
                        })
flood.date

Produces the result:

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1.])

By contrast the same code for Haiti produces the result:

array([731529, 733948, 733021, 732238, 732977, 735649, 731839, 732826,
       736580, 733439, 734601], dtype=int64)
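
For context on what a valid `.date` array encodes: CLIMADA stores `Hazard.date` as proleptic Gregorian ordinals, the same convention as Python's `datetime.date.toordinal()`. A minimal sketch decoding the first few Haiti values above:

```python
from datetime import date

# Valid Hazard.date entries are proleptic Gregorian ordinals, so they
# decode directly with datetime.date.fromordinal().
haiti_dates = [731529, 733948, 733021]  # first values from the Haiti array above
print([date.fromordinal(d).isoformat() for d in haiti_dates])
# -> ['2003-11-10', '2010-06-25', '2007-12-11']
```

An array of 1.0 therefore decodes to 0001-01-01 for every event, which is why it cannot be parsed as a meaningful date.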
peanutfun commented 4 months ago

@IanHopkinson Thanks for reporting this. I can confirm that the .date attribute of the flood data retrieved for Colombia, Nigeria and Sudan consists entirely of floating-point ones. However, using your code I cannot retrieve data for Venezuela (NoResult error).

As an immediate solution, you can try casting the dates to ints; this way you should at least not run into value or data type errors. Of course, this will not give you more sensible data, but all CLIMADA operations should run smoothly:

flood.date = flood.date.astype("int")
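
To make that workaround a bit more defensive, one could only cast when the placeholder pattern is actually present. A pure-NumPy illustration (the all-ones array here stands in for `flood.date`; with a real hazard, use the attribute directly):

```python
import numpy as np

# Stand-in for flood.date in the affected countries.
dates = np.ones(5)

# Valid CLIMADA dates are integer ordinals well above 1, so a floating-point
# dtype marks the placeholder case and is safe to cast.
if np.issubdtype(dates.dtype, np.floating):
    dates = dates.astype("int64")  # keeps downstream operations running
print(dates)  # -> [1 1 1 1 1]
```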

@emanuel-schmid Do you see a way of updating the datasets and adding the correct date information?

IanHopkinson commented 4 months ago

@peanutfun - thanks! Currently I catch the exception (which is specific to my code) to allow operations to continue. For Venezuela I retrieve the data using the iso3alpha code:

flood = client.get_hazard("flood", properties={
                            "country_iso3alpha": "VEN",
                        })

To give you some idea of where these issues are coming from, I'm uploading data to the Humanitarian Data Exchange for the Humanitarian Response Plan countries listed here: https://github.com/OCHA-DAP/hdx-scraper-climada/blob/main/src/hdx_scraper_climada/metadata/countries.csv

I'm working my way through exposures and hazards. So far I've done litpop, crop_production, earthquakes, and floods; next I'm going to do River_flood, Tropical_cyclone, and Relative_cropyield.

emanuel-schmid commented 4 months ago

@emanuel-schmid Do you see a way of updating the datasets and adding the correct date information?

Sure, I'm gonna give it a try - but I can't tell right away when it will be done.

peanutfun commented 4 months ago

@emanuel-schmid Great to hear, thank you! I was mostly wondering if the data is available at all.

peanutfun commented 4 months ago

@IanHopkinson Please bear in mind that these datasets are provided on a best-effort basis and with no guarantees on correctness or completeness whatsoever. We see them as "demonstrator" datasets for a CLIMADA application and recommend that users use their own data for specialized applications as much as possible. See the disclaimer on the website of the API service: https://climada.ethz.ch/disclaimer/ In the data types section, you will also find more detailed information on the datasets.

emanuel-schmid commented 4 months ago

I was mostly wondering if the data is available at all.

That is indeed a very good question. 🤔

IanHopkinson commented 4 months ago

@peanutfun - no problem - that is understood!

Evelyn-M commented 4 months ago

@IanHopkinson There's a two-part answer to this:

  1. General If you upload data to the HDX, please do not use these files, but rather the original ones from The Global Flood Database, available at https://global-flood-database.cloudtostreet.ai/, to avoid the data being copied endlessly and important metadata (like source, purpose, and methods) getting lost. What we did was collect these files, which are event-based but at times span several countries, and regroup them into country-wise files (each covering various events instead). However, to other users it will not be clear where the data comes from, how it has been post-processed, etc.

  2. Specific If some of the dates are missing, you can use the attached file, which collects metadata of the original cloudtostreet files: via the id column in the csv (in the hdf5 file, this should be event_id), you can match each event with the date given in the csv. Most files should be correctly updated, but it can happen that some metadata got lost. flood_metainfo.csv
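
The matching described in point 2 could be sketched roughly as follows. Note this is only an illustration: the column names ("id", "date") and the event-id format are assumptions based on the comment, and the inline CSV stands in for the attached flood_metainfo.csv; adjust both to the real file.

```python
import io
from datetime import date

import numpy as np
import pandas as pd

# Stand-in for flood_metainfo.csv (the real file is attached to the issue);
# the column names are assumed, not confirmed.
csv_text = """id,date
DFO_1234,2003-11-10
DFO_5678,2010-06-25
"""
meta = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Map each event id to its date as a proleptic Gregorian ordinal,
# the convention CLIMADA uses for Hazard.date.
date_by_id = {row.id: row.date.date().toordinal() for row in meta.itertuples()}

# Stand-in for flood.event_id; with a real hazard, use the attribute directly.
event_ids = ["DFO_5678", "DFO_1234"]
fixed_dates = np.array([date_by_id[eid] for eid in event_ids], dtype=np.int64)
# flood.date = fixed_dates  # assign the repaired dates back to the hazard
print(fixed_dates)  # -> [733948 731529]
```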

IanHopkinson commented 4 months ago

Thanks @Evelyn-M - that should fix my issue. The link to the original source is also very useful, since I had been checking the cloudtostreet.ai website and it was redirecting to floodbase.com

peanutfun commented 2 weeks ago

@Evelyn-M @emanuel-schmid Do you intend to update the dates in the dataset on the API according to the "metainfo" file you provided? If not, I will close this issue.

emanuel-schmid commented 2 weeks ago

@peanutfun: yes, eventually. But it's not yet clear when. 🤷

peanutfun commented 2 weeks ago

No worries, will leave it open then ✌️