CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0
291 stars 115 forks source link

Explicitly convert `event_name` to strings in `Impact.from_hdf5` #894

Open peanutfun opened 2 weeks ago

peanutfun commented 2 weeks ago

Changes proposed in this PR:

This PR fixes #807

PR Author Checklist

PR Reviewer Checklist

emanuel-schmid commented 2 weeks ago

why don't we just force event names to be strings? If asstr() fails there must be other methods, e.g.,

event_name = [str(x) for x in file["event_name"][:]]
peanutfun commented 2 weeks ago

@emanuel-schmid There was no decision on what to do in #807, so I came up with the least intrusive solution (from my perspective).

I am fine with your proposal, but it would imply that an Impact object with non-string event_name attribute is not cleanly "cycled": After writing to H5 and reading it again, the event_name will be a list of strings.

emanuel-schmid commented 2 weeks ago

Yeah, right. Sorry for coming late to the discussion - but... Imho 1) a name ought to be a string 2) ambiguous data types are a constant source of sorrow. Sooner or later someone will run into an issue because one of their hazards has event_name = 1, 2, 3 and another one '1', '2', '3' If it's only about clean cycles (for not so clean data): I estimate the cost of losing "cyclability" lower than having to deal with this for ever.

peanutfun commented 2 weeks ago

@emanuel-schmid Thanks a lot for that perspective! This is exactly the kind of perspective I was hoping for in the discussion of #807. I would be fine with going the "string-only" route. However, there are some inconsistencies we then have to fix. In the tests for Impact, there are two instances where Impacts are instantiated with non-string event names:

peanutfun commented 2 weeks ago

@chahank What do you think?

chahank commented 2 weeks ago

Thanks for the discussion @peanutfun and @emanuel-schmid : I here would also support your suggestion of the string only route. I think this makes things much clearer.

emanuel-schmid commented 2 weeks ago

@peanutfun Yes, Sorry again. 😳 thanks for pointing out the inconsistent tests. With regard to fixing #807 they have most likely no impact, but they are sending contradictory signals and we should eventually put every instance of an event_name within quotes. However - I think we can follow this up by fixing these when we come across them.

peanutfun commented 2 weeks ago

@emanuel-schmid @chahank When we go string-only, we still have to clarify: Do we (try to) convert any event name data to strings when writing, when reading, or both times?

emanuel-schmid commented 2 weeks ago

Suggestion: When writing we simply assume they're strings, when reading we convert if necessary. (When writing it's kind of too late to do anything about it.)

peanutfun commented 1 week ago

(When writing it's kind of too late to do anything about it.)

Why? We can also call str on the elements before/while writing.

emanuel-schmid commented 1 week ago

Sure we can. But what would be the point? By then we supposedly have already been working daredevil on an "irregular object". We should have converted the non string event names to strings long ago. If we didn't, the conversion to strings just for the writing cannot save us as it doesn't make a difference anymore. Or does it?

peanutfun commented 1 week ago

I think it's mainly a question of conventions. Currently, Impact works fine with non-string event names. As a user, I would find it confusing if I can store such an object without issues and then, when reading it again, I might receive a warning or even an error that a string conversion is taking place or does not work. I think this should pop up as issue already during writing. It can simply be a warning like: "You are writing non-string event names, this is dangerous!".

emanuel-schmid commented 4 days ago

Hold on - when writing, the proper thing to do is fail if it is not a string, isn't it?

Because if we call str on the elements before writing the outcome may be something like 'x.y.Z object at 0x7f457d8faeb0>' which is likely not the desired one.

peanutfun commented 4 days ago

@emanuel-schmid Great catch! Then we

  1. Throw an error when writing non-strings.
  2. Read strings or try to convert (in the latter case, issue a warning)
peanutfun commented 16 hours ago

Maybe we need an Issue to make sure that the event name is always a string in all hazard methods?

We would even need it for Impact, because this PR only fixes the HDF5 read/write methods. Elsewhere, the tests explicitly check if mixed-type event names work, see https://github.com/CLIMADA-project/climada_python/pull/894#issuecomment-2173307313