Open peanutfun opened 2 weeks ago
why don't we just force event names to be strings? If asstr()
fails there must be other methods, e.g.,
event_name = [str(x) for x in file["event_name"][:]]
@emanuel-schmid There was no decision on what to do in #807, so I came up with the least intrusive solution (from my perspective).
I am fine with your proposal, but it would imply that an Impact
object with non-string event_name
attribute is not cleanly "cycled": After writing to H5 and reading it again, the event_name
will be a list of strings.
Yeah, right. Sorry for coming late to the discussion - but... Imho 1) a name ought to be a string 2) ambiguous data types are a constant source of sorrow. Sooner or later someone will run into an issue because one of their hazards has event_name = 1, 2, 3 and another one '1', '2', '3' If it's only about clean cycles (for not so clean data): I estimate the cost of losing "cyclability" lower than having to deal with this for ever.
@emanuel-schmid Thanks a lot for that perspective! This is exactly the kind of perspective I was hoping for in the discussion of #807. I would be fine with going the "string-only" route. However, there are some inconsistencies we then have to fix. In the tests for Impact, there are two instances where Impacts are instantiated with non-string event names:
@chahank What do you think?
Thanks for the discussion @peanutfun and @emanuel-schmid : I here would also support your suggestion of the string only
route. I think this makes things much clearer.
@peanutfun Yes, Sorry again. 😳 thanks for pointing out the inconsistent tests. With regard to fixing #807 they have most likely no impact, but they are sending contradictory signals and we should eventually put every instance of an event_name within quotes. However - I think we can follow this up by fixing these when we come across them.
@emanuel-schmid @chahank When we go string-only, we still have to clarify: Do we (try to) convert any event name data to strings when writing, when reading, or both times?
Suggestion: When writing we simply assume they're strings, when reading we convert if necessary. (When writing it's kind of too late to do anything about it.)
(When writing it's kind of too late to do anything about it.)
Why? We can also call str
on the elements before/while writing.
Sure we can. But what would be the point? By then we supposedly have already been working daredevil on an "irregular object". We should have converted the non string event names to strings long ago. If we didn't, the conversion to strings just for the writing cannot save us as it doesn't make a difference anymore. Or does it?
I think it's mainly a question of conventions. Currently, Impact
works fine with non-string event names. As a user, I would find it confusing if I can store such an object without issues and then, when reading it again, I might receive a warning or even an error that a string conversion is taking place or does not work. I think this should pop up as issue already during writing. It can simply be a warning like: "You are writing non-string event names, this is dangerous!".
Hold on - when writing, the proper thing to do is fail if it is not a string, isn't it?
Because if we call str
on the elements before writing the outcome may be something like 'x.y.Z object at 0x7f457d8faeb0>' which is likely not the desired one.
@emanuel-schmid Great catch! Then we
Maybe we need an Issue to make sure that the event name is always a string in all hazard methods?
We would even need it for Impact
, because this PR only fixes the HDF5 read/write methods. Elsewhere, the tests explicitly check if mixed-type event names work, see https://github.com/CLIMADA-project/climada_python/pull/894#issuecomment-2173307313
Changes proposed in this PR:
str
on values if decoding strings fromevent_name
fails, issue a warning in this case.event_name
is not strings.event_name
was checked.This PR fixes #807
PR Author Checklist
develop
)PR Reviewer Checklist