earthobservations / wetterdienst

Open weather data for humans.
https://wetterdienst.readthedocs.io/
MIT License

Using meaningful column/field names #54

Closed amotl closed 4 years ago

amotl commented 4 years ago

Dear Benjamin and Daniel,

to start a discussion about assigning meaningful English names to the short meteorological identifiers, I would like to humbly point you to eaaf936e from our PR #55.

We had something similar in the knowledgebase module knowledge.py of dwdweather2, and we noticed that you have already started mapping DWD-specific original parameters, names and the like to identifiers that are more suitable for human consumption.

So we would like to ask whether you would welcome extending this approach to all available field names.

With kind regards, Andreas.

gutzbenj commented 4 years ago

Dear @amotl,

I had mainly left those "data column names" untouched because I'd have a hard time choosing the right names for the values. The names given by DWD are sometimes as long as a sentence, which is not really an option either.

Furthermore, with extended column names the data becomes hard to print: for data with several parameters, the columns would easily span the whole screen, so the DataFrame is hardly readable anymore. I'd prefer an opt-in solution. Using the full names also wouldn't be a problem if we melted the DataFrame (which could also be opt-in).

What do you think?
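To illustrate the melt idea, here is a minimal sketch in plain Python rather than pandas. It assumes a wide layout with one record per station and date; the field names `station_id`, `date`, `parameter` and `value`, the helper `melt`, and all values are hypothetical and only serve the illustration. In the long layout the parameter name lives in a cell instead of a column header, so even sentence-long names do not widen the table.

```python
# Hypothetical wide-format records; all values are made up for illustration.
wide_rows = [
    {"station_id": 1048, "date": "2020-01-01", "TMK": 2.1, "RSK": 0.0},
    {"station_id": 1048, "date": "2020-01-02", "TMK": 3.4, "RSK": 1.2},
]

# Fields that identify a record and therefore stay as columns.
ID_FIELDS = ("station_id", "date")


def melt(rows, id_fields=ID_FIELDS):
    """Turn one wide row per date into one long row per (date, parameter)."""
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_fields}
        for key, value in row.items():
            if key in id_fields:
                continue
            # The column name becomes a cell value in the "parameter" field.
            long_rows.append({**ids, "parameter": key, "value": value})
    return long_rows


long_rows = melt(wide_rows)
```

With pandas, `DataFrame.melt` with the identifying columns passed as `id_vars` does the equivalent reshaping.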

amotl commented 4 years ago

Dear Benjamin,

> I had mainly left those "data column names" untouched because I'd have a hard time choosing the right names for the values. There are those given names by DWD which are sometimes as long as a sentence which is also not really an option.

I see. Maybe @wetterfrosch and I can work on that aspect to establish reasonable names here.

> [In any case,] I'd prefer an opt-in solution.

Your wish is my command. I've added a humanize_column_names option to the collect_dwd_data method in the updated ebb922bb from #55.
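As a rough sketch of what such an opt-in flag could look like (this is not the code from #55): the helper `rename_columns` and the mapping `HUMANIZED_NAMES` are hypothetical, the pairing of DWD identifiers with names is an assumption, and `temperature_mean_200` is a placeholder. Only the flag name `humanize_column_names` is taken from this thread.

```python
# Hypothetical mapping from DWD short identifiers to humanized names.
# The pairings are assumptions made for illustration only.
HUMANIZED_NAMES = {
    "TMK": "temperature_mean_200",
    "TXK": "temperature_max_200",
    "TNK": "temperature_min_200",
}


def rename_columns(row, humanize_column_names=False):
    """Return a copy of `row`, renaming known columns only when opted in."""
    if not humanize_column_names:
        # Default behaviour: keep the original DWD identifiers.
        return dict(row)
    # Unknown columns fall back to their original name.
    return {HUMANIZED_NAMES.get(key, key): value for key, value in row.items()}


original = rename_columns({"TXK": 5.0, "STATION_ID": 1048})
humanized = rename_columns({"TXK": 5.0, "STATION_ID": 1048}, humanize_column_names=True)
```

Keeping the default at `False` means existing callers see the same column names as before.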

May I also suggest using lowercase column names as part of #55?

With kind regards, Andreas.

gutzbenj commented 4 years ago

Thank you, I merged the PR. Column names may also be lowercase as you wish.

amotl commented 4 years ago

> Thank you, I merged the PR.

Thanks again!

> Column names may also be lowercase as you wish.

Shall we make everything on the right-hand side of column_names_enumeration.py lowercase then?

gutzbenj commented 4 years ago

> > Thank you, I merged the PR.
>
> Thanks again!
>
> > Column names may also be lowercase as you wish.
>
> Shall we make all things on the right hand side of column_names_enumeration.py lowercase then?

Yes, sure!

amotl commented 4 years ago

After completing the list for the daily resolution, we should expand the list of column names for the 10_minutes and hourly resolutions next, see https://github.com/panodata/dwdweather2/issues/13#issuecomment-653950576.

gutzbenj commented 4 years ago

I will try to set up the names for the whole set of parameters. I already know that some values are not that clear, and I'll come back to you about those!

amotl commented 4 years ago

> I will try and set up the names for the whole set of parameters.

Thanks!

Borrowing from dwdweather2

You might want to take inspiration from the naming scheme we applied in the knowledgebase module knowledge.py of dwdweather2 to make the column names look and sound nice. Disclaimer: that said, there may well be room for improvement.

Examples: temperature_max_200, temperature_min_200, temperature_min_005, soil_temperature_002, soil_temperature_005, etc.

It would be cool to be able to give individual humanized names to columns that share the same original name across parameter sets, especially the quality level columns, e.g. daily_quality_level_3 vs. daily_quality_level_4 vs. soil_temperature_quality_level. That way, data from different parameter sets could be merged into a single DataFrame without collisions. This relates to #106.
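A minimal sketch of that idea, assuming a hypothetical shared quality column called `QN` and hypothetical parameter set keys; the original column names, the mapping `PER_SET_NAMES` and the helpers `humanize` and `merge_rows` are illustrative and not part of wetterdienst or dwdweather2.

```python
# Hypothetical per-parameter-set mappings: the same original column ("QN")
# gets a different humanized name depending on the set it comes from.
PER_SET_NAMES = {
    "climate_summary": {"QN": "daily_quality_level", "TMK": "temperature_mean_200"},
    "soil_temperature": {"QN": "soil_temperature_quality_level", "V_TE002": "soil_temperature_002"},
}


def humanize(parameter_set, row):
    """Rename the columns of `row` using the mapping of its parameter set."""
    names = PER_SET_NAMES[parameter_set]
    return {names.get(key, key): value for key, value in row.items()}


def merge_rows(*rows):
    """Merge rows into one record, refusing to overwrite conflicting values."""
    merged = {}
    for row in rows:
        for key, value in row.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"column collision on {key!r}")
            merged[key] = value
    return merged


# Made-up example records sharing the same date.
climate = {"date": "2020-01-01", "QN": 3, "TMK": 2.1}
soil = {"date": "2020-01-01", "QN": 2, "V_TE002": 1.5}

merged = merge_rows(humanize("climate_summary", climate), humanize("soil_temperature", soil))
```

Merging the raw records directly with `merge_rows(climate, soil)` raises `ValueError` here, because both carry a `QN` column with different values.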

Borrowing from GribMagic

@meteoDaniel also added some humanized labels through unified_forecast_variables.py, thanks already! Wetterdienst might also draw some inspiration from this.

gutzbenj commented 4 years ago

Please check out the new branch https://github.com/earthobservations/wetterdienst/tree/more-meaningful-column-names

I added all column names and also added a function that creates an individual mapping to handle an anomaly in the "kl" parameter, which has two quality flags within one file.

I still had some issues with these parameters:

RWS_IND_10
RS_IND_01
DD
CD_TER
DK_TER
FK_TER
NSH_TAG
JA_RR
JA_MX_RS

Maybe we can discuss those.