VForWaTer / metacatalog

Modular metadata management platform for environmental data.
https://vforwater.github.io/metacatalog
GNU General Public License v3.0

(details) key maximum letter restriction #121

Closed AlexDo1 closed 3 years ago

AlexDo1 commented 3 years ago

https://github.com/VForWaTer/metacatalog/blob/ba048b697a0e4e0a5b72bad7f9b31bc5e1116aea/metacatalog/models/details.py#L36-L38

Is there a reason why the maximum number of letters for the key in the details is limited to 20?

I am in the process of uploading the LUBW gauge data and some column names ('AUFGABE_QUELLMESSETZ' and 'EINZUGSGEBIET_OBERIRDISCH') have more letters than the maximum. I could of course think about shorter names, but it might be better to keep the original column names in case someone from LUBW wants to work with the data.

mmaelicke commented 3 years ago

No, there is no real reason. My intention was to keep the keys short, and they are pruned to the word stem (e.g. depth and depths result in the same stem). I see three possibilities here:

  1. We just increase the limit and see how things go.
  2. We keep things as they are and come up with short, generalizable names, like 'catchment' for 'EINZUGSGEBIET_OBERIRDISCH', and introduce a new column called alias or something similar.
  3. We go for 2. but use the existing description field instead of an alias. This would be my preferred option, but it is only possible if you can come up with a good key for everything. Could you check that, please?

AlexDo1 commented 3 years ago

(3.) sounds like a good idea. Here are some suggestions for possible keys:

| LUBW column name | key |
|---|---|
| STANDORT | site |
| GEWAESSER | water_body |
| MESSNETZ | measuring_network |
| PEGELTYP | gauge_type |
| NUTZUNG | usage |
| ENTFERNUNG_MUENDUNG | distance_to_mouth |
| DATEN_VORH_SEIT | data_before_since |
| BETRIEB_VON_DATUM | operation_since |
| STATUS | status |
| AUFGABE_KLEINE_EZG | |
| AUFGABE_KLEINE_EZG | |
| AUFGABE_KLIWA | |
| BL_DATENAUSTAUSCH | |
| AUFGABE_QUELLMESSETZ | |
| EINZUGSGEBIET_OBERIRDISCH | catchment_area |

I don't really have ideas for the missing keys, as these columns are very specific to the dataset.

mmaelicke commented 3 years ago

Use:

| LUBW column name | key |
|---|---|
| STANDORT | site |
| GEWAESSER | waterbody |
| MESSNETZ | network |
| PEGELTYP | type |
| NUTZUNG | usage |
| ENTFERNUNG_MUENDUNG | distance |
| DATEN_VORH_SEIT | data_since |
| BETRIEB_VON_DATUM | since |
| STATUS | status |
| AUFGABE_KLEINE_EZG | |
| AUFGABE_KLEINE_EZG | |
| AUFGABE_KLIWA | |
| BL_DATENAUSTAUSCH | |
| AUFGABE_QUELLMESSETZ | |
| EINZUGSGEBIET_OBERIRDISCH | catchment |

and drop everything that does not have a key right now
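
For illustration, the renaming and dropping described above could be sketched roughly like this. It is only a sketch in plain Python: the names `LUBW_KEYS`, `MAX_KEY_LENGTH` and `to_details` are hypothetical and not part of metacatalog, and the example row is made up.

```python
# Mapping agreed on in this thread: LUBW column name -> details key.
LUBW_KEYS = {
    "STANDORT": "site",
    "GEWAESSER": "waterbody",
    "MESSNETZ": "network",
    "PEGELTYP": "type",
    "NUTZUNG": "usage",
    "ENTFERNUNG_MUENDUNG": "distance",
    "DATEN_VORH_SEIT": "data_since",
    "BETRIEB_VON_DATUM": "since",
    "STATUS": "status",
    "EINZUGSGEBIET_OBERIRDISCH": "catchment",
}

# The 20-character limit on details keys discussed in this issue.
MAX_KEY_LENGTH = 20


def to_details(row):
    """Rename mapped columns to their key and drop columns without a key."""
    return {LUBW_KEYS[col]: value for col, value in row.items() if col in LUBW_KEYS}


# Hypothetical example row; the values are invented for illustration.
row = {"STANDORT": "Example site", "AUFGABE_KLIWA": "x", "STATUS": "active"}
details = to_details(row)
```

With this mapping every key stays within the existing limit, so the column definition would not need to change.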

mmaelicke commented 3 years ago

Ah, and when you upload the LUBW column names as descriptions, please write them in lowercase with only the first letter capitalized (e.g. 'Standort'). We don't need them to be all uppercase.
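
A minimal sketch of how such a description could be derived from the original column name, assuming the underscores are kept as they are. The helper `make_detail` and the dict layout with `key`, `value` and `description` are assumptions for illustration, not a confirmed metacatalog interface.

```python
def make_detail(column_name, key, value):
    """Build one detail record, using the LUBW column name as description."""
    return {
        "key": key,
        "value": value,
        # str.capitalize() uppercases the first character and lowercases the rest.
        "description": column_name.capitalize(),
    }


# Hypothetical value, only to show the resulting shape.
detail = make_detail("ENTFERNUNG_MUENDUNG", "distance", 12.5)
```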

AlexDo1 commented 3 years ago

I missed the columns 'PEGELDIAGRAMM_SEIT' and 'AUFGABE_FLIWAS'.

I will drop the column 'AUFGABE_FLIWAS' and use the key 'diagram_since' for the column 'PEGELDIAGRAMM_SEIT' if you don't have a better idea.

mmaelicke commented 3 years ago

I think you can drop both.

AlexDo1 commented 3 years ago

You are right, 'PEGELDIAGRAMM_SEIT' doesn't really contain any useful information.

mmaelicke commented 3 years ago

@AlexDo1, this is finished and can be closed, right?