DAS-RCN / RCN_DASformat

General Naming Conventions and Typing #3

Open miili opened 1 year ago

miili commented 1 year ago

Hello all,

First off, thank you for this first draft.

I know, I know, Fortran once limited variable names to 6 characters. Please consider using clear, self-explanatory variable names:

DASFileVersion  -> version
domain          -> data_unit
t0              -> start_time
dt              -> sampling_period
GL              -> gauge_length
lats            -> latitudes
longs           -> longitudes
elev            -> elevations
meta            -> additional_data
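
A minimal sketch of how the renaming above could be applied to an existing header, assuming the header is held as a plain Python dict. The helper name `rename_header_keys` and the example values are hypothetical and not part of the draft reader.

```python
# Old-to-new names, copied from the list above.
RENAMES = {
    "DASFileVersion": "version",
    "domain": "data_unit",
    "t0": "start_time",
    "dt": "sampling_period",
    "GL": "gauge_length",
    "lats": "latitudes",
    "longs": "longitudes",
    "elev": "elevations",
    "meta": "additional_data",
}


def rename_header_keys(header: dict) -> dict:
    """Return a copy of `header` with short keys replaced by descriptive ones.

    Keys that are not listed in RENAMES are passed through unchanged.
    """
    return {RENAMES.get(key, key): value for key, value in header.items()}


# Hypothetical header values, for illustration only.
old_header = {"DASFileVersion": 1, "t0": 0.0, "dt": 0.001, "GL": 10.0}
new_header = rename_header_keys(old_header)
print(new_header)
```

Keeping the mapping in one table means a reader could accept files written with either set of names during a transition period.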

miili commented 1 year ago

Please use confined dataclasses (https://docs.python.org/3/library/dataclasses.html) for the metadata. Here is a proposal:

from dataclasses import dataclass
from typing import Any, Literal

StrainUnit = Literal["m/m", "cm/m", "nm/m"]

@dataclass
class DASMeta:
    version: int
    data_unit: StrainUnit
    start_time: float
    sampling_period: float
    gauge_length: float
    latitudes: list[float]
    longitudes: list[float]
    elevations: list[float]
    additional_data: dict[str, Any]

    def endtime(self, nsamples: int) -> float:
        ...
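
For illustration, here is one way the proposed dataclass might be filled in and used. The body of `endtime` is an assumption (time of the last sample, with `start_time` and `sampling_period` both in seconds); the proposal above leaves it open. All field values are made up.

```python
from dataclasses import dataclass
from typing import Any, Literal

StrainUnit = Literal["m/m", "cm/m", "nm/m"]


@dataclass
class DASMeta:
    version: int
    data_unit: StrainUnit
    start_time: float        # assumed unit: seconds
    sampling_period: float   # assumed unit: seconds
    gauge_length: float
    latitudes: list[float]
    longitudes: list[float]
    elevations: list[float]
    additional_data: dict[str, Any]

    def endtime(self, nsamples: int) -> float:
        # Assumed convention: time of the last sample, so N samples
        # span (N - 1) sampling periods.
        return self.start_time + (nsamples - 1) * self.sampling_period


# Made-up values for a two-channel recording.
meta = DASMeta(
    version=1,
    data_unit="nm/m",
    start_time=0.0,
    sampling_period=0.5,
    gauge_length=10.0,
    latitudes=[60.0, 60.001],
    longitudes=[5.0, 5.001],
    elevations=[0.0, 0.0],
    additional_data={},
)
print(meta.endtime(nsamples=5))  # 2.0
```

Note that `Literal` is only checked by static type checkers such as mypy; the dataclass itself does not reject an unknown unit at runtime.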

andreas-wuestefeld commented 1 year ago

I chose readability over efficiency. In my experience, your suggestion of classes increases the barrier to entry. For many students this might be their first contact with programming.

I envision this as a reference reader, not an optimal, super-duper high-class reader. It should help people understand the data format.

But I am open to arguments against this approach.

andreas-wuestefeld commented 1 year ago

Regarding variable names, I am just a lazy typist :-) I understand the argument for descriptive names.

Let's see what the community thinks.

jpmorten-asn commented 1 year ago

My preference is definitely for writing out the variable names in full, using underscores in place of spaces. This can avoid a lot of misunderstandings and makes it possible to discover the structure of the data even when documentation is not available (lost or forgotten). I think one aim of the project was indeed to create a discoverable format.

miili commented 1 year ago

> In my experience, your suggestion of classes increases the barrier to entry. For many students this might be their first contact with programming.

@andreas-wuestefeld, if we are looking for a sustainable DAS data format, we need an elaborate concept. Conceptualizing a data format is not a task for students or beginner programmers. We need performant I/O (layout) and efficient storage (compression).

> I envision this as a reference reader, not an optimal, super-duper high-class reader.

A sustainable data format should be super-duper efficient and versatile!

> It should help people understand the data format.

A user does not need to understand a data format. All its complexity has to be abstracted away by a reference library. This is why, for example, ObsPy (libmseed), libjpeg, and libhdf5 are so successful.

The fundamental question is whether we are looking for a serious DAS data format implemented by IRIS which can be used for

  1. performant data analysis,
  2. efficient archiving,
  3. possibly streaming and
  4. querying online repositories (similar to FDSNWS)

or an HDF5 structure for project-internal exchange in February.

andreas-wuestefeld commented 1 year ago

@miili I learned yesterday evening, in response to publishing this format, that IRIS is actually working on / considering a format. It may well be that this format is rather short-lived, although I hope it will prove its worth.

I thus changed the potentially misleading name from IRIS (as part of the IRIS RCN efforts) to the more general miniDAS. The repo name is still the same but will hopefully be fixed over the weekend.

At this point, I feel it is most important to have a common format for the global month, ideal or not. Your input is very good, and I am happy to hear these comments from someone obviously more familiar with the deep down programming features.

Maybe you can just point out the easiest-to-fix issues here so they can be implemented (space vs. time, for example?). Variable names can obviously also be adjusted.

andreas-wuestefeld commented 1 year ago

Implemented. Comments on the new names are welcome.

dcbowden commented 1 year ago

I'm a month late to these discussions. I agree with @andreas-wuestefeld that the formal object-oriented structure is going to be a bit harder for many of us academics to deal with; I also had to wrap my head around how to work with it. That said, I agree with @miili that it could be OK, in that most academics and students don't need to worry about the internals! We just need some very user-friendly demos before February. Maybe Jupyter Notebooks? Not just a README list of headers and function inputs/outputs, but a full step-by-step guide showing how to load some interrogator's raw output (Silixa, Febus, whatever), declare the metadata object, use the _fromnumpy() function to eventually save the proper output, etc.