Closed — peterdesmet closed this issue 1 year ago
I'm thinking that h5_to_df() should work on single files (one file -> one df) and output a df. It should therefore not have "csv" in the name.
The above pseudocode could be wrapped in a larger function create_vpts_csv(radar, start, end)
I think it's quite important to have a clear separation in the code between the "transport" part (retrieval from S3) and the "transformation" part, so the latter can be used in other contexts, for example with local files or different data repositories.
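A minimal sketch of that separation, assuming hypothetical function names (the actual vptstools API may differ, and the h5 parsing and S3 download are stubbed out):

```python
import pandas as pd

def h5_to_df(path):
    # Transformation layer: one local h5 file -> one dataframe.
    # Real ODIM h5 parsing is stubbed here for illustration.
    return pd.DataFrame({"source_file": [path]})

def files_to_df(paths):
    # Transformation layer: works on any iterable of local paths,
    # regardless of where the files came from.
    return pd.concat([h5_to_df(p) for p in paths], ignore_index=True)

def fetch_from_s3(bucket, keys, tmpdir):
    # Transport layer: would download h5 files from S3 (e.g. via boto3)
    # and return local paths. The transformation layer above never needs
    # to know the files came from S3.
    raise NotImplementedError

# Local files work without any S3 involvement:
df = files_to_df(["radar1.h5", "radar2.h5"])
```

Because the transformation functions only see local paths, the same code path serves S3 retrieval, local files, or other repositories.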
I agree, would that affect how you would structure the pseudo code?
No, I think at this stage it's easier/faster to do it in code and not worry too much about the pseudo-code.
the start
and end
parameters are dates, and source is either "baltrad" or "ecog-04003", correct?
Yes, correct.
Closing this issue, tackled by https://github.com/enram/vptstools/pull/19
Pseudo code:

So:

- a custom function get_h5_files() that understands the directory structure of the repo. It likely makes use of the s3 library under the hood to get a list of file paths that match a radar, start, end date and source criterion.
- a custom function h5_to_df() that reads an h5 file and converts it to the VPTS CSV format, but as a dataframe, not a file. The function can be called many times to build a growing data frame.
- a generic write_csv() function (e.g. from pandas) that writes the df to a file at some location. The write_csv() settings should match those of the CSV dialect defined for VPTS CSV.
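Under those assumptions, the three steps could be wired together roughly as follows. All names are placeholders taken from the pseudocode above, the S3 listing and h5 parsing are stubbed, and the to_csv settings shown are only an approximation — check the VPTS CSV spec for the exact dialect:

```python
import pandas as pd

def get_h5_files(radar, start, end, source):
    # Would use the s3 library to list file paths matching radar,
    # start/end dates and source; stubbed with fixed paths here.
    return [f"{source}/hdf5/{radar}/file1.h5",
            f"{source}/hdf5/{radar}/file2.h5"]

def h5_to_df(path):
    # Would read one h5 file and map it to VPTS CSV columns; stubbed.
    return pd.DataFrame({"radar": ["bejab"], "height": [200]})

def create_vpts_csv(radar, start, end, source, out_path):
    paths = get_h5_files(radar, start, end, source)
    # Call h5_to_df() once per file and grow a single dataframe
    df = pd.concat([h5_to_df(p) for p in paths], ignore_index=True)
    # Comma-separated, UTF-8, no index column; quoting/newline settings
    # should be checked against the VPTS CSV dialect definition.
    df.to_csv(out_path, index=False, encoding="utf-8")
    return df
```

The wrapper create_vpts_csv() keeps the transport (get_h5_files) and transformation (h5_to_df) steps as separate functions, so the latter can be reused with local files.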