Closed — peterdesmet closed this issue 1 year ago
I'm thinking that h5_to_df() should work on single files (one file -> one df) and output a df. It should therefore not have "csv" in the name.
The above pseudocode could be wrapped in a larger function create_vpts_csv(radar, start, end)
I think it's quite important to have a clear separation in the code between the "transport" part (retrieval from S3) and the "transformation" part, so the latter can be used in other contexts, for example with local files or different data repositories.
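A minimal sketch of that separation, assuming hypothetical function names (the actual vptstools API may differ, and the h5 parsing and S3 download are stubbed out):

```python
import pandas as pd

def h5_to_df(path):
    # Transformation layer: one local h5 file -> one dataframe.
    # Real ODIM h5 parsing is stubbed here for illustration.
    return pd.DataFrame({"source_file": [path]})

def files_to_df(paths):
    # Transformation layer: works on any iterable of local paths,
    # regardless of where the files came from.
    return pd.concat([h5_to_df(p) for p in paths], ignore_index=True)

def fetch_from_s3(bucket, keys, tmpdir):
    # Transport layer: would download h5 files from S3 (e.g. via boto3)
    # and return local paths. The transformation layer above never needs
    # to know the files came from S3.
    raise NotImplementedError

# Local files work without any S3 involvement:
df = files_to_df(["radar1.h5", "radar2.h5"])
```

Because the transformation functions only see local paths, the same code path serves S3 retrieval, local files, or other repositories.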
I agree, would that affect how you would structure the pseudo code?
No, I think at this stage it's easier/faster to do it in code and not worry too much about the pseudo-code.
the start
and end
parameters are dates, and source is either "baltrad" or "ecog-04003", correct?
Yes, correct.
Closing this issue, tackled by https://github.com/enram/vptstools/pull/19
Pseudo code:

So:

- a custom function get_h5_files() that understands the directory structure of the repo. It likely makes use of the s3 library under the hood to get a list of file paths that match a radar, start, end date and source criterion.
- a custom function h5_to_df() that reads an h5 file and converts it to the VPTS CSV format, but as a dataframe, not a file. The function can be called many times to build a growing data frame.
- a generic write_csv() function (e.g. from pandas) that writes the df to a file at some location. The write_csv() settings should match those of the CSV dialect defined for VPTS CSV.
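Under those assumptions, the three steps could be wired together roughly as follows. All names are placeholders taken from the pseudocode above, the S3 listing and h5 parsing are stubbed, and the to_csv settings shown are only an approximation — check the VPTS CSV spec for the exact dialect:

```python
import pandas as pd

def get_h5_files(radar, start, end, source):
    # Would use the s3 library to list file paths matching radar,
    # start/end dates and source; stubbed with fixed paths here.
    return [f"{source}/hdf5/{radar}/file1.h5",
            f"{source}/hdf5/{radar}/file2.h5"]

def h5_to_df(path):
    # Would read one h5 file and map it to VPTS CSV columns; stubbed.
    return pd.DataFrame({"radar": ["bejab"], "height": [200]})

def create_vpts_csv(radar, start, end, source, out_path):
    paths = get_h5_files(radar, start, end, source)
    # Call h5_to_df() once per file and grow a single dataframe
    df = pd.concat([h5_to_df(p) for p in paths], ignore_index=True)
    # Comma-separated, UTF-8, no index column; quoting/newline settings
    # should be checked against the VPTS CSV dialect definition.
    df.to_csv(out_path, index=False, encoding="utf-8")
    return df
```

The wrapper create_vpts_csv() keeps the transport (get_h5_files) and transformation (h5_to_df) steps as separate functions, so the latter can be reused with local files.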