fermilab-accelerator-ai / workflow

Machinery to pull data, wrangle data, keep it all running

Need to define formatting code #1

Open gnperdue opened 4 years ago

gnperdue commented 4 years ago

We pull down data in one set of formats and need to reformat it for training and testing purposes. We should clearly define these transformer codes.

Longer-term, we need to be sure that the low-latency data streams are logged in a compact fashion. These may also require transformer codes (may not be associated with the workflow repo).

jasonstjohn commented 4 years ago

I anticipate changing the data format we initially store. We could go directly for something which might be used "as-is" for training/testing/etc or, if that also seems likely to change, some in-between format which is readily converted. If the latter approach is adopted, modularizing our data flow has its advantages, and I would be happy to work on the second phase in data processing.
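To illustrate the modular, two-phase flow described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the raw record layout `(device, timestamp, value)`, the function names `to_intermediate` and `to_training`, and the device names are hypothetical and do not come from the workflow repo.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw record layout: (device_name, timestamp_seconds, value).
RawRecord = Tuple[str, float, float]
Rows = Dict[float, Dict[str, float]]


def to_intermediate(raw: Iterable[RawRecord]) -> Rows:
    """Phase 1: group raw readings into one row per timestamp."""
    rows: Rows = defaultdict(dict)
    for device, timestamp, value in raw:
        rows[timestamp][device] = value
    return dict(rows)


def to_training(rows: Rows, devices: List[str]) -> List[List[float]]:
    """Phase 2: emit fixed-order feature vectors, skipping incomplete rows."""
    return [
        [rows[t][d] for d in devices]
        for t in sorted(rows)
        if all(d in rows[t] for d in devices)
    ]


# Made-up sample readings, for illustration only.
raw = [
    ("DEVICE_A", 0.0, 1.5),
    ("DEVICE_B", 0.0, 2.5),
    ("DEVICE_A", 1.0, 1.6),
    ("DEVICE_B", 1.0, 2.4),
    ("DEVICE_A", 2.0, 1.7),  # DEVICE_B has no reading at t=2.0
]
features = to_training(to_intermediate(raw), ["DEVICE_A", "DEVICE_B"])
```

Because the two phases only share the intermediate row structure, the stored format can change without touching the code that builds training inputs, and vice versa.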

gnperdue commented 4 years ago

This issue may be subsumed by https://github.com/fermilab-accelerator-ai/meetings/issues/31 - if we do move to a SQL server long term, then of course we need to define the correct tables, etc.