NCAR / kcor-pipeline

Pipeline code for KCor

Create and populate CME alert database #306

Open mgalloy opened 2 years ago

mgalloy commented 2 years ago

Create a database of the alerts from the automated detection pipeline, observers, and event log that can be used to generate statistics on accuracy and timeliness of our real-time system.

This requires:

The definition of the CME table would be:

create table kcor_cme (
  cme_id                int(10) auto_increment primary key,
  obs_day               mediumint (5) not null,
  dt_created            timestamp default current_timestamp
)

The alerts table would be:

create table kcor_cme_alert (
  cme_alert_id          int(10) auto_increment primary key,
  obs_day               mediumint (5) not null,
  dt_created            timestamp default current_timestamp,

  cme_id                int(10),

  alert_type            enum('initial', 'observer', 'retraction', 'summary', 'analyst'),
  event_type            text,
  cme_type              enum('possible cme', 'cme', 'jet', 'epl', 'outflow'),
  retracted             boolean,

  issue_time            datetime,  -- analyst alerts from event log don't have issue time
  last_data_time        datetime,
  start_time            datetime not null,
  end_time              datetime,
  in_progress           boolean,

  position_angle        float,
  speed                 float,
  height                float,
  time_for_height       datetime,

  -- for observer and retraction alerts only
  comment               text,
  confidence_level      enum('low', 'medium', 'high'),

  -- for summary reports only
  time_history          blob,
  pa_history            blob,
  speed_history         blob,
  height_history        blob,

  kcor_sw_id            int(10),

  foreign key (cme_id) references kcor_cme(cme_id),
  foreign key (kcor_sw_id) references kcor_sw(sw_id),
  foreign key (obs_day) references mlso_numfiles(day_id)
)

Features table:

create table kcor_cme_features (
  cme_feature_id        int(10) auto_increment primary key,

  cme_alert_id          int(10),
  dt_created            timestamp default current_timestamp,

  indices               blob,

  -- from a fit for the feature
  feature_speed         blob,
  feature_acceleration  blob,
  feature_width         blob,

  final_height          float,
  final_speed           float,
  final_acceleration    float,

  foreign key (cme_alert_id) references kcor_cme_alert(cme_alert_id)
)

mgalloy commented 2 years ago

This looks to be part of a proposal, so don't work on this until it is ironed out.

mgalloy commented 1 year ago

Need to add start time and height to the feature table, as well as a mode, e.g., "nowcast" vs. "simulated_realtime_nowcast".

jburkepile commented 1 year ago

Mike: Thanks for adding the interim reports every 5 minutes. I suggest changing the filenames to include the word 'interim' or 'summary'. It would also be good to add 'kcor' to the filenames.

current names:
yymmdd.hhmmss.cme.plot.csv and .png

new names:
yymmdd.hhmmss.kcor.cme.interim.plot.csv and .png
yymmdd.hhmmss.kcor.cme.summary.plot.csv and .png

Let me know if this is a pain to implement. Also, there is no rush on this. Your comments / suggestions are welcome. Thanks!
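
The renaming requested above could be sketched roughly as follows. This is an illustrative Python snippet, not the pipeline's actual code: the function names `report_basename` and `report_filenames` and their arguments are hypothetical, and the timestamp is assumed to be in the same time base as the current filenames.

```python
from datetime import datetime

# The two report kinds named in the request above.
REPORT_KINDS = ("interim", "summary")


def report_basename(timestamp: datetime, kind: str) -> str:
    """Build the proposed basename yymmdd.hhmmss.kcor.cme.<kind>.plot."""
    if kind not in REPORT_KINDS:
        raise ValueError(f"kind must be one of {REPORT_KINDS}, got {kind!r}")
    return f"{timestamp:%y%m%d.%H%M%S}.kcor.cme.{kind}.plot"


def report_filenames(timestamp: datetime, kind: str) -> list:
    """Return the .csv and .png filenames for one report."""
    base = report_basename(timestamp, kind)
    return [f"{base}.csv", f"{base}.png"]
```

For an arbitrary example time, `report_filenames(datetime(2023, 5, 4, 19, 30, 15), "interim")` gives `230504.193015.kcor.cme.interim.plot.csv` and `230504.193015.kcor.cme.interim.plot.png`.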

jburkepile commented 1 year ago

Please remove the word 'ending' from the subject line in emails sent with the interim reports:

MLSO K-Cor interim report for CME on yyyy-mm-dd ending at hh:mm:ss UT
change to:
MLSO K-Cor interim report for CME on yyyy-mm-dd at hh:mm:ss UT

Please keep the word 'ending' in the summary report subject line.

No rush on this. Thanks!
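
A minimal sketch of the subject-line rule requested above, in Python for illustration only. The function name `report_subject` is hypothetical, and the summary subject is assumed to follow the same wording as the interim one apart from the word 'ending'; the thread does not spell out the full summary subject.

```python
from datetime import datetime


def report_subject(kind: str, timestamp: datetime) -> str:
    """Format the email subject for an interim or summary report."""
    if kind not in ("interim", "summary"):
        raise ValueError(f"unknown report kind: {kind!r}")
    # Only the summary report keeps the word "ending".
    connector = "ending at" if kind == "summary" else "at"
    return (
        f"MLSO K-Cor {kind} report for CME on "
        f"{timestamp:%Y-%m-%d} {connector} {timestamp:%H:%M:%S} UT"
    )
```

With an arbitrary example time, `report_subject("interim", datetime(2023, 5, 4, 19, 30, 15))` returns `MLSO K-Cor interim report for CME on 2023-05-04 at 19:30:15 UT`.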

jburkepile commented 6 months ago

Mike, I would like the database to also ingest initial alerts and summary reports for older events run in simulation mode to determine how well the code performs on earlier (and noisier) data. That information, and the output from the alerts, is extremely useful. I think we need to add a new keyword. I don't see one in the current database but perhaps I missed it. How about this:

alert_timing enum('realtime', 'simulated')

If you prefer a different name for the keyword that is fine.

We should talk about how this impacts the 'issue_time' keyword value and what other implications there are for historical simulated runs.
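
One possible implication for 'issue_time', sketched in Python for illustration only. The assumption here, which the thread leaves open, is that a simulated run would record the wall-clock time of the simulation as its issue time, so issue_time minus start_time is only meaningful as a latency for realtime alerts. The `Alert` class and `median_latency_seconds` function are hypothetical; the field names follow the proposed `kcor_cme_alert` columns plus the suggested `alert_timing` keyword.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Alert:
    alert_timing: str               # 'realtime' or 'simulated' (proposed keyword)
    start_time: datetime
    issue_time: Optional[datetime]  # may be missing, e.g. analyst alerts


def median_latency_seconds(alerts, timing="realtime"):
    """Median of issue_time - start_time, in seconds, for one alert_timing value.

    Returns None when no alert with that timing has an issue time.
    """
    latencies = [
        (a.issue_time - a.start_time).total_seconds()
        for a in alerts
        if a.alert_timing == timing and a.issue_time is not None
    ]
    return median(latencies) if latencies else None
```

Filtering on `alert_timing` keeps historical simulated runs in the same table without letting them skew the timeliness statistics for the real-time system.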