Open james-d-brown opened 4 hours ago
From a Redmine user support ticket, #135539.
To reproduce, use this declaration:
label: HEFS_Benchmark Streamflow
observed:
  label: OBS Streamflow
  sources: /data/BONM8_QME.xml
  variable: QME
  feature_authority: nws lid
  type: observations
predicted:
  label: HEFS_Benchmark Streamflow
  sources: /data/200101011200_BONM8_RAW_QINE.xml
  variable: QINE
  type: ensemble forecasts
baseline:
  label: HEFS_Raw Streamflow
  sources: /data/200101011200_BONM8_RAW_SQIN.xml
  variable: SQIN
  type: ensemble forecasts
  separate_metrics: true
features:
  - {observed: BONM8, predicted: BONM8, baseline: BONM8}
unit: cms
lead_times:
  minimum: 18
  maximum: 720
  unit: hours
lead_time_pools:
  period: 24
  frequency: 24
  unit: hours
time_scale:
  function: mean
  period: 24
  unit: hours
pair_frequency:
  period: 24
  unit: hours
#cross_pair: exact
probability_thresholds: [0.01,0.1,0.5,0.9,0.95,0.99,0.995,0.999]
minimum_sample_size: 30
season:
  minimum_day: 1
  minimum_month: 1
  maximum_day: 31
  maximum_month: 12
metrics:
  - box plot of errors by forecast value
  - continuous ranked probability score
  - relative operating characteristic score
  - brier score
  - pearson correlation coefficient
  - root mean square error
  - mean error
  - bias fraction
  - reliability diagram
  - relative operating characteristic diagram
  - ensemble quantile quantile diagram
  - brier skill score
  - name: quantile quantile diagram
    thresholds: all data
  - continuous ranked probability skill score
  - sample size
  - mean absolute error
  - box plot of errors by observed value
  - name: rank histogram
    probability_thresholds:
      values: [0.01,0.1,0.5,0.9,0.95,0.99,0.995,0.999]
      apply_to: predicted
duration_format: hours
decimal_format: '#0.000'
output_formats:
  - csv2
  - format: png
    orientation: lead threshold
Together with the attached datasets (unpack first):
Witness a failure to create pairs for any baseline dataset.
The underlying queries for baseline time-series look like this:
SELECT
    metadata.series_id AS series_id,
    metadata.reference_time + INTERVAL '1' MINUTE * TSV.lead AS valid_time,
    metadata.reference_time,
    metadata.reference_time_type,
    ARRAY_AGG(TSV.series_value ORDER BY TS.ensemble_id) AS ensemble_members,
    ARRAY_AGG(TS.ensemble_id ORDER BY TS.ensemble_id) AS ensemble_ids,
    metadata.measurementunit_id,
    metadata.scale_period,
    metadata.scale_function,
    metadata.feature_id,
    metadata.occurrences
FROM
(
    SELECT
        S.source_id AS series_id,
        MAX( reference_time ) AS reference_time,
        TSRT.reference_time_type,
        S.feature_id,
        S.measurementunit_id,
        TimeScale.duration_ms AS scale_period,
        TimeScale.function_name AS scale_function,
        COUNT(*) AS occurrences
    FROM wres.Source S
    INNER JOIN wres.ProjectSource PS
        ON PS.source_id = S.source_id
    INNER JOIN wres.TimeSeriesReferenceTime TSRT
        ON TSRT.source_id = S.source_id
    LEFT JOIN wres.TimeScale TimeScale
        ON TimeScale.timescale_id = S.timescale_id
    WHERE PS.project_id = 43
        AND S.variable_name = 'QINE'
        AND S.feature_id = 1556
        AND PS.member = 'baseline'
    GROUP BY S.source_id,
        S.measurementunit_id,
        TimeScale.duration_ms,
        TimeScale.function_name,
        TSRT.reference_time_type
) AS metadata
INNER JOIN wres.TimeSeries TS
    ON TS.source_id = metadata.series_id
INNER JOIN wres.TimeSeriesValue TSV
    ON TSV.timeseries_id = TS.timeseries_id
WHERE TSV.lead > 39960
    AND TSV.lead <= 42840
    AND ( EXTRACT( MONTH FROM metadata.reference_time ) > 1 OR ( EXTRACT( MONTH FROM metadata.reference_time ) = 1 AND EXTRACT( DAY FROM metadata.reference_time ) >= 1 ) )
    AND ( EXTRACT( MONTH FROM metadata.reference_time ) < 12 OR ( EXTRACT( MONTH FROM metadata.reference_time ) = 12 AND EXTRACT( DAY FROM metadata.reference_time ) <= 31 ) )
GROUP BY metadata.series_id, metadata.reference_time, metadata.reference_time_type, metadata.feature_id, TSV.lead, metadata.scale_period, metadata.scale_function, metadata.measurementunit_id, metadata.occurrences
Crucially:
AND S.variable_name = 'QINE'
Contrary to the declaration:
baseline:
  label: HEFS_Raw Streamflow
  sources: /data/200101011200_BONM8_RAW_SQIN.xml
  variable: SQIN
Given an evaluation that contains a different variable name for each of the predicted and baseline datasets
When the retrieval queries are formed
Then I expect the appropriate name to be used for each dataset and not the same name for both
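
To make the expected behavior concrete, here is a minimal Python sketch of looking up the variable name by dataset orientation when the filter is built. It is illustrative only: `DECLARATION` and `variable_clause` are hypothetical names and do not come from the WRES code base, and the string formatting stands in for whatever parameter binding the real retriever uses.

```python
# Hypothetical sketch: these names are NOT taken from the WRES code base.
# Only the variable names (QINE, SQIN) come from the declaration above.
DECLARATION = {
    "predicted": {"variable": "QINE"},
    "baseline": {"variable": "SQIN"},
}


def variable_clause(declaration: dict, orientation: str) -> str:
    """Return the variable-name filter for a single dataset orientation.

    The variable is read from the dataset being retrieved, so the
    baseline query can never pick up the predicted variable name.
    """
    variable = declaration[orientation]["variable"]
    # String formatting is for illustration only; a real query builder
    # should bind the value as a parameter.
    return f"AND S.variable_name = '{variable}'"


if __name__ == "__main__":
    for orientation in ("predicted", "baseline"):
        print(orientation, "->", variable_clause(DECLARATION, orientation))
```

Under this sketch the baseline retrieval would filter on `SQIN`, as declared, rather than on `QINE` as in the query above.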