NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I want to understand why subsequent project runs with similar data are taking longer and longer to run #216

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: Phillip (Phillip) Original Redmine Issue: 66665, https://vlab.noaa.gov/redmine/issues/66665 Original Date: 2019-07-31


I have recently run 4 jobs back-to-back:

  1. https://nwcal-wres-ti.[host]/job/4361467293612663970
  2. https://nwcal-wres-ti.[host]/job/6320452621731906666
  3. https://nwcal-wres-ti.[host]/job/6918783164857396540
  4. https://nwcal-wres-ti.[host]/job/-4870459844157050816

The time for each job to complete is increasing with subsequent executions:

  1. PT1H8M19.122288S
  2. PT2H31M31.035546S
  3. PT3H35M20.584506S
  4. PT4H42M30.705348S
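For reference, those ISO-8601 durations can be parsed and compared. A quick sketch (Python, purely illustrative and not part of WRES):

```python
# Illustrative only: parse the ISO-8601 durations reported above and
# compute the growth from one job to the next.
import re

def to_seconds(duration: str) -> float:
    """Convert e.g. 'PT1H8M19.122288S' to seconds."""
    hours, minutes, seconds = re.fullmatch(
        r"PT(?:(\d+)H)?(?:(\d+)M)?(?:([\d.]+)S)?", duration
    ).groups()
    return (float(hours or 0) * 3600
            + float(minutes or 0) * 60
            + float(seconds or 0))

durations = ["PT1H8M19.122288S", "PT2H31M31.035546S",
             "PT3H35M20.584506S", "PT4H42M30.705348S"]
seconds = [to_seconds(d) for d in durations]
deltas = [b - a for a, b in zip(seconds, seconds[1:])]
# Each job takes roughly an extra hour or more over the previous one.
```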

I was expecting each job to run in a similar amount of time, as the forecast/observed data is similar in size and shape but covers 4 separate locations. The first 3 jobs use the exact same project configuration. The 4th job used a different project configuration; however, it is effectively the same as the first 3, just pointing to a different folder for the right source for labeling purposes. Those configurations are here for reference:

First 3 jobs configuration

<?xml version='1.0' encoding='UTF-8'?>
<project name="esp flavor eval - 5 day">
  <inputs>
    <left label="RQIN Daily CSV KCFS">
      <type>observations</type>
      <source format="csv" has_header="true">file:///mnt/wres_share/users/nwrfc/esp-flavor-eval/left</source>
      <variable unit="KCFS">RQIN</variable>
    </left>
    <right label="Operational ESP 5-day CSV KCFS">
      <type>ensemble forecasts</type>
      <source format="csv" has_header="true">file:///mnt/wres_share/users/nwrfc/esp-flavor-eval/esp_5day</source>
      <variable unit="KCFS">SQIN</variable>
    </right>
  </inputs>
  <pair label="esp 5-day KCFS">
    <unit>KCFS</unit>
    <leadHours maximum="8760"/>
    <desiredTimeScale>
      <function>mean</function>
      <period>5</period>
      <unit>days</unit>
    </desiredTimeScale>
  </pair>
  <metrics>
    <thresholds>
      <type>probability</type>
      <applyTo>left</applyTo>
      <commaSeparatedValues>0.05,0.1,0.25,0.50,0.75,0.90,0.95</commaSeparatedValues>
      <operator>greater than</operator>
    </thresholds>
    <metric>
      <name>continuous ranked probability score</name>
    </metric>
    <metric>
      <name>brier score</name>
    </metric>
    <metric>
      <name>mean error</name>
    </metric>
    <metric>
      <name>root mean square error</name>
    </metric>
    <metric>
      <name>pearson correlation coefficient</name>
    </metric>
    <metric>
      <name>mean square error skill score</name>
    </metric>
    <metric>
      <name>sample size</name>
    </metric>
  </metrics>
  <outputs>
    <destination type="numeric">
      <outputType>default</outputType>
    </destination>
  </outputs>
</project>

Last jobs configuration:

<?xml version='1.0' encoding='UTF-8'?>
<project name="esp flavor eval - 10 day">
  <inputs>
    <left label="RQIN Daily CSV KCFS">
      <type>observations</type>
      <source format="csv" has_header="true">file:///mnt/wres_share/users/nwrfc/esp-flavor-eval/left</source>
      <variable unit="KCFS">RQIN</variable>
    </left>
    <right label="Operational ESP 10-day CSV KCFS">
      <type>ensemble forecasts</type>
      <source format="csv" has_header="true">file:///mnt/wres_share/users/nwrfc/esp-flavor-eval/esp_10day</source>
      <variable unit="KCFS">SQIN</variable>
    </right>
  </inputs>
  <pair label="esp 5-day KCFS">
    <unit>KCFS</unit>
    <leadHours maximum="8760"/>
    <desiredTimeScale>
      <function>mean</function>
      <period>5</period>
      <unit>days</unit>
    </desiredTimeScale>
  </pair>
  <metrics>
    <thresholds>
      <type>probability</type>
      <applyTo>left</applyTo>
      <commaSeparatedValues>0.05,0.1,0.25,0.50,0.75,0.90,0.95</commaSeparatedValues>
      <operator>greater than</operator>
    </thresholds>
    <metric>
      <name>continuous ranked probability score</name>
    </metric>
    <metric>
      <name>brier score</name>
    </metric>
    <metric>
      <name>mean error</name>
    </metric>
    <metric>
      <name>root mean square error</name>
    </metric>
    <metric>
      <name>pearson correlation coefficient</name>
    </metric>
    <metric>
      <name>mean square error skill score</name>
    </metric>
    <metric>
      <name>sample size</name>
    </metric>
  </metrics>
  <outputs>
    <destination type="numeric">
      <outputType>default</outputType>
    </destination>
  </outputs>
</project>

The data I used for the 4 jobs is available here:

job1

left: /mnt/wres_share/users/nwrfc/esp-flavor-eval/obs_csv/LYDM8_wres.csv
right: /mnt/wres_share/users/nwrfc/esp-flavor-eval/esp_csv_tgzs/LYDM8_esp5_operational.tgz

job2

left: /mnt/wres_share/users/nwrfc/esp-flavor-eval/obs_csv/JLKW4_wres.csv
right: /mnt/wres_share/users/nwrfc/esp-flavor-eval/esp_csv_tgzs/JLKW4_esp5_operational.tgz

job3

left: /mnt/wres_share/users/nwrfc/esp-flavor-eval/obs_csv/HHDW1_wres.csv
right: /mnt/wres_share/users/nwrfc/esp-flavor-eval/esp_csv_tgzs/HHDW1_esp5_operational.tgz

job4

left: /mnt/wres_share/users/nwrfc/esp-flavor-eval/obs_csv/HHDW1_wres.csv
right: /mnt/wres_share/users/nwrfc/esp-flavor-eval/esp_csv_tgzs/HHDW1_esp10_operational.tgz

Thanks!

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T16:29:10Z


Hi Phil,

Thanks for continuing to report issues - you are being a tremendous help!

I agree with you that subsequent runs of the same declaration should be faster and not slower, other factors being equal. The reason is that, once the sources are ingested, they are re-used in subsequent evaluations that reference the same sources.

One possible explanation, though, is that "other factors being equal" is not true. For example, it is possible that your subsequent runs were taking place when there was additional system load. It also seems to me that your right source is different each time, based on what you have listed above(?) If true, then ingest will need to happen for each one. It is also possible that the database is growing in size (because each execution is ingesting more data), and the size of the database can be a factor in the evaluation time (but that is something we would look to mitigate).

Anyway, we will need to look into it for you. But I can confirm, at least, that your understanding is broadly correct, namely that, other factors being equal, multiple executions of the same evaluation should take roughly the same time when they have the same ingest status and, when comparing an equivalent run that requires ingest to one that does not, the latter should be quicker.

James

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T16:37:20Z


James,

If you look at the bottom of Phil's description, the config may have been the same, but he was swapping out the data between runs. Thus, the runs were not really identical, but used different data. I think that would explain the increasing slowness, though I would not expect such large increases in run time.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T16:40:48Z


Hank wrote:

James,

If you look at the bottom of Phil's description, the config may have been the same, but he was swapping out the data between runs. Thus, the runs were not really identical, but used different data. I think that would explain the increasing slowness, though I would not expect such large increases in run time.

Hank

Yup, I noted that too. It looks to me as though the right data is different each time. So that is one violation of the "all factors are equal". There is the related one of a growing database. There may be other ones too. Still, I think Phil is reporting something that looks suspicious, anecdotally, so we should look into that.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T16:49:49Z


Agreed. I think our goal was always a sub-linear growth in run-time when data is added to the underlying database, but Phil appears to be reporting something close to linear: double the data, double the run time. Needs investigating, as you said, and I think it falls within the realm of other recent conversations we've had.

Hank

(EDIT: fixed language; I was typing too fast.)

epag commented 3 weeks ago

Original Redmine Comment Author Name: Phillip (Phillip) Original Date: 2019-07-31T16:52:58Z


Hey James,

It also seems to me that your right source is different each time, based on what you have listed above(?) If true, then ingest will need to happen for each one. It is also possible that the database is growing in size (because each execution is ingesting more data), and the size of the database can be a factor in the evaluation time (but that is something we would look to mitigate).

You are correct that the right source has data for a new location each run. Ahh! So if I understand correctly the new location's data is being ingested into the same database as my previous runs because I'm using the same right source? I had the incorrect assumption that each job was completely independent from previous jobs.

So if the additional data is making the database queries slower, I'm a bit confused why my fourth job would take so long. The fourth job used a differently configured right source from the first 3 jobs, so does that mean it should have stored the right data in a separate database from my first three jobs?

epag commented 3 weeks ago

Original Redmine Comment Author Name: Phillip (Phillip) Original Date: 2019-07-31T16:54:53Z


Hank wrote:

James,

If you look at the bottom of Phil's description, the config may have been the same, but he was swapping out the data between runs. Thus, the runs were not really identical, but used different data. I think that would explain the increasing slowness, though I would not expect such large increases in run time.

Hank

This is correct :)

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T17:02:16Z


Phil,

All of your runs ingested the data into the same database. This should allow repeated runs using the same data (e.g., to look at other metrics) to run much more quickly. However, as you add data, the runs will slow; I just wouldn't expect them to slow as much as you've reported.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Phillip (Phillip) Original Date: 2019-07-31T17:17:43Z


Got it. A few more questions to further my understanding. I notice that when I add additional forecast locations, the evaluation seems to know to only return results for the newly added location. What happens in the following situations?

Say I've added observed/forecast data for four locations for an initial evaluation. I come back later and do the following:

  1. I realize one of the locations was missing a week of forecast data. I upload the missing forecast data for that location and rerun the project. Do I get back updated results for the amended location?
  2. Some of the later leadtimes of forecasts in my original evaluation didn't yet have observed data to pair with. It's been some time since the initial evaluation and as such additional observed data is available to pair with those forecasts. I upload the additional observed data for all 4 locations. Do I get back updated results for all four locations?

Thanks!

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T17:23:43Z


Phil:

  1. I realize one of the locations was missing a week of forecast data. I upload the missing forecast data for that location and rerun the project. Do I get back updated results for the amended location?

Yes. If the right source path points to a directory that includes all of your evaluation data beneath it (previous data + just-uploaded forecast data), then you will get results that make use of all of your evaluation data. However, it will detect that it only needs to ingest the just-uploaded forecast data and recognize that the rest of the data has already been ingested.

  1. Some of the later leadtimes of forecasts in my original evaluation didn't yet have observed data to pair with. It's been some time since the initial evaluation and as such additional observed data is available to pair with those forecasts. I upload the additional observed data for all 4 locations. Do I get back updated results for all four locations?

Yes. When you provide it a new left observed data file, WRES will see that this is a file it has not ingested yet and will ingest that data. It will then do an evaluation using your right data (previously ingested) and the updated left observation data.

What I described above is our intention. If you believe either is not the case, it would constitute a bug that should be reported.

Let me know if you have any other questions. Thanks,

Hank

EDIT: Forgot to include 'Yes' in answer to question 1, so I added it.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T17:27:14Z


Comments below:

Phillip wrote:

Got it. A few more questions to further my understanding. I notice that when I add additional forecast locations, the evaluation seems to know to only return results for the newly added location. What happens in the following situations?

Say I've added observed/forecast data for four locations for an initial evaluation. I come back later and do the following:

  1. I realize one of the locations was missing a week of forecast data. I upload the missing forecast data for that location and rerun the project. Do I get back updated results for the amended location?

Yes, assuming the source you declare includes the additional data. It might not if you are referencing individual files as right sources. If you are referencing a directory, and that directory contains new data, it will see that some data has not been ingested before and it will ingest that data and use it along with the earlier data. That is how it should work anyway.

  1. Some of the later leadtimes of forecasts in my original evaluation didn't yet have observed data to pair with. It's been some time since the initial evaluation and as such additional observed data is available to pair with those forecasts. I upload the additional observed data for all 4 locations. Do I get back updated results for all four locations?

Same here. If the declared source includes the updates made, then the new data should be ingested. If you edited a file, that would be included as a completely new source. If you added another left source to a directory, then that new left source would be ingested and used alongside the other sources, ingested previously.
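The behavior described in both answers (ingest only sources not seen before, and treat an edited file as a completely new source) can be sketched with a content-fingerprint check. This is purely illustrative; the names here (`ingest_sources`, `INGESTED`) are hypothetical and not the actual WRES internals:

```python
# Hypothetical sketch of skip-already-ingested behavior; the real WRES
# records ingested sources in its database, not in an in-memory set.
import hashlib
from pathlib import Path

INGESTED: set[str] = set()  # stands in for the database's record of sources

def fingerprint(path: Path) -> str:
    """Key a source by content, so an edited file counts as a new source."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def ingest_sources(directory: Path) -> list[Path]:
    """Ingest only files under `directory` that have not been seen before."""
    newly_ingested = []
    for path in sorted(directory.rglob("*.csv")):
        digest = fingerprint(path)
        if digest not in INGESTED:
            INGESTED.add(digest)
            newly_ingested.append(path)  # here: would load rows into the db
    return newly_ingested
```

Re-running over the same directory ingests nothing; dropping in a new file, or editing an existing one, ingests just the changed sources, which matches the two "Yes" answers above.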

Thanks!

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T17:31:31Z


I guess two answers are better than one. :)

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T17:32:45Z


Hank wrote:

I guess two answers are better than one. :)

Hank

Yeah, I spotted yours when I hit submit, so I thought... whatever, I hope it matches. Looks like it did :)

epag commented 3 weeks ago

Original Redmine Comment Author Name: Phillip (Phillip) Original Date: 2019-07-31T18:03:24Z


Hank wrote:

I guess two answers are better than one. :)

Hank

Thanks for both responses :P

So in my original description of running 4 jobs with new location data in the data directories each time... I understand WRES is ingesting the data into the same database, and it only ingests data it detects it doesn't already have.

I observed that each job returned results for only the new location added. Is WRES computing stats for only the newly added location or is it computing stats for the new location and all locations previously ingested?

I guess I'm thinking that the performance I observed is actually MUCH worse than just a linear increase. Job 4 took nearly 5 hours but returned results for only 1 location. If WRES only calculated stats for the new location, this increased processing time seems unexpected. If it's perhaps calculating stats for all 4 locations but returning results for only the new location, then the work spent on the other 3 locations seems like wasted compute.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T18:19:05Z


Phillip wrote:

Hank wrote:

I guess two answers are better than one. :)

Hank

Thanks for both responses :P

So in my original description of running 4 jobs with new location data in the data directories each time... I understand WRES is ingesting the data into the same database, and it only ingests data it detects it doesn't already have.

I observed that each job returned results for only the new location added. Is WRES computing stats for only the newly added location or is it computing stats for the new location and all locations previously ingested?

I guess I'm thinking that the performance I observed is actually MUCH worse than just a linear increase. Job 4 took nearly 5 hours but returned results for only 1 location. If WRES only calculated stats for the new location, this increased processing time seems unexpected. If it's perhaps calculating stats for all 4 locations but returning results for only the new location, then the work spent on the other 3 locations seems like wasted compute.

Phil,

It looks like your declaration does not ask for any particular features, so that means "evaluate all features that I can find". So that's good, no constraint there.

The "all features I can find" will depend what is contained in the left and right sources that you have declared.

If you declared directories (it looks like you did) and the directories contain multiple features and you are adding data for new features, then, yes, I would expect WRES to emit statistics for an increasing number of features as you add features.

However, if you are declaring a directory in the right source that does not contain all of the features, then the scope of that declaration will be whatever it can find in the right source.

Bottom line, each evaluation is completely self-contained and works only on what the declaration for that evaluation says. There is no formal connection between any two or more evaluations (beyond what we do under the hood to optimize, such as not re-ingesting sources if another evaluation ingested them). So when you run one evaluation, the next one is completely independent of any other evaluations you may have run. However, insofar as you are adding data to a directory and you are referencing that directory, then the new data (e.g. new locations) should be processed too.

It sounds like they are not being processed, so it sounds to me like there is a problem. Anyway, someone will need to take a look for you in due course. I don't have access to NWC IT systems, I'm afraid, so I can only give you hot air at present, but hopefully it is helping a little... ;)

James

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2019-07-31T18:26:18Z


This:

each evaluation is completely self-contained and works only on what the declaration for that evaluation says

Each evaluation can only see data that is declared for that evaluation.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2019-07-31T18:32:22Z


Phillip wrote:

I guess I'm thinking that the performance I observed is actually MUCH worse than just a linear increase.

That depends on what the x axis represents. What Hank mentioned above is a "size of the database at beginning of evaluation" x axis and "duration of a similar evaluation" y axis. From that angle, it does look roughly linear. We want it to be better-than-linear or close-to-constant. But we have plenty to do before we can get there, thanks for your patience. For now the performance will be erratic. We'll probably clean all data out of the database every now and then to bring the times down. I hope we can eventually hit a closer-to-constant state.
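The "size of the database vs duration of a similar evaluation" framing can be eyeballed with a least-squares fit over the four reported run times. This is illustrative only; using "locations already ingested" as a proxy for database size is an assumption:

```python
# Illustrative fit: duration ~= slope * (data already in db) + intercept,
# with "locations ingested before the run" as a crude size proxy.
sizes = [0, 1, 2, 3]                           # locations ingested beforehand
secs = [4099.1, 9091.0, 12920.6, 16950.7]      # reported run times, seconds

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(secs) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, secs))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x
# A straight line fits these four points closely (slope around 4240 s per
# extra location's worth of data), i.e. roughly linear, not sub-linear.
```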

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T18:43:46Z


Because I like to pile on...

As implied by James and Jesse, job4 is only computing metrics for the left and right sources in job4. So, you get results for HHDW1 and esp10, only.

However, it's ingesting completely new forecasts into the database. AND it's selecting data from a database with a linear increase in the number of forecasts from job1 to job2 to job3 to job4 (4x as much for job4 as job1). So either the ingest is causing the run to take 4 times as long (more or less), the selection is, or both are. We won't know for sure until we dig a little deeper.

WRES folks: If you think this merits a ticket on the development side, let me know. Also, I'll start pulling over the data to the WRES Test VM to support replicating the issue once we are ready to investigate.

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T18:46:41Z


Hank,

I think a little more digging is needed here first to determine whether there are issues (correctness, slowness) and what those issues are, so that we can open zero or more dev tickets for them.

James

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T18:48:26Z


Okay. I'll grab the data, to allow for reproducing. I think Jesse also knows how to navigate the log files on the WRES Test Platform to get more information from his runs.

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T18:49:42Z


The data can be found on the WRES Test VM (-ti01) in /wres_share/testing/userdata/user.support/ticket66665. James: If you want a copy shared on Google drive, let me know.

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-07-31T18:54:23Z


Hank wrote:

The data can be found on the WRES Test VM (-ti01) in /wres_share/testing/userdata/user.support/ticket66665. James: If you want a copy shared on Google drive, let me know.

Thanks,

Hank

How big is it? If it's, say, 1-2 GiB, then sure, just go ahead and upload (thanks). Otherwise, I'll wait to see if we need dev tickets and can then retrieve and help with those.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-07-31T18:57:09Z


6+ GB. I'll wait to put the data on Google drive until needed, then.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Phillip (Phillip) Original Date: 2019-07-31T19:01:02Z


James wrote:

It sounds like they are not being processed, so it sounds to me like there is a problem. Anyway, someone will need to take a look for you in due course. I don't have access to NWC IT systems, I'm afraid, so I can only give you hot air at present, but hopefully it is helping a little... ;)

James

James, the explanations from you all are definitely helpful! I really appreciate the quick and thorough responses.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2019-07-31T19:14:28Z


Execution log from db attached.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-11-21T11:36:51Z


This is probably worth another look once v1.20 is deployed. These runs have a similar shape to 9xx for which the retrieval portion, at least, should now be much faster on a database of a given size (because the majority of the retrieval was spent on offset evaluation in v1.19 and earlier, which disappears in v1.20).

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2020-01-08T21:51:21Z


It would be good to look at again, but I doubt there will be any change in the curve: it is likely the growth of the wres.Observations table without hitting any partition boundary (there are no partitions on wres.Observations).