Open HankHerr-NOAA opened 3 months ago
For local, NWC access to data, evaluation declarations, and output, see the directory issue131828
in the standard location.
Hank
My test was run using revision 20240627-b58855f-dev in a repo with the remote just changed to GitHub.
Hank
(The underlying reason is the ingest of one continuous time series vs. a very large number of very small (one-event) time series. Beneath that is the question of why this difference in topology makes such a big difference to ingest time. There is some kind of ingest contention, probably related to source locking, but that is TBD.)
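To make the topology difference concrete, here is a minimal sketch. It assumes, purely for illustration, that a reader starts a new time series whenever the feature changes between consecutive rows; that is not confirmed to be how the CSV reader actually behaves. The feature names, timestamps, and the `count_series` helper are all hypothetical.

```python
from itertools import groupby

def count_series(rows):
    """Count contiguous runs of the same feature.

    Assumption: each run of consecutive rows with the same feature
    is treated as one time series by the (hypothetical) reader.
    """
    return sum(1 for _ in groupby(rows, key=lambda row: row[0]))

# Hypothetical data: 3 features, 4 valid times each.
features = ["A", "B", "C"]
times = [
    "2024-01-01T00:00Z",
    "2024-01-01T01:00Z",
    "2024-01-01T02:00Z",
    "2024-01-01T03:00Z",
]

# Rows written by time first, then feature: the feature changes on every row.
time_major = [(feature, time) for time in times for feature in features]

# Rows written by feature first, then time: each feature is one contiguous block.
feature_major = sorted(time_major)

print(count_series(time_major))     # 12 one-event series
print(count_series(feature_major))  # 3 continuous series
```

Under that assumption, the number of series in the time-ordered file grows with the number of rows, while in the feature-ordered file it grows only with the number of features. This does not explain why many small series ingest so slowly; it only shows how the two row orders could produce such different series counts.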
This relates to VLab User Support ticket #131828. The unsorted and sorted CSV files have been uploaded here:
https://drive.google.com/drive/folders/1-mBAjDUNf9COiw0dzly7mJ2aQg2BDSFD
Using a standalone pointing to a database and running on the NWC ised-dev1 machine, the evaluation took 1h 6m to complete with the unsorted data (where time series are written by time first, and then by feature). With the sorted data (where time series are written by feature first, and then by time), the evaluation took 2m 21s. Both evaluations were run on a freshly cleaned database. The declaration using the sorted data is below; just modify the predicted source accordingly.
Why such a stark difference? If it points to a code change to make, this ticket can be resolved once that change is made. Otherwise, this ticket can be resolved once we understand the underlying cause and decide that no change is needed.
Thanks,
Hank
=====================================