aodn / content

Tracks AODN Portal content and configuration issues
0 stars 0 forks source link

AODN WAVE NRT - investigation of slow harvester #513

Closed lbesnard closed 1 year ago

lbesnard commented 1 year ago

@bpasquer @ana-berger I'm trying to understand why the harvester is slow. While doing so, I investigated the content of aodn_wave_nrt_v2_timeseries_map.

The following query is just looking at the NRT map view for one file_id only. There is a total of 290 rows for one file !! I think this is probably the cause of slowing things down, potentially with the spatial extent component.

I think I understand the point of this, which is to show in the content ftl maybe data for the last 7 days, however we're mixing _map and _data in the same view. I don't think this is right. Also in geoserver, this means that for one file_id, 290 points are being plotted.

So far, on RC, the view contains 11k+ rows, so 11k+ need to be plotted.

select * from aodn_wave_nrt_v2_timeseries_map where file_id=71 order by file_id, "TIME"

|---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+-------------------------------------------+------------------------+----------------+----------------+---------------------+-------------+------------------------+------------------------+--------------+---------------+------------------------+-------------------------+------------------+----------->| 71      | Pilbara_Ports_Authority/WAVE-BUOYS/REALTIME/WAVE-PARAMETERS/DAMPIER-BEACON-10-EAST/2022/10/PPA_20221001T000000Z_DAMPIER-BEACON-10-EAST_RT_WAVE-PARAMETERS_monthly.nc | 258783.0 | Pilbara Ports Authority/OMC International | Dampier-Beacon 10 East | Datawell DWR-G | directional    | moored surface buoy | <null>      | 2022-08-31 14:07:00+00 | 2022-10-30 11:47:00+00 | -20.63511163 | 116.70855763  | 2022-10-24 06:27:00+00 | <null>                  | <null>           | 306.04    >| 71      | Pilbara_Ports_Authority/WAVE-BUOYS/REALTIME/WAVE-PARAMETERS/DAMPIER-BEACON-10-EAST/2022/10/PPA_20221001T000000Z_DAMPIER-BEACON-10-EAST_RT_WAVE-PARAMETERS_monthly.nc | 258783.0 | Pilbara Ports Authority/OMC International | Dampier-Beacon 10 East | Datawell DWR-G | directional    | moored surface buoy | <null>      | 2022-08-31 14:07:00+00 | 2022-10-30 11:47:00+00 | -20.63511163 | 116.70855763  | 2022-10-24 07:07:00+00 | <null>                  | <null>           | 310.84    >| 71      | Pilbara_Ports_Authority/WAVE-BUOYS/REALTIME/WAVE-PARAMETERS/DAMPIER-BEACON-10-EAST/2022/10/PPA_20221001T000000Z_DAMPIER-BEACON-10-EAST_RT_WAVE-PARAMETERS_monthly.nc | 258783.0 | Pilbara Ports Authority/OMC International | Dampier-Beacon 10 East | Datawell DWR-G | directional    | moored surface buoy | <null>      | 2022-08-31 14:07:00+00 | 2022-10-30 11:47:00+00 | -20.63511163 | 116.70855763  | 2022-10-24 07:27:00+00 | <null>                  | <null>           | 243.238   >| 71      | Pilbara_Ports_Authority/WAVE-BUOYS/REALTIME/WAVE-PARAMETERS/DAMPIER-BEACON-10-EAST/2022/10/PPA_20221001T000000Z_DAMPIER-BEACON-10-EAST_RT_WAVE-PARAMETERS_monthly.nc | 258783.0 | Pilbara Ports Authority/OMC International | Dampier-Beacon 10 East | Datawell DWR-G | directional    | moored surface buoy | <null>      | 2022-08-31 14:07:00+00 | 2022-10-30 11:47:00+00 | -20.63511163 | 116.70855763  | 2022-10-24 08:27:00+00 | <null>                  | <null>           | 258.8     >

....
SELECT 290
    CREATE VIEW aodn_wave_nrt_v2_timeseries_map AS      
  SELECT
  fm.file_id,
  f.url,
  f.size,
  sm.institution,
  fm.site_name,
  sm.instrument,
  sm.wave_buoy_type,
  sm.platform,  
  sm.water_depth,
  sm.time_start,
  sm.time_end,
  sm.latitude_min,
  sm.longitude_min,
  m."TIME",
  case 
    when sm.instrument like '%Datawell%' then m."WHTH"
    when sm.instrument like '%Spotter%' then m."WSSH"
    when sm.instrument like '%Triaxys%' then m."WHTH"
  end significant_wave_height,
  case
    when sm.instrument like '%Datawell%' then m."WPMH"
    when sm.instrument like '%Spotter%' then m."WPFM"
    when sm.instrument like '%Triaxys%' then m."WPMH"
  end wave_mean_period,
  m."WPDI" as peak_wave_direction,
  m."WPPE" as peak_wave_period,
  sm.geom
  FROM file_metadata fm
  JOIN indexed_file f ON f.id = fm.file_id
  JOIN site_metadata sm ON sm.site_name = fm.site_name
  JOIN measurements m on m.file_id = fm.file_id
  WHERE m."TIME" > (now() - '7 days'::interval) 
  AND fm.data_mode = 'realtime' AND sm.data_mode = 'realtime'
  ORDER BY m."TIME" DESC;

For now, I'm just modifying the harvester to disable the spatial extent and will see if there are any other causes for the harvester being so slow

lbesnard commented 1 year ago

https://github.com/aodn/harvesters/pull/982

lbesnard commented 1 year ago

by removing the spatial extent component, files now take a few seconds to be ingested, so the issue is indeed in the _map layer

lbesnard commented 1 year ago

Also, looking at the pipeline logs, I was suprised to see that some files such as UWA_20200101T000000Z_GOODRICH-BANK-OLD_RT_WAVE-PARAMETERS_monthly.nc were being created/ingested over and over, especially looking at the date of 2020/01/01.

The GOODRICH-BANK-OLD site has its metadata defined in https://github.com/aodn/data-services/blob/master/AODN/AODN-WAVE-NRT/ARDC_API_NRT/ardc_nrt/config/sofar/sources_id_metadata.json#L753 i.e. SPOT-0551. And as can be seen, no deployment_start_date value is set.

Now, I don't think the logs for the ardc nrt module got looked at all and it will be important to do so (manually or nagios/sumo logic)

cat /mnt/ebs/wip/ARDC_API_NRT/sofar_output/process.log
....
22826 2022-11-01 14:40:47,858 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0170
22827 2022-11-01 14:40:48,774 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0162
22828 2022-11-01 14:40:49,201 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0172
22829 2022-11-01 14:40:49,828 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0171
22830 2022-11-01 14:40:55,709 - ardc_nrt.lib.common.utils - ERROR - Processing SPOT-0171 aborted
22831 2022-11-01 14:40:55,709 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0168
22832 2022-11-01 14:40:56,346 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0093
22833 2022-11-01 14:40:59,717 - ardc_nrt.lib.common.utils - ERROR - Processing SPOT-0093 aborted
22834 2022-11-01 14:40:59,717 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0551
22835 2022-11-01 14:41:02,715 - ardc_nrt.lib.common.utils - ERROR - Processing SPOT-0551 aborted
22836 2022-11-01 14:41:02,715 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0558
22837 2022-11-01 14:41:03,329 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0559
22838 2022-11-01 14:41:03,994 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0561

We can see a few ERROR messages for a few spotters (0551,0093, 0168, 0171 ...). Now my bad, the message is not very helpful. There was a very small issue in the logging class, outputting most messages to the screen, but almost none (escape the ones above) in the process.log file. This PR fixes it.

The message which should have appeared in the log file is

2022-11-01 17:16:47,539 - __main__ - INFO - processing SPOT-0551
2022-11-01 17:16:47,544 - ardc_nrt.lib.sofar.api - INFO - SPOT-0551: API get latest date available: https://api.sofarocean.com/api/latest-data?spotterId=SPOT-0551&token=e0eb70b6d9e0b5e00450929139ea34
2022-11-01 17:16:48,550 - ardc_nrt.lib.sofar.api - INFO - SPOT-0551: API get latest date available is 2021-03-28 03:01:01+00:00
2022-11-01 17:16:48,550 - ardc_nrt.lib.common.pickle_db - INFO - Pickle file does not exist yet
2022-11-01 17:16:49,380 - ardc_nrt.lib.sofar.api - INFO - SPOT-0551: API get data: https://api.sofarocean.com/api/wave-data?spotterId=SPOT-0551&startDate=2020-10-01T00:00:00.000000Z&endDate=2020-11-01T00:00:00.000000Z&token=e0eb70b6d9e0b5e00450929139ea34
2022-11-01 17:16:49,381 - ardc_nrt.lib.sofar.api - WARNING - SPOT-0551: no data between 2020-10-01T00:00:00.000000Z -> 2020-11-01T00:00:00.000000Z
2022-11-01 17:16:49,381 - __main__ - ERROR - Processing SPOT-0551 aborted. No data available BETWEEN 2020-10-01 00:00:00+00:00 AND 2020-11-01 00:00:00+00:00

This is fixed here:

ana-berger commented 1 year ago

Also, looking at the pipeline logs, I was suprised to see that some files such as UWA_20200101T000000Z_GOODRICH-BANK-OLD_RT_WAVE-PARAMETERS_monthly.nc were being created/ingested over and over, especially looking at the date of 2020/01/01.

The GOODRICH-BANK-OLD site has its metadata defined in https://github.com/aodn/data-services/blob/master/AODN/AODN-WAVE-NRT/ARDC_API_NRT/ardc_nrt/config/sofar/sources_id_metadata.json#L753 i.e. SPOT-0551. And as can be seen, no deployment_start_date value is set.

Now, I don't think the logs for the ardc nrt module got looked at all and it will be important to do so (manually or nagios/sumo logic)

cat /mnt/ebs/wip/ARDC_API_NRT/sofar_output/process.log
....
22826 2022-11-01 14:40:47,858 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0170
22827 2022-11-01 14:40:48,774 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0162
22828 2022-11-01 14:40:49,201 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0172
22829 2022-11-01 14:40:49,828 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0171
22830 2022-11-01 14:40:55,709 - ardc_nrt.lib.common.utils - ERROR - Processing SPOT-0171 aborted
22831 2022-11-01 14:40:55,709 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0168
22832 2022-11-01 14:40:56,346 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0093
22833 2022-11-01 14:40:59,717 - ardc_nrt.lib.common.utils - ERROR - Processing SPOT-0093 aborted
22834 2022-11-01 14:40:59,717 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0551
22835 2022-11-01 14:41:02,715 - ardc_nrt.lib.common.utils - ERROR - Processing SPOT-0551 aborted
22836 2022-11-01 14:41:02,715 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0558
22837 2022-11-01 14:41:03,329 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0559
22838 2022-11-01 14:41:03,994 - ardc_nrt.lib.common.utils - INFO - processing SPOT-0561

We can see a few ERROR messages for a few spotters (0551,0093, 0168, 0171 ...). Now my bad, the message is not very helpful. There was a very small issue in the logging class, outputting most messages to the screen, but almost none (escape the ones above) in the process.log file. This PR fixes it.

The message which should have appeared in the log file is

2022-11-01 17:16:47,539 - __main__ - INFO - processing SPOT-0551
2022-11-01 17:16:47,544 - ardc_nrt.lib.sofar.api - INFO - SPOT-0551: API get latest date available: https://api.sofarocean.com/api/latest-data?spotterId=SPOT-0551&token=e0eb70b6d9e0b5e00450929139ea34
2022-11-01 17:16:48,550 - ardc_nrt.lib.sofar.api - INFO - SPOT-0551: API get latest date available is 2021-03-28 03:01:01+00:00
2022-11-01 17:16:48,550 - ardc_nrt.lib.common.pickle_db - INFO - Pickle file does not exist yet
2022-11-01 17:16:49,380 - ardc_nrt.lib.sofar.api - INFO - SPOT-0551: API get data: https://api.sofarocean.com/api/wave-data?spotterId=SPOT-0551&startDate=2020-10-01T00:00:00.000000Z&endDate=2020-11-01T00:00:00.000000Z&token=e0eb70b6d9e0b5e00450929139ea34
2022-11-01 17:16:49,381 - ardc_nrt.lib.sofar.api - WARNING - SPOT-0551: no data between 2020-10-01T00:00:00.000000Z -> 2020-11-01T00:00:00.000000Z
2022-11-01 17:16:49,381 - __main__ - ERROR - Processing SPOT-0551 aborted. No data available BETWEEN 2020-10-01 00:00:00+00:00 AND 2020-11-01 00:00:00+00:00

This is fixed here:

Some buoys are supposed to not have 'deployment_start_date' as we didn't get that metadata from the data provider. @bpasquer can confirm that but it means the buoys is not active at the moment.

lbesnard commented 1 year ago

I created some function to easily check the latest logs @bpasquer @ana-berger


function _ardc_log() {
    local ardc_log_dir=$WIP_DIR/ARDC_API_NRT
    # fname is not local to be re-used by other sub-functions

    if command -v fzf > /dev/null; then
        fname=$(find $ardc_log_dir -type f -name "process.log" | \
        fzf --preview "( [[ -f {} ]] && (tail -50 {} || cat {}))")
    else
        printf "Please select log file:\n"
        select fname in `find $ardc_log_dir -type f -name process.log`;
        do
          echo you picked $fname \($REPLY\)
          break;
        done
    fi
}
#export -f _ardc_log

function ardc_log_recent() {
    _ardc_log
    local hrago=$(date --date="1 hours ago" "+%Y-%m-%d %H")

    sed -n "/^$hrago/,\$p" $fname | perl -pe 's/^.*FATAL.*$/\e[1;37;41m$&\e[0m/g; s/^.*ERROR.*$/\e[1;31;40m$&\e[0m/g; s/^.*WARN.*$/\e[0;33;40m$&\e[0m/g; s/^.*INFO.*$/\e[0;36;40m$&\e[0m/g; s/^.*DEBUG.*$/\e[0;37;40m$&\e[0m/g'
    }
#export -f ardc_log_recent

function ardc_log_recent_errors() {
    _ardc_log
    local hrago=$(date --date="1 hours ago" "+%Y-%m-%d %H")

    sed -n "/^$hrago/,\$p" $fname | grep ERROR | perl -pe 's/^.*FATAL.*$/\e[1;37;41m$&\e[0m/g; s/^.*ERROR.*$/\e[1;31;40m$&\e[0m/g; s/^.*WARN.*$/\e[0;33;40m$&\e[0m/g; s/^.*INFO.*$/\e[0;36;40m$&\e[0m/g; s/^.*DEBUG.*$/\e[0;37;40m$&\e[0m/g '
    }
#export -f ardc_log_recent_errors
bpasquer commented 1 year ago

Because the _map view contains for each site wave data for the last 7 days it slows down the update of the spatial extent (some site have data up to 48 data per day). The UpdateSpatialExtent component of the harvester has been disabled and the performance issue resolved. The spatial extent of existing moored wave buoys is not expected to change. When new buoys are added, the spatial extent can be updated in the metadata record manually.