aodn / harvesters

Harvesters
GNU General Public License v3.0
0 stars 0 forks source link

AUV VIEWER TRACK harvester - Extremely slow Liquibase on NSP #125

Closed lbesnard closed 9 years ago

lbesnard commented 9 years ago

The AUV VIEWER TRACK harvester is extremely slow to go through the liquibase statements, even though there is no Liquibase changes.

On NSP14, for example the output of cat /mnt/ebs/talend/jobs/auv_viewer_track_rc-auv_viewer_track_rc/log/console.2014-11-23-19-20-41.log

shows that more than 3 hours is needed to do nothing ... and sometimes it can take 6 hours.

INFO 23/11/14 7:20 PM:liquibase: Successfully acquired change log lock
INFO 23/11/14 7:20 PM:liquibase: Reading from auv_viewer_track.databasechangelog
INFO 23/11/14 7:20 PM:liquibase: Reading from auv_viewer_track.databasechangelog
INFO 23/11/14 7:20 PM:liquibase: Successfully released change log lock
INFO 23/11/14 9:58 PM:liquibase: Successfully acquired change log lock
INFO 23/11/14 9:58 PM:liquibase: Reading from auv_viewer_track.databasechangelog
INFO 23/11/14 9:58 PM:liquibase: Reading from auv_viewer_track.databasechangelog
INFO 23/11/14 9:58 PM:liquibase: Successfully released change log lock
INFO 23/11/14 10:10 PM:liquibase: Successfully acquired change log lock
INFO 23/11/14 10:10 PM:liquibase: Reading from auv_viewer_track.databasechangelog
INFO 23/11/14 10:10 PM:liquibase: Reading from auv_viewer_track.databasechangelog
INFO 23/11/14 10:10 PM:liquibase: Successfully released change log lock

Some investigation was done with @anguss00 and @julian1 on dbrc, and the process seems to be idle.

It's worth mentioning this takes 30secondes on my local machine.

Here is the content of the liquibase statements.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">

 <changeSet author="talend" id="auv_tracks">
  <sql>
    CREATE TABLE auv_tracks
    (
    pkid bigserial,
    file_id integer,
    dive_number integer,
    dive_name text,
    dive_metadata_uuid text,
    facility_code text,
    campaign_code text,
    dive_code text,
    distance_covered_in_m real,
    number_of_images integer,
    image_path text,
    abstract text,
    platform_code text,
    pattern text,
    dive_report_path text,
    kml_path  text,
    geospatial_lat_min real,
    geospatial_lon_min real,
    geospatial_lat_max real,
    geospatial_lon_max real,
    geospatial_vertical_min real,
    geospatial_vertical_max real,
    time_coverage_start timestamp with time zone,
    time_coverage_end timestamp with time zone,
    geom geometry,
    CONSTRAINT auv_tracks_pkey PRIMARY KEY (pkid),
    CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
    CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'LINESTRING'::text OR geom IS NULL),
    CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326))

  </sql>

  <sql>
    CREATE INDEX auv_tracks_geom_gist_idx ON auv_tracks USING GIST ( geom );
  </sql>

</changeSet>

<changeSet author="talend" id="auv_tracks_constraint1">
  <sql>
    ALTER TABLE auv_tracks ADD CONSTRAINT  fkey_file_id FOREIGN KEY (file_id) REFERENCES indexed_file(id)  MATCH SIMPLE      ON UPDATE CASCADE ON DELETE CASCADE;
  </sql>
</changeSet>

<changeSet author="talend" id="auv_images">
  <sql>
    CREATE TABLE auv_images
    (
    pkid bigserial,
    file_id integer,
    fk_auv_tracks integer,
    campaign_code text,
    dive_code text,
    image_filename text,
    longitude double precision,
    latitude double precision,
    image_width double precision,
    depth_sensor double precision,
    altitude_sensor double precision,
    depth double precision,
    sea_water_temperature double precision,
    sea_water_salinity double precision,
    chlorophyll_concentration_in_sea_water double precision,
    backscattering_ratio double precision,
    colored_dissolved_organic_matter double precision,
    "time" timestamp with time zone,
    cluster_tag integer,
    up_left_lon double precision,
    up_left_lat double precision,
    up_right_lon double precision,
    up_right_lat double precision,
    low_right_lon double precision,
    low_right_lat double precision,
    low_left_lon double precision,
    low_left_lat double precision,
    geom geometry,
    CONSTRAINT auv_images_pkey PRIMARY KEY (pkid),
    CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
    CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'POLYGON'::text OR geom IS NULL),
    CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326))

  </sql>

  <sql>
    CREATE INDEX auv_images_geom_gist_idx ON auv_images USING GIST ( geom );
  </sql>

</changeSet>

<changeSet author="talend" id="auv_images_constraint1">
  <sql>
    ALTER TABLE auv_images ADD CONSTRAINT 
    auv_image_file_fkey FOREIGN KEY (file_id)
    REFERENCES indexed_file (id) MATCH SIMPLE
    ON UPDATE CASCADE ON DELETE CASCADE
  </sql>
</changeSet>

<changeSet author="talend" id="auv_images_constraint2">
  <sql>
    ALTER TABLE auv_images ADD CONSTRAINT 
    auv_track_dive_fkey FOREIGN KEY (fk_auv_tracks)
    REFERENCES  auv_tracks (pkid) MATCH SIMPLE
    ON UPDATE CASCADE ON DELETE CASCADE
  </sql>
</changeSet>

<changeSet author="talend" id="auv_reporting">
  <sql>
    CREATE TABLE auv_reporting
    (
    campaign_code  text,
    campaign_metadata_uuid  text,
    dive_code  text,
    dive_code_metadata_uuid text,
    data_on_portal  text,
    data_on_auv_viewer  text,
    data_folder  text,
    geotiff  text,
    mesh  text,
    multibeam  text,
    cdom  text,
    cphl  text,
    opbs  text,
    psal  text,
    temp  text,
    csv_track_file text,
    dive_report text
    )
  </sql>
</changeSet>

<changeSet author="talend" id="auv_views" runOnChange = "true">
  <sql>
    DROP VIEW IF EXISTS auv_images_vw;
    CREATE VIEW auv_images_vw AS 
    (
    SELECT  
      auv_images.fk_auv_tracks ,
      auv_tracks.dive_code AS site_code,
      auv_tracks.image_path AS image_folder,
      auv_tracks.dive_name AS dive_code_name,
      auv_tracks.campaign_code,
      auv_images.image_filename,
      auv_images.cluster_tag ,  
      auv_images.longitude  ,
      auv_images.latitude  ,
      auv_images.image_width  ,
      auv_images.depth_sensor  ,
      auv_images.altitude_sensor  ,
      auv_images.depth  ,
      auv_images.sea_water_temperature  ,
      auv_images.sea_water_salinity , 
      auv_images.chlorophyll_concentration_in_sea_water ,
      auv_images.backscattering_ratio  ,
      auv_images.colored_dissolved_organic_matter  ,
      auv_images."time" AT TIME ZONE 'UTC' AS "time",      
      auv_images.geom AS geom
    from  auv_images, auv_tracks   WHERE auv_images.fk_auv_tracks = auv_tracks.pkid
    ORDER BY auv_images."time"  )
  </sql>

  <sql>
    DROP VIEW IF EXISTS auv_tracks_vw ;
    CREATE VIEW auv_tracks_vw AS 
    (
    select pkid ,
    facility_code,
    campaign_code,
    dive_code as site_code,
    image_path as image_folder,
    abstract,
    platform_code,
    dive_number,
    dive_name as dive_code_name,
    pattern,
    number_of_images,
    distance_covered_in_m as distance,
    NULL::text as notes,
    dive_report_path as dive_report,
    kml_path as kml,
    dive_metadata_uuid as metadata_uuid,
    geospatial_lat_min ,
    geospatial_lon_min ,
    geospatial_lat_max ,
    geospatial_lon_max ,
    geospatial_vertical_min ,
    geospatial_vertical_max ,
    time_coverage_start AT TIME ZONE 'UTC' AS time_coverage_start,
    time_coverage_end AT TIME ZONE 'UTC' AS time_coverage_end,
    geom AS geom
    from auv_tracks)
  </sql>
</changeSet>

<changeSet author="talend" id="expand_too_small_polygons" runOnChange="true" >
  <createProcedure>

    CREATE OR REPLACE FUNCTION expand_too_small_polygons()
      RETURNS void AS
    '  declare
        bad_pkid_images RECORD;
      BEGIN

         FOR bad_pkid_images IN select pkid from  auv_viewer_track.auv_images where  (NOT ST_IsValid(geom))
            LOOP
                UPDATE auv_viewer_track.auv_images set geom = st_buffer(geom,0.00000001,''endcap=square'') where pkid = bad_pkid_images.pkid;
         END LOOP; 

      END;'
      LANGUAGE plpgsql VOLATILE
  </createProcedure>
 </changeSet>

<changeSet author="talend" id="clean_dive_campaign" runOnChange="true" >
  <createProcedure>

    CREATE OR REPLACE FUNCTION clean_dive_campaign()
      RETURNS void AS
    '  declare

     file_track_id_to_remove int;
        file_image_id_to_remove int;
      BEGIN

    select distinct (file_id) into file_track_id_to_remove
    from auv_tracks tracks
    WHERE  NOT EXISTS( 
      select distinct(fk_auv_tracks) 
      from auv_images images
      where images.fk_auv_tracks = tracks.pkid
      );

       delete from  indexed_file  where id = file_track_id_to_remove;
       delete from  auv_tracks  where file_id = file_track_id_to_remove;

    select distinct (file_id) into file_image_id_to_remove
    from auv_images images
    WHERE  NOT EXISTS( 
      select distinct(pkid) 
      from auv_tracks tracks
      where tracks.pkid = images.fk_auv_tracks 
      );

    delete from  indexed_file where id = file_image_id_to_remove;
    delete from  auv_images where file_id = file_image_id_to_remove;

      END;'
      LANGUAGE plpgsql VOLATILE
  </createProcedure>
</changeSet>

<changeSet author="talend" id="indexesImages" runOnChange="true" >
  <sql>
    CREATE INDEX auv_images_fk_idx
      ON auv_images
      USING btree
      (fk_auv_tracks, pkid);
  </sql>

  <sql>
    CREATE INDEX auv_images_time_idx
      ON auv_images
      USING btree
      ("time");
  </sql>
</changeSet>

</databaseChangeLog>
lbesnard commented 9 years ago

for reference, @anguss00 dropped the schema on dbprod adn deployed the latest version of the harvester. An hour after starting it from scratch, the liquibase statement is still stuck at the beginning, and the schema is completely empty

jonescc commented 9 years ago

@lbesnard, liquibase is also used by the indexing components to manage their own supporting tables. That appears to be what is creating the logging above. However, the liquibase changes themselves are happening very quickly, there is no time difference between liquibase grabbing a lock to do its updates and releasing the lock. Its more likely that the time is being spent iterating through all the files to determine what has changed, and from the above it looks like the first indexing component is the one that is taking most of the time.

lbesnard commented 9 years ago

@jonescc Thanks for looking at this. There is a problem in the indexing then.

All the files under /mnt/imos-t4/IMOS/public/AUV/auv_viewer_data/thumbnails are in context.exclude , which is set up in the 3 iIndexFiles components. This folder contains more than 6millions of files.

I suspected at the beginning that the indexing was the problem, and tried then to change the 3 different context.include to force which files to harvest. Either there is a problem with my regexp, either the component doesn't behave as it should.

any ideas ?