firelab / viirs_ba

Python code for burned area estimation using VIIRS scenes.
Creative Commons Zero v1.0 Universal
7 stars 4 forks source link

Optimize PostGIS deployment #10

Closed bnordgren closed 8 years ago

bnordgren commented 8 years ago

Take a few steps to optimize the PostGIS deployment here at firelab:

bnordgren commented 8 years ago

The Postgis server has 32Gb. I left 4Gb for the OS, just because it's windows and set

bnordgren commented 8 years ago

Looks good, between this and implementing the geographic window, the brute force processing time for July 2014 (night+day+375m) was reduced from 27hours to 18hours. That's almost exactly 2/3 the previous time.

calbury commented 8 years ago

Bryce, The Simple confirm burn seems to be a bottle neck. This speed things up when the database was smaller, but now seems to be slowing things down. Earlier I had mentioned taking this out, but that that would result in growth off thresholded cells. I think we can get around that by adding a requirement that to the threshold_2_fire_events function that the source must be ActiveFire (and removing the confirmed = true requirement). An updated version of the function is below. I did some quick testing and this speed things up significantly. I’m still verifying that it works the same, but it seems to.

You will need to comment out the bit that runs simple confirm burn (~539-543):

simple confirm threshold burns

print "\nPerform simple confirm burned area"

date_4db = datetime.datetime.strftime(H5Date, "%Y-%m-%d %H:%M:%S")

execute_simple_confirm_burns(date_4db, TemporalProximity, SpatialProximity)

vacuum_analyze("threhold_burned")

Also note that there is a typo in the following function, see highlighted bit should be “Start threshold_2_events”: def execute_threshold_2_events(collectionDate, interval, distance): print "Start simple confirm burn", get_time() query_text = "SELECT VIIRS_threshold_2_fireevents(\'{0}\', \'{1}\', {2});".format(collectionDate, interval, distance) execute_query(query_text)

-- Function: viirs_threshold_2_fireevents(timestamp without time zone, interval, integer)

-- DROP FUNCTION viirs_threshold_2_fireevents(timestamp without time zone, interval, integer);

CREATE OR REPLACE FUNCTION viirs_threshold_2_fireevents(timestamp without time zone, interval, integer) RETURNS void AS $BODY$ DECLARE collection timestamp without time zone := $1; recent interval := $2; distance integer := $3; -- a_row active_fire_alb%rowtype; a_row RECORD; dumrec RECORD; BEGIN FOR a_row IN SELECT a.* FROM threshold_burned a WHERE collection_date = collection LOOP SELECT * from (SELECT fe.fid as fe_fid, fe.geom, fc.fid as fc_fid FROM fire_events fe, fire_collections fc WHERE fe.collection_id = fc.fid AND fc.last_update >= collection - recent AND fc.last_update <= collection AND fe.source = 'ActiveFire') as tf WHERE ST_DWithin(ST_Transform(a_row.geom, 102008), tf.geom, distance) LIMIT 1 INTO dumrec; IF EXISTS (SELECT * from (SELECT fe.fid as fe_fid, fe.geom, fc.fid as fc_fid FROM fire_events fe, fire_collections fc WHERE fe.collection_id = fc.fid AND fc.last_update >= collection - recent AND fc.last_update <= collection AND fe.source = 'ActiveFire') as tf WHERE ST_DWithin(ST_Transform(a_row.geom, 102008), tf.geom, distance) LIMIT 1) THEN RAISE NOTICE 'found a match' ; INSERT INTO fire_events(latitude, longitude, geom, source, collection_id, collection_date, pixel_size, band_i_m) VALUES(a_row.latitude, a_row.longitude, ST_Multi(ST_Transform(a_row.geom, 102008)), 'Threshold', dumrec.fc_fid, a_row.collection_date, a_row.pixel_size, a_row.band_i_m); UPDATE fire_collections SET last_update = a_row.collection_date WHERE dumrec.fc_fid = fire_collections.fid; END IF; END LOOP; return; END $BODY$ LANGUAGE plpgsql VOLATILE COST 100; ALTER FUNCTION viirs_threshold_2_fireevents(timestamp without time zone, interval, integer) OWNER TO postgres;

[Forest Service Shield]

Carl Albury Remote Sensing Specialist

Forest Service Contractor Remote Sensing Applications Center

p: 801-975-3351 calbury@fs.fed.usmailto:calbury@fs.fed.us

2222 West 2300 South Salt Lake City, UT 84119 www.fs.fed.ushttp://www.fs.fed.us/ [USDA Logo]http://usda.gov/[Forest Service Twitter]https://twitter.com/forestservice[USDA Facebook]https://www.facebook.com/pages/US-Forest-Service/1431984283714112

Caring for the land and serving people

From: bnordgren [mailto:notifications@github.com] Sent: Saturday, March 05, 2016 1:26 PM To: firelab/viirs_ba viirs_ba@noreply.github.com Subject: Re: [viirs_ba] Optimize PostGIS deployment (#10)

Looks good, between this and implementing the geographic window, the brute force processing time for July 2014 (night+day+375m) was reduced from 27hours to 18hours. That's almost exactly 2/3 the previous time.

— Reply to this email directly or view it on GitHubhttps://github.com/firelab/viirs_ba/issues/10#issuecomment-192729043.

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

calbury commented 8 years ago

I tested removing simple confirm burn and modifying the threshold_2_fire_events on 15 July_2014 scenes and the output was identical, and it was significantly faster. I would recommend this change. I was a little vague below, so if that doesn’t make sense let me know. Thanks,

[Forest Service Shield]

Carl Albury Remote Sensing Specialist

Forest Service Contractor Remote Sensing Applications Center

p: 801-975-3351 calbury@fs.fed.usmailto:calbury@fs.fed.us

2222 West 2300 South Salt Lake City, UT 84119 www.fs.fed.ushttp://www.fs.fed.us/ [USDA Logo]http://usda.gov/[Forest Service Twitter]https://twitter.com/forestservice[USDA Facebook]https://www.facebook.com/pages/US-Forest-Service/1431984283714112

Caring for the land and serving people

From: Albury, Carl -FS Sent: Saturday, March 05, 2016 4:38 PM To: 'firelab/viirs_ba' reply@reply.github.com; firelab/viirs_ba viirs_ba@noreply.github.com Cc: Nordgren, Bryce L -FS bnordgren@fs.fed.us; Bryce L Nordgren (bnordgren@gmail.com) (bnordgren@gmail.com) bnordgren@gmail.com Subject: RE: [viirs_ba] Optimize PostGIS deployment (#10)

Bryce, The Simple confirm burn seems to be a bottle neck. This speed things up when the database was smaller, but now seems to be slowing things down. Earlier I had mentioned taking this out, but that that would result in growth off thresholded cells. I think we can get around that by adding a requirement that to the threshold_2_fire_events function that the source must be ActiveFire (and removing the confirmed = true requirement). An updated version of the function is below. I did some quick testing and this speed things up significantly. I’m still verifying that it works the same, but it seems to.

You will need to comment out the bit that runs simple confirm burn (~539-543):

simple confirm threshold burns

print "\nPerform simple confirm burned area"

date_4db = datetime.datetime.strftime(H5Date, "%Y-%m-%d %H:%M:%S")

execute_simple_confirm_burns(date_4db, TemporalProximity, SpatialProximity)

vacuum_analyze("threhold_burned")

Also note that there is a typo in the following function, see highlighted bit should be “Start threshold_2_events”: def execute_threshold_2_events(collectionDate, interval, distance): print "Start simple confirm burn", get_time() query_text = "SELECT VIIRS_threshold_2_fireevents(\'{0}\', \'{1}\', {2});".format(collectionDate, interval, distance) execute_query(query_text)

-- Function: viirs_threshold_2_fireevents(timestamp without time zone, interval, integer)

-- DROP FUNCTION viirs_threshold_2_fireevents(timestamp without time zone, interval, integer);

CREATE OR REPLACE FUNCTION viirs_threshold_2_fireevents(timestamp without time zone, interval, integer) RETURNS void AS $BODY$ DECLARE collection timestamp without time zone := $1; recent interval := $2; distance integer := $3; -- a_row active_fire_alb%rowtype; a_row RECORD; dumrec RECORD; BEGIN FOR a_row IN SELECT a.* FROM threshold_burned a WHERE collection_date = collection LOOP SELECT * from (SELECT fe.fid as fe_fid, fe.geom, fc.fid as fc_fid FROM fire_events fe, fire_collections fc WHERE fe.collection_id = fc.fid AND fc.last_update >= collection - recent AND fc.last_update <= collection AND fe.source = 'ActiveFire') as tf WHERE ST_DWithin(ST_Transform(a_row.geom, 102008), tf.geom, distance) LIMIT 1 INTO dumrec; IF EXISTS (SELECT * from (SELECT fe.fid as fe_fid, fe.geom, fc.fid as fc_fid FROM fire_events fe, fire_collections fc WHERE fe.collection_id = fc.fid AND fc.last_update >= collection - recent AND fc.last_update <= collection AND fe.source = 'ActiveFire') as tf WHERE ST_DWithin(ST_Transform(a_row.geom, 102008), tf.geom, distance) LIMIT 1) THEN RAISE NOTICE 'found a match' ; INSERT INTO fire_events(latitude, longitude, geom, source, collection_id, collection_date, pixel_size, band_i_m) VALUES(a_row.latitude, a_row.longitude, ST_Multi(ST_Transform(a_row.geom, 102008)), 'Threshold', dumrec.fc_fid, a_row.collection_date, a_row.pixel_size, a_row.band_i_m); UPDATE fire_collections SET last_update = a_row.collection_date WHERE dumrec.fc_fid = fire_collections.fid; END IF; END LOOP; return; END $BODY$ LANGUAGE plpgsql VOLATILE COST 100; ALTER FUNCTION viirs_threshold_2_fireevents(timestamp without time zone, interval, integer) OWNER TO postgres;

[Forest Service Shield]

Carl Albury Remote Sensing Specialist

Forest Service Contractor Remote Sensing Applications Center

p: 801-975-3351 calbury@fs.fed.usmailto:calbury@fs.fed.us

2222 West 2300 South Salt Lake City, UT 84119 www.fs.fed.ushttp://www.fs.fed.us/ [USDA Logo]http://usda.gov/[Forest Service Twitter]https://twitter.com/forestservice[USDA Facebook]https://www.facebook.com/pages/US-Forest-Service/1431984283714112

Caring for the land and serving people

From: bnordgren [mailto:notifications@github.com] Sent: Saturday, March 05, 2016 1:26 PM To: firelab/viirs_ba viirs_ba@noreply.github.com<mailto:viirs_ba@noreply.github.com> Subject: Re: [viirs_ba] Optimize PostGIS deployment (#10)

Looks good, between this and implementing the geographic window, the brute force processing time for July 2014 (night+day+375m) was reduced from 27hours to 18hours. That's almost exactly 2/3 the previous time.

— Reply to this email directly or view it on GitHubhttps://github.com/firelab/viirs_ba/issues/10#issuecomment-192729043.

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.