aodn / content

Tracks AODN Portal content and configuration issues

soop_trv harvester filling up storage with temporary tables #455

Closed jonescc closed 4 years ago

jonescc commented 4 years ago

Refer to https://github.com/aodn/issues/issues/648

soop_trv uses a join with a GROUP BY clause to create merged measurements, which requires the creation of billions of records in temporary storage. The problem has been tracked down to two causes: measurements from a file that has been deleted are still being used, and multiple files with the same measurement times have been pushed through the pipeline without the old ones being deleted. The investigation focussed on trip_id 7303, but all other trips should also be checked for these issues.
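To illustrate why duplicate measurement times blow up the join before the GROUP BY collapses it, here is a minimal sketch using an in-memory SQLite database. The table names (`temperature`, `salinity`), their columns, and the `join_fanout` helper are hypothetical and are not the harvester's actual schema or query; only the general mechanism is shown.

```python
import sqlite3

JOIN_SQL = """
    FROM temperature t
    JOIN salinity s ON t.trip_id = s.trip_id AND t.time = s.time
"""

def join_fanout(copies, n_times=100):
    """Return (joined_rows, merged_rows) for one trip whose files were
    ingested `copies` times with identical measurement times.

    Hypothetical schema, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE temperature (trip_id INTEGER, file_id INTEGER, time INTEGER, value REAL);
        CREATE TABLE salinity    (trip_id INTEGER, file_id INTEGER, time INTEGER, value REAL);
    """)
    for file_id in range(copies):
        rows = [(7303, file_id, t) for t in range(n_times)]
        conn.executemany("INSERT INTO temperature VALUES (?, ?, ?, 20.0)", rows)
        conn.executemany("INSERT INTO salinity VALUES (?, ?, ?, 35.0)", rows)

    # Rows the database has to materialise before grouping.
    joined = conn.execute("SELECT count(*) " + JOIN_SQL).fetchone()[0]
    # Rows left after the GROUP BY has collapsed the duplicates.
    merged = conn.execute(
        "SELECT count(*) FROM (SELECT t.trip_id, t.time "
        + JOIN_SQL + " GROUP BY t.trip_id, t.time)"
    ).fetchone()[0]
    conn.close()
    return joined, merged

if __name__ == "__main__":
    print(join_fanout(1))  # one copy of each file
    print(join_fanout(3))  # same files pushed through three times
```

With three copies of each file, every measurement time matches 3 x 3 rows, so the join produces 900 intermediate rows for only 100 merged results. The growth is quadratic in the number of duplicate copies per table, and it multiplies again for each additional table joined, which is consistent with a modest amount of duplicate data producing very large temporary files.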

lbesnard commented 4 years ago

@jonescc I believe this is now fixed. Leigh is no longer seeing those big dump files. I'm currently deleting a lot of duplicate data (800 files, one by one, with po_s3_del) that resulted from a mistake by the facility, so we'll see how this behaves.

@lwgordonimos FYI

jonescc commented 4 years ago

Changes look good to me.