PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Create a database cleanup script / set of functions #1630

Open mccabete opened 7 years ago

mccabete commented 7 years ago

It would be good to have a script, a set of functions, or both that round up records that need to be fixed. This could apply to:

files without input records, inputs with no file records, file formats without files, dbfiles records that point to files that don't exist, etc.

This could just be some queries that put records into a generic, readable file that could be hand-edited, then re-fed into a function that would cull the records that remain.
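A minimal sketch of what that "round up and dump to an editable file" step could look like. This is only an illustration: the table and column names (`inputs`, `dbfiles`, `container_id`, `file_path`, `file_name`) are simplified stand-ins loosely modeled on BETY, not the actual schema, and the demo uses an in-memory SQLite database rather than the real PostgreSQL one.

```python
import csv
import os
import sqlite3

# Hypothetical, simplified schema -- NOT the real BETY tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE inputs  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dbfiles (id INTEGER PRIMARY KEY, container_id INTEGER,
                      file_name TEXT, file_path TEXT);
INSERT INTO inputs  VALUES (1, 'met2000'), (2, 'soil');
INSERT INTO dbfiles VALUES (10, 1,  'met.nc',    '/no/such/dir'),
                           (11, 99, 'orphan.nc', '/tmp');
""")

def find_suspect_records(con):
    """Return (reason, dbfile_id) pairs for records needing review."""
    suspects = []
    # dbfiles whose container (input) record is missing
    for (fid,) in con.execute(
            "SELECT d.id FROM dbfiles d LEFT JOIN inputs i "
            "ON d.container_id = i.id WHERE i.id IS NULL"):
        suspects.append(("no_input_record", fid))
    # dbfiles pointing at paths that do not exist on disk
    for fid, path, name in con.execute(
            "SELECT id, file_path, file_name FROM dbfiles"):
        if not os.path.exists(os.path.join(path, name)):
            suspects.append(("file_missing", fid))
    return suspects

# Dump to an editable CSV for hand review before anything is deleted.
suspects = find_suspect_records(con)
with open("suspect_records.csv", "w", newline="") as fh:
    csv.writer(fh).writerows([("reason", "dbfile_id"), *suspects])
```

The hand-edited CSV could then be read back by a second function that deletes only the rows the reviewer left in, matching the two-step workflow described above.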

mccabete commented 7 years ago

@tonygardella What DB errors do we tend to get?

ashiklom commented 7 years ago

Let's make an editable list! I'll start a checklist here -- feel free to edit this comment (pencil button in the top right of the issue) to add stuff, or check stuff off that has been implemented.

dlebauer commented 7 years ago
ashiklom commented 7 years ago

@ankurdesai suggestion -- Function for a given site and met product that would purge everything except original data download.
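A sketch of how that purge could select its targets, under one loud assumption: that derived met products carry a `parent_id` pointing back at the original raw download, in a simplified `inputs` table that is not the real BETY schema. Everything with a parent is a candidate for deletion; the parentless original survives.

```python
import sqlite3

# Hypothetical, simplified inputs table; assumes derived met products
# record a parent_id chain back to the raw download.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE inputs (id INTEGER PRIMARY KEY, site_id INTEGER,
                     name TEXT, parent_id INTEGER);
INSERT INTO inputs VALUES
  (1, 42, 'raw_download',  NULL),
  (2, 42, 'met_gapfilled', 1),
  (3, 42, 'met_model_fmt', 2);
""")

def derived_met_ids(con, site_id):
    """IDs of met inputs at a site that are derived (have a parent),
    i.e. everything except the original data download."""
    rows = con.execute(
        "SELECT id FROM inputs WHERE site_id = ? "
        "AND parent_id IS NOT NULL ORDER BY id", (site_id,))
    return [r[0] for r in rows]

derived_met_ids(con, 42)  # the gap-filled and model-format records
```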

mccabete commented 7 years ago

Mike -- A function that removes old runs, and especially met files made by old runs

tonygardella commented 7 years ago

It used to be that failed downloads of raw met data created a DB record even though the actual files are not there. For example, if a site only had 2000-2010 and you ran 1998-2002, it would update the input record to say 1998-2002 was there but only download 2000-2002. Not sure how to fix this mismatch other than to delete old runs and their associated files as much as possible.

ashiklom commented 7 years ago

> failed downloads of raw met data created a db record even though actual files are not there

Input records, especially met records, should have a start date, end date, and format specification, right? Perhaps we can use the format record and the load_data functionality to get the actual bounds of the data, compare them to the recorded start and end dates, and flag the file for deletion if there is a mismatch. Similarly, would it be too aggressive to delete any input record that is missing any of these specifications? That is, are there circumstances where it's OK for an input to be missing a start date, end date, or format?

dlebauer commented 7 years ago

@ashiklom soil datasets often do not have a start and end date

tonygardella commented 7 years ago

@ashiklom I think that would work. We could update the records to reflect the data that is actually there or delete them. I think for this it would be best to just use the met format records.
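The mismatch check agreed on above, restricted to met records since soil datasets may legitimately lack dates, could be sketched like this. The actual bounds would come from reading the file via its format record (e.g. PEcAn's load_data); here they are passed in directly, and the function name is hypothetical.

```python
from datetime import date

def flag_range_mismatch(rec_start, rec_end, actual_start, actual_end):
    """Flag a met input whose DB record claims a wider date range than
    the data actually covers, or whose range is unspecified."""
    if rec_start is None or rec_end is None:
        return True  # incomplete specification: flag for hand review
    return rec_start < actual_start or rec_end > actual_end

# Record claims 1998-2002, but only 2000-2002 was actually downloaded:
flag_range_mismatch(date(1998, 1, 1), date(2002, 12, 31),
                    date(2000, 1, 1), date(2002, 12, 31))  # mismatch
```

Flagged records could then be either corrected to reflect the data that is actually there, or deleted, as suggested above.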

mccabete commented 7 years ago

A checklist of functions to create. Also available here in a Google doc.

A quick and dirty synthesis of GitHub comments/issues, turned into a list of function categories:

- Mass Execution functions
- Checks
- Find DB Entry functions
- Find Run functions

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.