glamod / glamod-ingest

Database preparation and ingestion for GLAMOD
BSD 2-Clause "Simplified" License

Records that don't match one source_id (cdmlite / land) #46

Closed agstephens closed 3 years ago

agstephens commented 3 years ago

I'm processing the land cdmlite data now. It is generally going well, but some observations have no valid match to a "source_id" in either the sub-daily or the daily+monthly Source Configuration files you provided.

Here is an example:

primary_id = CA001091174
record_number = 2
frequency = daily

How do you think we should handle these?

agstephens commented 3 years ago

The ideal solution to this would be:

  1. If source_id is not matched, set source_id to -9999
  2. But do not FAIL
  3. After source_ids have been added, filter out the -9999 records.
  4. Write those -9999 records to a log file somewhere, e.g. if not filtered_records.empty: filtered_records.to_csv(...)
  5. Then process the rest of the records in the normal way.

NEED TO TEST THIS AGAINST A REAL EXAMPLE TO PROVE RESULT IS THE SAME AFTER PUTTING IN THIS FILTER

This way, we only end up with a few bad records being excluded, rather than lots.
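The five steps above could be sketched roughly as follows with pandas. This is only an illustration, not the ingestion code in this repository: the function names, the `source_lookup` table, the join keys (`primary_id`, `record_number`, `frequency`) and the integer type of `source_id` are all assumptions made for the example.

```python
import pandas as pd

MISSING_SOURCE_ID = -9999  # sentinel value proposed in step 1

# Hypothetical join keys; the real matching rules may differ.
KEYS = ["primary_id", "record_number", "frequency"]


def assign_source_ids(records, source_lookup):
    """Steps 1-2: attach source_id without failing on unmatched rows.

    Assumes `records` has no source_id column yet and that
    `source_lookup` holds one integer source_id per key combination.
    """
    merged = records.merge(source_lookup, on=KEYS, how="left")
    # Rows with no match come back as NaN; replace them with the sentinel.
    merged["source_id"] = (
        merged["source_id"].fillna(MISSING_SOURCE_ID).astype(int)
    )
    return merged


def split_unmatched(records, log_path=None):
    """Steps 3-4: separate the -9999 rows and optionally log them to CSV."""
    mask = records["source_id"] == MISSING_SOURCE_ID
    filtered_records = records[mask].reset_index(drop=True)
    # A DataFrame has no unambiguous truth value, so test .empty explicitly.
    if not filtered_records.empty and log_path is not None:
        filtered_records.to_csv(log_path, index=False)
    kept = records[~mask].reset_index(drop=True)
    return kept, filtered_records


if __name__ == "__main__":
    # Toy data: the first row mirrors the unmatched example in this issue,
    # the second row and its source_id are made up.
    records = pd.DataFrame({
        "primary_id": ["CA001091174", "XX000000001"],
        "record_number": [2, 1],
        "frequency": ["daily", "daily"],
    })
    source_lookup = pd.DataFrame({
        "primary_id": ["XX000000001"],
        "record_number": [1],
        "frequency": ["daily"],
        "source_id": [17],
    })
    kept, rejected = split_unmatched(assign_source_ids(records, source_lookup))
    print(kept)      # step 5: these rows continue through normal processing
    print(rejected)  # these rows would be written to the log file
```

Because the unmatched rows are removed before normal processing, input with no unmatched rows should pass through unchanged, which is what the test against a real example would need to confirm.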

agstephens commented 3 years ago

Simon fixed this problem. Just need to rerun over failures. No action required in our ingestion code.