google-code-export / ccc-gistemp

Automatically exported from code.google.com/p/ccc-gistemp

Hohenpeissenberg handling is complicated. #60

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
[I wrote this because I have an idea for how to do it much more simply, not
particularly because I believe the current way is wrong.]

Step 0 has code to handle an e-mailed record for Hohenpeissenberg. The upshot
is to throw away all records except one, replacing them with a record spliced
from the e-mailed data (up to the end of 2002) and the long GHCN record (from
2003 onward).

(The e-mailed record contains data for an obvious gap in the GHCN records.)
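
The splice described above can be sketched roughly as follows. This is only an illustration in Python 3, not the code in step0.py: the `{year: [12 monthly values]}` layout, the `splice` function and the sample values are all made up for the example.

```python
# Hypothetical layout: {year: [12 monthly values]}; None marks a missing month.
FIRST_GHCN_YEAR = 2003  # GHCN data is used from this year onward


def splice(email_record, ghcn_record, first_ghcn_year=FIRST_GHCN_YEAR):
    """Use e-mailed data before first_ghcn_year and GHCN data from it onward."""
    spliced = {}
    for year, months in email_record.items():
        if year < first_ghcn_year:
            spliced[year] = list(months)
    for year, months in ghcn_record.items():
        if year >= first_ghcn_year:
            spliced[year] = list(months)
    return spliced


# Invented sample values, chosen only to make the two sources distinguishable.
email = {2001: [1.0] * 12, 2002: [2.0] * 12, 2003: [9.9] * 12}
ghcn = {2002: [None] * 12, 2003: [3.0] * 12, 2004: [4.0] * 12}
result = splice(email, ghcn)
```

Here 2002 comes from the e-mailed data, filling the gap in the GHCN sample, while 2003 and 2004 come from GHCN.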

It struck me that it would be much simpler to add the e-mailed
Hohenpeissenberg record as a new duplicate GHCN record and let Step 1 combine
all the duplicate station records into one.
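
A minimal sketch of the idea in Python 3, assuming records live in a dict keyed by a 12-character uid made of an 11-character station id plus a one-digit duplicate number (the uids that appear in this thread have that shape). The helper names and the data values are hypothetical, not the ccc-gistemp API.

```python
from collections import defaultdict


def station_id(uid):
    """Assumption: the first 11 characters of a uid identify the station."""
    return uid[:11]


def add_duplicate(records, uid, series):
    """Add series under a new uid, leaving the existing duplicates untouched."""
    if uid in records:
        raise ValueError("uid already in use: %s" % uid)
    records[uid] = series


def group_by_station(records):
    """Group uids by station, as a later combining step would need."""
    groups = defaultdict(list)
    for uid in sorted(records):
        groups[station_id(uid)].append(uid)
    return dict(groups)


# Placeholder data; "000000000000" is an invented uid for an unrelated station.
ghcn_records = {
    "617109620002": {2003: [3.0] * 12},
    "000000000000": {2003: [5.0] * 12},
}
add_duplicate(ghcn_records, "617109620009", {2002: [2.0] * 12})
groups = group_by_station(ghcn_records)
```

The point of the design is that nothing is deleted or rewritten in Step 0: the e-mailed data simply becomes one more duplicate for the later combining step to merge.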

I have done this and the changes are minimal. Attached is a file showing the
difference (in units of 0.01 °C) between the new and old Hohenpeissenberg
records after Step 1.
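
For illustration, a comparison of this kind could be computed as below. This is a Python 3 sketch with a hypothetical `{year: [monthly values in °C]}` layout and invented numbers; it is not the script that produced the attachment.

```python
def diff_hundredths(new, old):
    """Return {(year, month): new - old} in units of 0.01 C.

    Months missing (None) in either record are skipped.
    """
    out = {}
    for year in sorted(set(new) & set(old)):
        pairs = zip(new[year], old[year])
        for month, (a, b) in enumerate(pairs, start=1):
            if a is None or b is None:
                continue
            out[(year, month)] = int(round((a - b) * 100))
    return out


# Invented values: January differs by 0.05 C, February is missing in `old`.
new = {1990: [1.25] + [0.0] * 11}
old = {1990: [1.20, None] + [0.0] * 10}
diffs = diff_hundredths(new, old)
```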

Original issue reported on code.google.com by d...@ravenbrook.com on 11 Mar 2010 at 4:22

GoogleCodeExporter commented 9 years ago
To clarify: when I say "I have done this" I mean I have done it in a temporary
workspace; I have not changed the checked-in code.

Original comment by d...@ravenbrook.com on 11 Mar 2010 at 4:23

GoogleCodeExporter commented 9 years ago
Here's an unfinished patch:

Index: code/step0.py
===================================================================
--- code/step0.py       (revision 383)
+++ code/step0.py       (working copy)
@@ -117,6 +117,9 @@
     dataset with the priv. comm. data."""
     print "Correct the GHCN Hohenpeissenberg record."

+    ghcn_records[hohenpeissenberg.uid] = hohenpeissenberg
+    return
+
     for record in ghcn_records.itervalues():
         if record.station_uid == hohenpeissenberg.station_uid:
             # Extract the data for the years 2003 to present.
Index: tool/giss_io.py
===================================================================
--- tool/giss_io.py     (revision 383)
+++ tool/giss_io.py     (working copy)
@@ -628,7 +695,7 @@
     We only want data from 1880 to 2002.
     """ 

-    record = code.giss_data.StationRecord('617109620002')
+    record = code.giss_data.StationRecord('617109620009')
     for line in open(path):
         if line[0] in '12':
             year = int(line[:4])

Original comment by d...@ravenbrook.com on 12 Mar 2010 at 8:22

GoogleCodeExporter commented 9 years ago
One problem with this change is that it does not correct or eliminate the
somewhat dubious data in GHCN. If we genuinely prefer the priv. comm. data
then we ought to remove the GHCN data (this is what the current code does, and
it is slightly complicated). It occurs to me that there is an existing
mechanism for eliminating dubious data: the drop_strange() function of
step1.py. Unfortunately that function runs at the end of Step 1, after records
have been combined. Dropping bogus values only after combining seems
questionable: the bogus values may contribute to the offset applied when
combining records.

So: consider using drop_strange() to remove the Hohenpeissenberg GHCN data;
also consider whether drop_strange() should be moved to the beginning of
Step 1 (this may break GISTEMP compatibility), and if so, create an issue!
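
The worry about bogus values feeding into the offset can be shown with a toy example. This Python 3 sketch uses a plain mean difference over the overlap as a stand-in for the combining step; it is deliberately simplified, is not the algorithm in step1.py, and all the numbers are invented.

```python
def mean_offset(a, b):
    """Mean of a - b over positions where both series have data."""
    diffs = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("no overlapping data")
    return sum(diffs) / len(diffs)


other = [9.0, 10.0, 11.0, 12.0]
clean = [10.0, 11.0, 12.0, 13.0]
bogus = [10.0, 11.0, 52.0, 13.0]  # one invented bad value

offset_clean = mean_offset(clean, other)
offset_bogus = mean_offset(bogus, other)  # pulled away by the bad value

# Dropping the bad value before combining restores the clean offset.
dropped = [None if v == 52.0 else v for v in bogus]
offset_dropped = mean_offset(dropped, other)
```

If the bad value is removed only after combining, the distorted offset has already been applied to every value of the shifted record.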

Original comment by d...@ravenbrook.com on 1 Feb 2011 at 11:25