google-code-export / ccc-gistemp

Automatically exported from code.google.com/p/ccc-gistemp

List of retained stations differs between releases. #84

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
At http://clearclimatecode.org/a-real-land-mask/comment-page-1/#comment-3175 
Bob Koss originally wrote:

In case you are short on work.  ;-)
Just kidding.

A bug seems to have crept in since ver 4.1 and prior to release of ver 5.1.

I ran my original copies of ver 4.1 and 5.1 using the 5.1 input data files for 
both. This code would be prior to the GHCN/USHCN bug recently corrected in the 
comments in the "Just 440 stations" thread.

The stations below are present in ver 4.1 step2.out and the GISS online adjusted 
files, but don't appear in ver 5.1 step2.out. A few are questionable and have 
broken records, but others appear very good.
14063210000 1934    1970
14063230000 1936    1970
15567477000 1933    1970
22220353000 1951    1981
30580219002 1952    1980
30580370002 1953    1980
40371066000 1970    2010
40678310000 1952    1980
40778466001 1951    1980
40778473001 1951    1980
40778485001 1952    1980
43378925000 1961    2010
50491487000 1922    1980
50491629000 1956    1981
51891648000 1957    1981
60815505000 1951    1980
63608530000 1961    1981
63608567000 1961    1981

The two stations below are in ver 5.1 step2.out, but in neither ver 4.1 
step2.out nor the GISS online adjusted files.
40678369000 1952    1981
30684043000 1961    1982

Here are the differences in lines per file after running steps 0-2 on both 
versions with identical data.
ver 5.1 mean_comb has 3 fewer lines than ver 4.1
ver 5.1 step1.out has 67 more lines than ver 4.1
ver 5.1 step2.out has 1052 fewer lines than ver 4.1

Original issue reported on code.google.com by d...@ravenbrook.com on 31 Aug 2010 at 8:21

GoogleCodeExporter commented 9 years ago
It's possible that "ver 5.1 step1.out has 67 more lines than ver 4.1" is 
another manifestation of partial years not being logged, see Issue 70.

Having done some brief investigation, this is still mysterious.

In the r481 archive (see the downloads) station 40678369000 is present and 
14063210000 is not (so in that regard, r481 is like release 5.1).

Original comment by d...@ravenbrook.com on 31 Aug 2010 at 2:14

GoogleCodeExporter commented 9 years ago
Concerning 40678369000.  This station has 2 duplicates, 0 and 1.

As far as I can tell, this plot:

http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=406783690000&data_set=1&num_neighbors=1

indicates that after Step 1 GISTEMP the station still has 2 duplicates: They 
have NOT been combined.

However in ccc-gistemp, they have been combined.

With the records not combined, that is sufficient to explain why they do not 
appear in Step 2 output: neither record is long enough to satisfy the "at least 
20 years" requirement; the combined record does meet the 20-year requirement.
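
A minimal sketch of that reasoning, not the ccc-gistemp or GISTEMP code. It assumes a record is a dict mapping year to a value, that "length" means the number of years with any data, and that combining is a naive merge; the names (long_enough, combine, dup0, dup1) and the data are made up for illustration.

```python
MIN_YEARS = 20  # assumed threshold, from the "at least 20 years" wording


def years_with_data(record):
    """Return the set of years for which the record holds a value."""
    return {year for year, value in record.items() if value is not None}


def long_enough(record, min_years=MIN_YEARS):
    """True if the record covers at least min_years years (assumed rule)."""
    return len(years_with_data(record)) >= min_years


def combine(a, b):
    """Naive merge: take a's value where present, else b's.

    This is NOT the Step 1 combining method, only a stand-in.
    """
    merged = dict(b)
    merged.update({y: v for y, v in a.items() if v is not None})
    return merged


# Made-up duplicates: each is too short alone, but together they span
# 30 years, so only the combined record would survive to Step 2.
dup0 = {year: 10.0 for year in range(1952, 1967)}  # 15 years
dup1 = {year: 10.5 for year in range(1964, 1982)}  # 18 years
combined = combine(dup0, dup1)

for name, rec in [("dup0", dup0), ("dup1", dup1), ("combined", combined)]:
    print(name, len(years_with_data(rec)), long_enough(rec))
```

So whether the duplicates get combined in Step 1 decides whether the station shows up in step2.out at all.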

The combined record looks well dodgy, so I suspect the 2 records are being 
incorrectly combined.  Just possibly it's a marginal case where ccc-gistemp 
rules one way and GISTEMP the other.

Original comment by d...@ravenbrook.com on 31 Aug 2010 at 2:44

GoogleCodeExporter commented 9 years ago
Re 40678369000 (see comment 2).

This station, and 30684043000, had their "duplicate" records combined 
incorrectly, due to a bug in step1.find_quintuples that I introduced when I 
moved everything to have all records starting in 1880.

That particular bug is fixed by r565, but that still leaves the stations that 
are missing in release 5.1 (the list starting 14063210000, above).

Original comment by d...@ravenbrook.com on 17 Sep 2010 at 10:44

GoogleCodeExporter commented 9 years ago
Bob Koss says "a few are questionable".  I would say that the majority are 
characterised by being a series of broken duplicates often with what look like 
undocumented station moves that barely stitch together properly.

Some of them stitch together nicely (or ought to):

63608567000
63608530000
60815505000
51891648000
50491629000
40678310000
30580370002
30580219002

this is a single duplicate:

40371066000 (broken, but I don't see why it is dropped)

more suspect:

50491487000
40778485001 (station move)
40778473001 (station move)
40778466001
22220353000
15567477000 (large gap, but apart from that, looks continuous)
14063230000
14063210000

includes offset overlap:

43378925000

Original comment by d...@ravenbrook.com on 17 Sep 2010 at 12:21

GoogleCodeExporter commented 9 years ago
r566 fixes the entire "Bob Koss" list, but there are still differences.  A 
comparison of 12-digit IDs reveals:

> 155676630002
< 219417560003
> 219417560000
< 406783250003
> 406783250002
> 425724040020
> 425724060060
> 425725230040
> 425725230060
> 425725970090
< 509982230002
> 509982230000
< 643081600003
< 643083730010

('>' means present in r566, '<' means present in release 4.1)
The 425* stations are a previously noted and fixed issue regarding US GHCN 
stations: issue 76.
The 643* stations are suspect Spanish stations with bad metadata.  They have 
been dropped by GISTEMP's config/Ts.strange.RSU.list.IN file.

That leaves 3 stations that have a different 12-digit ID, and one station that 
has 1 more duplicate than previously.
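
The tally can be reproduced from the list of differences with a short sketch. It assumes a 12-digit ID is the 11-digit station ID followed by a 1-digit duplicate number; the function name classify and the way the list is grouped are illustrative, not part of ccc-gistemp.

```python
from collections import defaultdict

# The differences listed in this comment ('>' = r566, '<' = release 4.1).
DIFF = """\
> 155676630002
< 219417560003
> 219417560000
< 406783250003
> 406783250002
> 425724040020
> 425724060060
> 425725230040
> 425725230060
> 425725970090
< 509982230002
> 509982230000
< 643081600003
< 643083730010
"""


def classify(diff_text):
    """Group differing 12-digit IDs by their 11-digit station prefix."""
    by_station = defaultdict(lambda: {"<": [], ">": []})
    for line in diff_text.splitlines():
        marker, record_id = line.split()
        by_station[record_id[:11]][marker].append(record_id)
    # Present on both sides: same station, different 12-digit ID.
    changed = sorted(s for s, m in by_station.items() if m["<"] and m[">"])
    # Present on one side only.
    only_new = sorted(s for s, m in by_station.items() if not m["<"])
    only_old = sorted(s for s, m in by_station.items() if not m[">"])
    return changed, only_new, only_old


changed, only_new, only_old = classify(DIFF)
print("different 12-digit ID:", changed)
print("only in r566 (excluding 425*):",
      [s for s in only_new if not s.startswith("425")])
print("only in release 4.1:", only_old)
```

With the 425* and 643* entries set aside, this leaves the three stations whose ID changed and the single extra record for 15567663000.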

Original comment by d...@pobox.com on 20 Sep 2010 at 3:41

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r567.

Original comment by d...@pobox.com on 21 Sep 2010 at 9:41