Closed GoogleCodeExporter closed 9 years ago
It's possible that "ver 5.1 step1.out has 67 more lines than ver 4.1" is
another manifestation of partial years not being logged, see Issue 70.
Having done some brief investigation, this is still mysterious.
In the r481 archive (see the downloads) station 40678369000 is present and
14063210000 is not (so in that regard, r481 is like release 5.1).
Original comment by d...@ravenbrook.com
on 31 Aug 2010 at 2:14
Concerning 40678369000. This station has 2 duplicates, 0 and 1.
As far as I can tell, this plot:
http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=406783690000&data_set=1&num_neighbors=1
indicates that after Step 1 GISTEMP the station still has 2 duplicates: They
have NOT been combined.
However in ccc-gistemp, they have been combined.
With the records not combined, that is sufficient to explain why they do not
appear in Step 2 output: neither record is long enough to satisfy the "at least
20 years" requirement; the combined record does meet the 20-year requirement.
The combined record looks well dodgy, so I suspect the 2 records are being
incorrectly combined. Just possibly it's a marginal case where ccc-gistemp
rules one way and GISTEMP the other.
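To make the 20-year argument concrete, here is a minimal sketch of how two short duplicates can each fail a length test while their combination passes. The record layout (a dict of year to 12 monthly values), the helper names, and the sample data are all assumptions for illustration; this is not the actual ccc-gistemp or GISTEMP code, and the real combining step does more than take a union of years.

```python
MIN_YEARS = 20  # the "at least 20 years" requirement mentioned above


def years_with_data(record):
    """Return the set of years that have at least one monthly value.

    `record` is a hypothetical layout: {year: [12 monthly values or None]}.
    """
    return {year for year, months in record.items()
            if any(m is not None for m in months)}


def long_enough(record, min_years=MIN_YEARS):
    """True if the record covers at least `min_years` years with data."""
    return len(years_with_data(record)) >= min_years


# Two made-up duplicates of 15 years each, overlapping in 1960-1965.
dup0 = {y: [10.0] * 12 for y in range(1951, 1966)}
dup1 = {y: [10.5] * 12 for y in range(1960, 1975)}

# Naive union of years, standing in for the real combining step.
combined = {**dup0, **dup1}

print(long_enough(dup0), long_enough(dup1), long_enough(combined))
```

With these made-up records, neither duplicate would survive the length test on its own, but the combined record (1951 to 1974) would.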
Original comment by d...@ravenbrook.com
on 31 Aug 2010 at 2:44
Re 40678369000 (see comment 2).
This station, and 30684043000, had their "duplicate" records combined
incorrectly, due to a bug in step1.find_quintuples that I introduced when I
moved everything to have all records starting in 1880.
That particular bug is fixed by r565, but that still leaves the stations that
are missing in release 5.1 (the list starting 14063210000, above).
Original comment by d...@ravenbrook.com
on 17 Sep 2010 at 10:44
Bob Koss says "a few are questionable". I would say that the majority are
characterised by being a series of broken duplicates often with what look like
undocumented station moves that barely stitch together properly.
Some of them stitch together nicely (or ought to):
63608567000
63608530000
60815505000
51891648000
50491629000
40678310000
30580370002
30580219002
this is a single duplicate:
40371066000 (broken, but I don't see why it is dropped)
more suspect:
50491487000
40778485001 (station move)
40778473001 (station move)
40778466001
22220353000
15567477000 (large gap, but apart from that, looks continuous)
14063230000
14063210000
includes offset overlap:
43378925000
Original comment by d...@ravenbrook.com
on 17 Sep 2010 at 12:21
r566 fixes the entire "Bob Koss" list, but there are still differences. A
comparison of 12-digit IDs reveals:
> 155676630002
< 219417560003
> 219417560000
< 406783250003
> 406783250002
> 425724040020
> 425724060060
> 425725230040
> 425725230060
> 425725970090
< 509982230002
> 509982230000
< 643081600003
< 643083730010
('>' means present in r566, '<' means present in release 4.1)
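A comparison like the one above can be sketched with set operations. This assumes the 12-digit IDs have already been extracted from each version's output into plain lists (how that is done is not shown here), and the function name is made up for illustration. The sample IDs are a subset of those listed above, plus one invented ID (101603550000) that appears in both lists to show that shared records produce no output.

```python
def compare_ids(old_ids, new_ids):
    """Return diff-style lines for IDs present in only one of the two lists.

    '<' marks an ID only in the old list, '>' an ID only in the new list.
    Lines are sorted by ID, which may differ from the ordering of a real diff.
    """
    old, new = set(old_ids), set(new_ids)
    lines = []
    for record_id in sorted(old ^ new):  # symmetric difference
        marker = '>' if record_id in new else '<'
        lines.append(f"{marker} {record_id}")
    return lines


# Subset of the IDs above; 101603550000 is an invented shared ID.
release_4_1 = ["101603550000", "219417560003", "406783250003"]
r566 = ["101603550000", "155676630002", "219417560000", "406783250002"]

for line in compare_ids(release_4_1, r566):
    print(line)
```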
The 425* stations are a previously noted and fixed issue regarding US GHCN
stations: issue 76.
The 643* stations are suspect Spanish stations with bad metadata. They have
been dropped by GISTEMP's config/Ts.strange.RSU.list.IN file.
That leaves 3 stations that have a different 12-digit ID, and one station that
has 1 more duplicate than previously.
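The tally above can be checked by grouping the remaining differences by their first 11 digits, assuming (as the IDs in this thread suggest) that a 12-digit ID is an 11-digit station ID followed by a duplicate digit. The function and variable names are made up for illustration; the data is the list above with the 425* and 643* entries removed.

```python
from collections import defaultdict

# Remaining differences: '<' is release 4.1 only, '>' is r566 only.
remaining = [
    ('>', '155676630002'),
    ('<', '219417560003'), ('>', '219417560000'),
    ('<', '406783250003'), ('>', '406783250002'),
    ('<', '509982230002'), ('>', '509982230000'),
]


def classify(diff):
    """Split stations into those whose record ID changed and those that
    only gained a record in the newer version."""
    by_station = defaultdict(lambda: {'<': [], '>': []})
    for marker, record_id in diff:
        by_station[record_id[:11]][marker].append(record_id)

    changed_id, extra_duplicate = [], []
    for station, sides in sorted(by_station.items()):
        if sides['<'] and sides['>']:
            changed_id.append(station)       # present on both sides
        elif sides['>']:
            extra_duplicate.append(station)  # only in the newer version
    return changed_id, extra_duplicate


changed_id, extra_duplicate = classify(remaining)
print("different 12-digit ID:", changed_id)
print("one more duplicate:", extra_duplicate)
```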
Original comment by d...@pobox.com
on 20 Sep 2010 at 3:41
This issue was closed by revision r567.
Original comment by d...@pobox.com
on 21 Sep 2010 at 9:41
Original issue reported on code.google.com by
d...@ravenbrook.com
on 31 Aug 2010 at 8:21