Open GoogleCodeExporter opened 9 years ago
I guess we need to fix time-setting bug #67 first...
Original comment by skliarie@gmail.com
on 24 Apr 2010 at 10:04
well... GPS applications are supposed to take the time reported by the GPS
receiver,
not system time (because there is no guarantee that the system time is anywhere
near
accurate). That should be fairly simple but even here something seems to be
going
wrong. Unless the apps are introducing some error (which I consider unlikely
since
they're both experiencing the same problems), I would say this is a fault in
the GPS
components in Android - which should not do any filtering, conversions or
calculation
but just report what it gets from the GPS receiver, and already here something
seems
to be going wrong. I would say we can't fix bug #67 (at least not by fetching
time
from GPS) without fixing the GPS timestamps.
And by the way: the local time on my Freerunner display has no problems at all,
other
than that I have to set it manually and it is somewhat off because of that.
Original comment by mich...@vonglasow.com
on 25 Apr 2010 at 3:27
another thing I have noticed is that the time "skips" seem to occur mainly in
areas
with poor GPS reception - the other day I was tracking in inner city streets for
about two hours and, when opening the GPX file in JOSM, it looked like a spider
web
until I limited the maximum length for connection lines between GPX points.
Original comment by mich...@vonglasow.com
on 3 May 2010 at 9:34
looking at it again after some time... afaik the OS talks to the GPS receiver
using NMEA. NMEA is based on messages which are repeated and each of which
supplies a different kind of information (the position itself, satellites in
view, signal and fix quality etc.).
Timestamps are included with more than one message, but unfortunately the
format differs - some messages include milliseconds while others don't. (The
gpsd website contains a huge rant on that.)
Given that areas with bad reception are particularly prone to incorrect GPS
time I would conclude that Android uses multiple NMEA messages to get the time
but with a "preference" for certain messages. In areas with bad reception, some
messages get omitted (or supply no timestamp), hence other messages lower in
the preference list are used. Apparently the code that processes the messages
returns corrupted timestamps for at least one of the messages lower down in the
list - which would explain the "skips" and their frequency in poor reception
areas.
Original comment by mich...@vonglasow.com
on 24 Jul 2010 at 2:19
I did some tests last night and also had a look at the code (gps_freerunner.c
in freerunner_platform_hardware_hw is where it's at).
Local time was around 1:15 AM (CEST, i.e. UTC +02:00); the GPS status clock
flipped between 21:15 and 22:15. That is, it was 3 to 4 hours behind local
time; 2 hours behind local time would have been correct.
The code contains yet another example of a workaround for the deadly sin the
designers of the C library made: there is a gmtime function which converts
epoch times to a structure, the latter expressed in UTC, and there is
localtime, which does the same but additionally converts to local time. For the
opposite way there is mktime, which takes a structure, interpreting its members
as LOCAL time, and converts it back into epoch time - thus it's the inverse of
localtime. But there is no such counterpart for gmtime, which would allow
converting a structure containing UTC to epoch time. Just what were they
thinking...
The code compensates for this by using mktime for backward conversion,
calculating the UTC offset of the local time zone and adding that to the result.
However... line 279 calculates the time zone offset as:
r->utc_diff = time_utc - time_local;
Hence in my case this would have been -02:00 (which is the opposite of how time
zones are usually expressed).
When processing a timestamp, the members of a tm structure are filled and the
time is calculated in line 328 as
fix_time = mktime( &tm ) + r->utc_diff;
Let's break this down with my example: At 01:15 local time, the tm structure
gets 23:15. Since it interprets that as local time, mktime returns the epoch
time for 21:15 UTC. Then the previously calculated offset (-02:00) is added and
we get 19:15 UTC.
If we assume that GPS Status now uses localtime on that timestamp, it would
display 21:15, just what we observed before.
That still doesn't explain why the timestamp occasionally flips 1 hour forward
and then back again, I suppose we have both lice and fleas and there's another
error somewhere.
I do see that the time zone offset is calculated only once when the receiver is
initialized - this may become a problem if the time zone changes, e.g. when
using the GPS receiver during DST switchover. It would be better to recalculate
the offset every time we get a timestamp. Still not the nicest way as race
conditions are possible, but at the very worst we'd get one botched timestamp
and the next second we'd be OK again.
I'll work out a fix for what I've found so far...
Original comment by mich...@vonglasow.com
on 25 Jul 2010 at 2:03
as for the flipping forward and backward and my previous suspicion about
different NMEA messages: the code in question extracts the timestamps from GGA,
GLL, RMC and ZDA messages. For the first three it considers the timestamp only
if a valid fix is reported; ZDA is a specific message for supplying time and
receiving a ZDA with valid data means the receiver has the accurate time.
The GGA and GLL messages contain timestamps with second precision, i.e. 123519
for 12:35:19 UTC. RMC and ZDA have millisecond precision, e.g. 225446.33 for
22:54:46 and 330 milliseconds.
That means that we may encounter two different timestamp formats, depending on
the messages from which they were extracted, though both are largely similar.
The code processes them from left to right using fixed widths, so any
differences would affect only the seconds.
But... nmea_reader_update_time handles a special case in case we don't have a
date from GPS yet: in that case it calls now() and converts the result into a
tm structure, from which it then obtains the year/month/day values. The very
same tm structure is then used to calculate the timestamp.
The year/month/day/hour/minute/second members are set correctly but the
tm_isdst member never gets set explicitly. That member specifies if the tm
structure is to be interpreted as standard or daylight savings time. In time
zones that have DST, the difference is one hour - which would explain the extra
hour of leap. It's only a guess, but using an uninitialized (and thus random)
value for this is not very nice...
Original comment by mich...@vonglasow.com
on 25 Jul 2010 at 5:00
Here is the patch for what I found so far. I also cleaned up some stuff in
nmea_reader_update_utc_diff: rather than calculating epoch time manually from
the tm structure, I feed it through mktime again. Reason 1: the old code had at
least one bug in that it did not consider leap years, which may cause problems
on certain days. Reason 2: the result calculated here is for correcting values
obtained through mktime, hence consistently using mktime gives us a greater
chance of any bugs or misunderstandings canceling each other out.
Unfortunately I don't have a build environment, so I need to rely on someone to
merge and compile the whole thing for me. I will be more than happy to test it
then and share my results.
Original comment by mich...@vonglasow.com
on 25 Jul 2010 at 5:30
Attachments:
Michael,
Thanks for doing the digging and elaborating on the bug. I can try to build
your stuff. I understood you're patch is against AoF 0.2.0 RC1. Any additional
instructions I should take into account?
I suggest we continue this on the mailinglist.
Niels.
Original comment by niels.he...@gmail.com
on 26 Jul 2010 at 9:17
I created the patch against a git clone of the cupcake branch created this
weekend, but I see the code has not been touched in 7 months, hence 0.2.0 RC1
should have the same code inside. Altogether it should be quite
straightforward, apply the patch and build in the same way RC1 was built.
As for the mailing list, I'm not signed up (I try to keep mailing list
memberships low to avoid bursting my inbox), so I suggest keep information in
this ticket. I suppose it's not much more anyway, and it helps us keep all
relevant information in one place.
Original comment by mich...@vonglasow.com
on 26 Jul 2010 at 4:55
Michael,
I've built your code and created a tar archive containing the files you can use
to deploy to your AoF (files attached).
Please check if this resolves the issue you've found.
During the git apply procedure, I got an whitespace error warning back. Maybe
you can beautify your code to get rid of this?
If we can get things to work, we can provide the patch to Jim Ancona (in cc)
for code review and approval.
Keep us posted how it turns out.
Niels.
Original comment by niels.he...@gmail.com
on 28 Jul 2010 at 7:34
Attachments:
Just dropped in the new library and took a quick glance at the results - they
look promising. GPS Status now reports current local time and I haven't
experienced any flipping of timestamps.
However, since the error was a sporadic one, I would like to rule out the
possibility that I was just lucky during the quick tests I made and try it over
a longer period of time. As I'm going on a weekend trip, I'll just track the
trip and analyze the timestamps I get from the trace; if they are OK, we can
consider the issue fixed.
As for the beautification, I have already done my best to keep the code as
beautiful as possible - what kind of whitespace error is it? My first guess
would be inconsistent use of tabs and the corresponding number of
whitespaces... unfortunately, many editors don't care about that. I'll take a
closer look at it.
Original comment by mich...@vonglasow.com
on 28 Jul 2010 at 5:44
OK, I'm back from my trip and it looks like the patch fixes both bugs - the
offset and the occasional leaps. I also had a look at the whitespace - indeed
the auto-indent feature in Notepad++ replaces whitespaces with tabs. Here's a
beautified (but otherwise unchanged) patch. Enjoy!
Original comment by mich...@vonglasow.com
on 1 Aug 2010 at 7:15
Attachments:
Michael
Thanks for the fixing and testing.
Jim,
Could you review the patch and (if OK) submit the patch to the Freerunner GPS
hardware code? The same patch should be applicable to cupcake and master.
Thanks,
Niels.
Original comment by niels.he...@gmail.com
on 2 Aug 2010 at 8:19
Patch applied to cupcake and master. Thanks for your contribution!
Once the fix is verified in a build, we can close this issue.
Original comment by scarhill
on 2 Aug 2010 at 4:46
[deleted comment]
I do not think this works in its entirety.
With a bit of time on my hand I have written a new version of the once popular
YGPS app, which you can download at
http://www.yunnanexplorer.com/download/androidapps/. This now works on cupcake.
What I see is a frequent toggle of the date in the GPS reported time. I am
using a freerunner where I have not set the date after reflashing, so the
device date is 1 Jan 2000. I have not investigated this in detail, but I
believe the problem is that the date is taken from the time function (which
will give device time) in certain instances and no adjusted to GPS date. This
will of course only show up when the device date is no set to the correct date.
The app mentioned above (you can also find the GPL licensed source code there)
will show the problem.
Original comment by LudwigBr...@gmail.com
on 17 Aug 2010 at 8:45
@Ludwig: can you provide some sample timestamps? (Maybe a short GPX trace...)
What exactly changes? Just the date part or the entire timestamp? Always by the
same amount, back and forth?
As far as I remember, in fact the first fixes might not have a date yet (I
think some NMEA sentences provide just the time but not the date). I've seen
similar behavior in other GPS devices, using local time until a full GPS
timestamp is available.
I will check to rule out any possibilities of local date being used after it
has been obtained from GPS; we also might want to make sure we don't send a
valid timestamp until we've received also the date from GPS. This may increase
the time until we get a valid timestamp but would ensure we get correct
information in any case.
Original comment by mich...@vonglasow.com
on 17 Aug 2010 at 6:31
I think I found the problem, a patch is attached.
The issue was that the code reads beyond the end of the year token in the
following code:
- r->utc_year = str2int(tok_y.p, tok_y.end+4);
(.end is the end of the token, so .end + 4 points four chars beyond the end.)
This should be:
+ r->utc_year = str2int(tok_y.p, tok_y.p+4);
(or it could possibly be .end, but I have not tested that)
The result of this is that str2int goes into fail mode, returning -1, which
means it is treated as uninitialized in the next round, which means it will be
set from time(), which has the (possibly wrong) device time and date.
I rebuilt with the patch attach and the issue goes away. I tested with an
additional logging entry that tested when year was changed and ran for a while.
The error did not show up in log again.
I think the bug is obvious and I do not think the patch has any side effects.
Patch should be applied to cupcake and master.
Original comment by LudwigBr...@gmail.com
on 18 Aug 2010 at 4:17
Attachments:
Hi Michael / Ludwig,
Thanks for doing the digging on this topic.
I've applied the patch to my local branches and it compiles without any issue.
Question: should we apply both patches
0001-fix-for-date-toggle-by-reading-beyond-end-of-year-fi.patch by Ludwig and
android-on-freerunner_139.patch by Michael? Or is the last patch overruling the
first?
Thanks for letting us know.
Niels.
Original comment by niels.he...@gmail.com
on 18 Aug 2010 at 7:35
Both patches. I started out from a version with Michael's patch already
applied. They are complimentary, fixing different problems.
Ludwig
Original comment by LudwigBr...@gmail.com
on 19 Aug 2010 at 12:29
Jim,
Could you review the patch by Ludwig and (if OK) apply to master and cupcake?
Thanks,
Niels.
Original comment by niels.he...@gmail.com
on 19 Aug 2010 at 7:06
Applied Ludwig's patch to cupcake and master.
Original comment by scarhill
on 21 Aug 2010 at 10:30
sorry for joining back in so late, I was on the road for a couple of days...
Seems Ludwig has found yet another bug, but there are some things in the NMEA
code that I'm still not quite happy with.
First, we give out timestamps even when we have only a partial timestamp (that
is, hh:mm:ss but no date) from the GPS and substitute the local system date.
While Ludwig's case (system clock nowhere near correct) may look like a highly
unlikely special case, the same situation may lead to problems when the system
clock is only slightly off and we get our first fix around midnight - we may
end up with an incorrect date. This may cause problems e.g. when we sync
against GPS time (as I suggested in ticket #67, waiting for the first GPS fix
and using its timestamp - the one timestamp we use for synchronization may be
faulty and we'd end up with an error of 24 hours rather then just a few
minutes).
Second, the lack of a mktime function capable of processing UTC input may still
have some side effects: we are working on time differences, which is correct in
most cases but may introduce errors in the immediate surroundings of a
DST/Standard Time switchover, resulting in timestamps being one hour (or
whatever the DST difference is) ahead or behind.
My solution to the first is not to return a date until we have received a full
GPS timestamp (may reduce startup time, typically by one second, but I don't
think this is very painful to the user).
For the second one, I have seen a in implementation of a mkgmtime function,
which converts a tm struct in UTC to a timestamp - basically mktime without any
time zone related processing. I would add that to the code and use it; there
are just two caveats:
Caveat 1: if the definition of time_t changes (likely as we approach the Y2K38
problem), we will need to adapt mkgmtime accordingly. Still I suppose the risk
is calculable.
Caveat 2: the code I'm referring to is originally from the Lynx browser and is
thus under the GPL. If we include this in our code, it would cause the entire
GPS driver code would fall under the GPL rather than the Apache license. For me
personally, that's fine (and I guess most people here would agree), but I would
still like some opinions on this.
Original comment by mich...@vonglasow.com
on 22 Aug 2010 at 1:01
Ludwig / Michael,
We really appreciate the analysis and contributions. We're very happy to see
that members of the community start submitting patches to the source code.
You guys obviously know what you're talking about and have the skills to fix
bugs. We therefore consider to give you commit rights to the source code. The
main constraint is that you have a local build environment and that you test
your patches before submitting / committing.
You can get in touch with Jim Ancona to do the configurations for you. In case
you want to discuss, feel free to mail or talk on IRC.
Original comment by niels.he...@gmail.com
on 22 Aug 2010 at 8:03
oh yes, another thing I noticed recently: the reported speed is roughly twice
the actual speed. The RMC sentence returns speed in knots (nautical miles per
hour, 1 knot equals slightly more than .5 m/sec), so I suppose the unit
conversion is missing and we're misinterpreting 10 knots as 10 m/sec. I'll take
a look also at that (it's not like this is the first GPS project to trip over
that sort of thing...)
Original comment by mich...@vonglasow.com
on 29 Aug 2010 at 3:22
Update: I have produced a first improved version in which I fixed the errors I
noticed:
- Timestamps are now calculated using mkgmtime and are thus immune to any
timezone quirks.
- nmea_reader_update_time will not set the time unless a valid date has been
set first. No system time will be used as a substitute for missing date/time
information.
- Only RMC and ZDA messages are used to update timestamps: these supply
fractional seconds and a date. (ZDA is rarely ever sent by GPS receivers, not
sure about the Freerunner, but RMC is *the* GPS message. See the rant on the
gpsd home page for more information.)
- Timestamps from GGA and GLL messages are ignored: They do not contain a date
and using them may cause system time to jump back by 24 hours and forward again
around midnight. I should add some extra code to ignore these messages
altogether until we got the first valid timestamp, else weird things may
happen...
- I also introduced conversion for the speed, as NMEA uses knots and Android
internally uses meters per second.
- Fractional seconds are still ignored but I'm planning to include them as well
- juts discarding the fractional part introduces an extra error of .5 +/- .5
seconds, that is, up to one second.
A quick test on my balcony looked OK (meaning I got a reasonable time), though
I could not test the speed for correctness.
Original comment by mich...@vonglasow.com
on 29 Aug 2010 at 7:02
Added the code to ignore GGA and GGL until the first timestamp has been
received. So far, so good.
But when I try to add millisecond precision to the fix time by changing
r->fix.timestamp = (long long)fix_time * 1000;
to
r->fix.timestamp = (long long)((fix_time * 1000) + (int)(fmod(seconds, 1) * 1000.0));
all of a sudden I get a weird time stamp - instead of 23:00 I get something
shortly after 13:00. (Unfortunately the GPS Status app does not show the date,
so I can't reliably tell the numeric error.)
I've tried various ways of adding the fractional second part to the time stamp
but they all have the same result - timestamps which are off by roughly the
same amount. What black magic did I fail to apply here?
Original comment by mich...@vonglasow.com
on 29 Aug 2010 at 9:13
for the impatient, here's the state of my work so far (I'm not set up for
commit access yet, will take care of that in the next days).
Should work but is not fully tested yet: Timestamps look OK and should in
theory be correct (we're now entirely free from system clock time), but doesn't
process fractional seconds yet. Speed should work in theory but is untested.
The archive contains the source patch as well as the updated library (for
immediate replacement on a Freerunner setup).
Input on the millisecond issue is still welcome.
Original comment by mich...@vonglasow.com
on 29 Aug 2010 at 10:04
Attachments:
The millisecond issue is solved: since time is parsed from the NMEA string and
then processed further, I dropped the black mathemagic altogether and instead
parse the seconds into two integer variables, one for seconds and one for
milliseconds. The seconds go into the tm structure for mkgmtime, fix time is
then returned as the result of mkgmtime, multiplied by 1000, plus milliseconds
as previously parsed. Tests look OK.
The last remaining issue might be that we may receive GGA or GLL messages
before the first RMC or ZDA, which will fill most fields with valid data before
we have a timestamp. If that's a problem, I am sure to find that out during my
efforts to fix #67 (as the planned app uses the timestamp of the first GPS
fix). If this behavior is an issue, then the only solution is to ignore any
messages (other than RMC and ZDA) until we have at least one valid timestamp.
As soon as I'm set up, I'll commit my changes.
Original comment by mich...@vonglasow.com
on 7 Sep 2010 at 9:18
Changes are committed to cupcake and master.
One issue remains, though it has no effect on execution (other than wasting a
few CPU cycles): the utc_diff member of the NmeaReader structure.
Once needed to convert local time (obtained from UTC hhmmss) to UTC,
gps_freerunner.c now does not use this member any more. Same about the
nmea_reader_update_utc_diff() function, whose only purpose is to set utc_diff.
They don't really do any harm, other than being a few superfluous lines of code
which take up space and CPU cycles and may confuse the reader.
Can we remove the utc_diff member from the structure and the corresponding
function from the code? Or is the NmeaReader structure exported to callers of
libfreerunner_gps.so, some of which might get confused if we removed a member
in the middle of the structure?
Original comment by mich...@vonglasow.com
on 10 Sep 2010 at 10:10
We may have another bug in the GPS layer somewhere...
Software used:
Cupcake 0.2.0 RC2
OSMTracker for Android 0.5.1
GPS Status
Other devices:
Canon Digital Ixus 70 (digital camera)
GeeksPhone One (other Android phone)
Mitac Mio 168 RS (PDA with GPS)
When I go mapping, I first fire up GPS Status, get a fix, then take a photo of
the display (showing GPS time) with my camera. Then I start a GPS track using
OSM Tracker and at the same time take photos.
At home, I georeference pictures using JOSM and the "drift" between the time
shown on the GPS status display and the EXIF timestamp of the corresponding
image.
If I don't touch the clock of the camera, it's normal for the drift to increase
or decrease by, say, a second per week. This is the case for the Geeksphone as
well as for the Mio. Also, the drift should be independent of the GPS device
used.
With the Freerunner, however, I notice that the drift may differ by something
in the 20-second range from the drift encountered with the Geeksphone. On
another day, the drift is the same for both.
Also with the Freerunner, sometimes I sync the time as described above, and
when I examine the photos, they appear somewhat further "down the road" than
where I took them.
All of this seems like the GPS time reported by the Freerunner occasionally
skips back (and forward again) by a few seconds.
I have attached an extract from a problematic track, together with screenshots
from JOSM visualizing it: In the "dots" file, it looks just like a normal
track. But in the "linked" file, you will see the line bouncing back and forth
in a certain region - indicating that the points were not stored in the correct
order.
Looking at the GPX file, you will see a 14-second range (from 6:43:40 to
6:43:54) with *two* positions per second. (OSMTracker stores positions to a
SQLite db first, then exports to GPX from that DB - most likely sorting points
by their timestamp.)
This phenomenon can be explained by GPS time skipping back by 14 seconds (after
6:43:54, time seems to have gone back to 6:43:40, causing the 14-second
timespan to appear twice.)
Not sure what the issue is this time... it might be that we're flipping between
GPS time and UTC - the offset currently is 15 seconds (since the introduction
of the last leap second on Dec 31, 2008), which fits the pattern (after
6:43:54, rather than switching to 6:43:55, the clock goes to 6:43:40, which is
just that 15-second difference). I'll look into it, but any input is welcome...
Original comment by mich...@vonglasow.com
on 1 Mar 2011 at 9:39
Attachments:
Did some initial research, and the document at
http://gpsd.berlios.de/NMEA.txt
(in the Dates and times section) mentions that the current GPS/UTC time offset
gets transmitted roughly one per 20 seconds, and a cold-booted GPS may have
incorrect time until this information arrives.
Since UTC occasionally gets leap seconds, it is somewhat "slower" than GPS
time, in other words, GPS time is ahead of UTC. So a receiver suddenly
switching from GPS time to UTC would jump back - the very behavior we're seeing
above. This would occur at most 20 seconds into the track. (My first trackpoint
was at 6:41, two minutes before the pattern shown above occurred.)
So this seems to be the issue - we need to find out whether our GPS receiver is
reporting UTC or GPS time and correct accordingly.
Original comment by mich...@vonglasow.com
on 1 Mar 2011 at 10:02
Digging more deeply into this... things may get ugly.
Apparently there is no NMEA message reporting the current difference between
UTC/GPS time.
The documents on the NMEA protocol that I could find all talk of UTC
timestamps, which I take to mean that the GPS receiver should correct for the
drift internally and return neat UTC time stamps, such that the application
would never see a "raw" GPS timestamp or need to deal with the offset - unless
the receiver has just cold-booted, obtained a fix but has not yet seen the
leap-second correction.
I have also looked at the documentation of the ANTARIS chip used in the
Freerunner - no mention of leap second correction, not even in the proprietary
messages.
The proprietary u-blox protocol provides a way to report back leap seconds and
a flag showing if they are already known. If we can get the chip to talk binary
to us (and process the information), we could examine the value and store the
leap seconds, once we got them. On each subsequent startup, we could check
again and manually "correct" timestamps with the previously saved leap seconds
as long as the receiver does not have them. The main issue is to communicate in
binary...
Original comment by mich...@vonglasow.com
on 2 Mar 2011 at 12:28
Had a look at gpsd, which features a full u-blox protocol driver. The GPS can
be configured to output both UBX and NMEA at the same time. Since each format
has its own sync characters to start each frame ($ for NMEA, 0xB5 0x62 for
UBX), it's possible to tell which message format is coming down the wire. We
just need to examine the NAV-TIMEGPS (0x01 0x20) message, which tells us
whether the receiver already knows the leap seconds, and how many they are.
Plan:
- When we get the leap seconds for the first time, we write them to a file )and
additionally keep them in memory) and set a flag.
- When the leap seconds change (because a new leap second was added), we update
the file and memory information.
- On subsequent startups, when the leap seconds are not known (yet), we read
information from the file (without setting the flag) and correct timestamps
manually.
- As soon as we get the leap seconds again, we set the flag and stop applying
the correction.
Thus the 15-second leap will manifest itself only once, when we start up the
unit for the first time. If we start up the unit for the first time after a new
leap second got added, we will get similar behavior (but the leap is smaller,
equal to the number of leap seconds since the unit was last turned on). Also,
if the receiver is on "during" a leap second, it may not get the update for up
to 20 minutes and will then leap back by one second - this is a limitation of
GPS itself.
Altogether, if you use your GPS regularly (at least once every 18 months or
so), and if you leave it on for 20 minutes when using it for the very first
time, this should make the problem near-negligible.
Original comment by mich...@vonglasow.com
on 2 Mar 2011 at 8:35
Original issue reported on code.google.com by
mich...@vonglasow.com
on 24 Apr 2010 at 7:04