Openvario / sensord

Daemon to poll air data from Openvario sensorboard
6 stars 11 forks source link

EEPROM issues #24

Closed hpmax closed 3 years ago

hpmax commented 4 years ago

The behavior on the initial read is different whether I'm using the -f switch or not.

When I use -f, it reads data out, but claims the checksum fails.

When I don't use -f, it reads what appears to be garbage data out, checksum passes and it uses it, resulting in a zero_offset of 8.1e36 or something like that. Interestingly enough, I modified the code to set zero_offset to 1.3 immediately prior to reading the EEPROM. I was trying to determine if the read failed and it was just what was in memory at that location, or if that was really the data that it read out. However, after adding data.zero_offset immediately prior to the eeprom_read_data, the eeprom_read_data function returned the correct information with a proper checksum and my dynamic pressure was zero'ed out.

kedder commented 4 years ago

The way eeprom is read has nothing to do with -f switch. The -f switch disables daemon forking to the background, and that's it. Do you have your sensors calibrated? What is the actual problem you are experiencing?

hpmax commented 4 years ago

Looking at the source code, I tend to agree. Looking at the program's behavior, I tend to disagree. I can't replicate what I was seeing before.

What I am seeing now is that if I use "-f" I get a dynamic offset of 0.297551, which is approximately correct, and I believe is the stored value. If I don't use "-f" I get a value of 0. I don't have an explanation for this behavior, but it's the reality of the situation.

hpmax commented 4 years ago

When it goes to read out the EEPROM it does the following: eeprom_open and then eeprom_read_data

eeprom_read_data in turn calls: eeprom_read and then verify_checksum

eeprom_read writes a byte (which I think is 0x00) to the i2c bus and then reads the data out over the i2c bus.

eeprom_read seems to return a 1 on this:

if ((write(eeprom->fd, &offset, 1)) != 1) { printf("Error writing to i2c slave (%s)\n", func); return(1); }

Oddly the: if ((write(eeprom->fd, (void*)&offset, 1)) != 1) { // Send register we want to read from
in eeprom_open does seem to work.

kedder commented 4 years ago

So, do you have your sensors calibrated?

Anyways, you realize that -f switch should not be relevant to the issue (even though it appears to be). As well as type casts shouldn't be relevant (even though it appears to be). The finger you press "enter" with to run sensord should not be relevant either (but it appears to be). You can arrive at pretty wild conclusions if you accept one of these as a "cause".

I suspect there is no magic happening here. t_eeprom_data.zero_offset is simply never initialized, so it effectively contains a random value every time you run sensord. Could that explain the phenomenon you observe?

hpmax commented 4 years ago

@kedder That's not what's happening. It's possible the 8.1e36 issue was an uninitialized EEPROM. I have run sensorcal since.

When I run with -f, it reliably comes up with the same value of .297551, which is correct -- or close to it. When I run without it, it fails to readout the EEPROM and initializes as zero. I know the specific statement which it is failing on...

This is 100% repeatable.

"write(eeprom->fd, &offset, 1)" within eeprom_read is not returning a 1 without the "-f" switch. I presume it's returning a zero. This is there to execute the command to read the EEPROM. Oddly, there is a very similar write command in eeprom_open that does return 1.

I have no explanation for why it works with -f, and not without it. However, based on issue #23 I think there is reason to suspect there are timing differences, even though I don't see why there should be.

kedder commented 4 years ago

How can I reproduce this?

hpmax commented 4 years ago

From the command line, run:

/opt/bin/sensord -s -c /opt/conf/sensord.conf -f -r working /opt/bin/sensord -s -c /opt/conf/sensord.conf -r not_working

Take a look at the dynamic output in working/not_working, which should be the 3rd column, you should see the one in working with the offset working (i.e. it will be close to zero). The one in not_working will not have the offset working and so the value will be off (since it is generally negative, it would normally be filtered out when not moving so it'd be hard to see this on the ground).

The offset will be printed out in the "-f" version, but won't be in the daemonized version because it forks before the eeprom read.

kedder commented 4 years ago

Ok, when I run sensord in foreground (with -f), I get this output:

Opened 24C16 on 0x50
Using EEPROM calibration values ...
=========================================================================
Runtime Configuration:
----------------------
Vario:
  Kalman Accel: 0.300000
Sensor TEK:
  Offset:   0.000000
  Linearity:    1.000000
Sensor STATIC:
  Offset:   0.000000
  Linearity:    1.000000
Sensor TOTAL:
  Offset:   0.210574
  Linearity:    1.000000
=========================================================================

When I run it without -f, sensord writes output to sendord.log, which looks like this:

EEPROM Checksum wrong !!
=========================================================================
Runtime Configuration:
----------------------
Vario:
  Kalman Accel: 0.300000
Sensor TEK:
  Offset:   0.000000
  Linearity:    1.000000
Sensor STATIC:
  Offset:   0.000000
  Linearity:    1.000000
Sensor TOTAL:
  Offset:   0.000000
  Linearity:    1.000000
=========================================================================

The latter has "EEPROM Checksum wrong !!" message and Sensor TOTAL Offset is 0.

Is this what you are talking about?

hpmax commented 4 years ago

That's exactly what I'm talking about.

hpmax commented 4 years ago

Any luck on getting to the bottom of this?

kedder commented 4 years ago

No, haven't looked at it yet, but it seems reproducible.

hpmax commented 4 years ago

I keep thinking this may be a timing issue, the same timing issue that made the "blips" worse when not using the -f command line option. But I still don't understand why because I can't imagine what could be causing it. Maybe one of the signals that is setup after it forks?

kedder commented 4 years ago

I finally got some time to look into this.

When sensord is started as a daemon (without -f flag), it closes the stdout (close(STDOUT_FILENO);). Then eeprom_open is executed, but it still writes to standard output using printf. These printf do not fail, because file descriptor 1, normally associated with stdout is reserved by open("/dev/i2c-1", O_RDWR); just before the printf.

So effectively this statement writes to /dev/i2c-1, which makes subsequent eeprom_read to fail:

    printf("Opened 24C16 on 0x%x\n", i2c_address);

Does that make sense?

hpmax commented 4 years ago

So basically, you're saying we need to be very careful about printf or fprintf(stdout), as they will write to the I2C bus instead of the terminal/log when not in foreground mode? Sounds like we need to figure out a solution to that (either only do them when in foreground mode, or write to a log instead when not in foreground mode, or just write to a log only), and then go through all the code and make sure that sort of thing isn't present elsewhere, since I assume that has the potential to affect not just the EEPROM read? Or is it confined to that one IC bus file descriptor because it's the next one set after closing the stdout?

kedder commented 4 years ago

@hpmax, posted #27 with the fix.