Closed tshaug closed 3 years ago
Even though I think it shouldn't make a difference, it is clearly not necessary to recreate the device each time. You should be able to talk to different devices simultaneously (at least as long as you are performing everything on the same thread - not sure about thread safety of these classes)
Hi Patrick,
thanks for your comment. I will change my code accordingly. BTW: I tested today with a RaspPi 3B; the initial test was with a RaspPi 4. Both produce these exceptions (I did not expect otherwise, but who knows...). I also replaced the sensors.
Cheers Thomas
I don't have this exact sensor module, but I've done quite extensive tests on I2C (e.g. with high-throughput data to LCD displays) and have never seen any exceptions (unless I disconnect the bus). I need to do some more I2C tests soon, so maybe I can reproduce it.
Hello @tshaug thanks for logging this issue! Your code looks correct to me, and to answer your other question:
I am creating the I2C device object every 5 seconds, because I have to create another I2C device object with different settings (for BME280) and I don't know if it is OK to have multiple at the same time.
You don't need to create an I2C device every time, as you are allowed to have more than one at the same time. Are you sure this device is connected correctly and at the correct address? To quickly test whether this is the problem, run the following command from your terminal window: i2cdetect 1 and see if you can spot your device connected at i2cAddress. This would be the first step in order to better diagnose what the problem is here.
Hi @joperezr, thank you for your feedback.
I have executed i2cdetect 1:
pi@ThomasRaspPi:~ $ i2cdetect 1
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-1.
I will probe address range 0x03-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- 5a -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- 76 --
So it is in my opinion returning the correct values (90 and 118 decimal).
As I have written, I tested with different sets of sensors and RaspPis, and it happens in all combinations. And it only happens sporadically.
Cheers Thomas
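For anyone who wants to check this from a script instead of by eye, here is a minimal sketch that pulls the detected addresses out of an i2cdetect grid like the one above. The function name parse_i2cdetect and the SAMPLE text are made up for illustration; only the grid layout is taken from the output in this thread.

```python
from typing import List

# Shortened copy of the grid posted above (illustrative sample data).
SAMPLE = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- 5a -- -- -- -- --
70: -- -- -- -- -- -- 76 --
"""


def parse_i2cdetect(output: str) -> List[int]:
    """Return the addresses that an i2cdetect grid reports as present."""
    found = []
    for line in output.splitlines():
        head, sep, cells = line.partition(":")
        head = head.strip()
        # Grid rows start with a two-digit hex row label such as "50".
        if not sep or len(head) != 2:
            continue
        try:
            int(head, 16)
        except ValueError:
            continue
        for cell in cells.split():
            # "--" means nothing answered; "UU" means a kernel driver owns it.
            if cell in ("--", "UU"):
                continue
            try:
                found.append(int(cell, 16))
            except ValueError:
                pass  # ignore anything that is not a hex address
    return found


print([hex(a) for a in parse_i2cdetect(SAMPLE)])  # ['0x5a', '0x76']
```

The two addresses are 90 and 118 in decimal, which matches what Thomas reported.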
And so I suppose that in your code, the variable i2cAddress is set to either 0x5a or 0x76?
Do you have other i2c devices that you can try on the same bus just to make sure that this is not a problem with your bus? (the fact that you have tested in multiple RPis makes me think this is not the problem)
@ZhangGaoxing would you know what might be going on? This sounds like something with the binding specifically; if it were something with I2cDevice, I believe we would have seen it already for other devices.
The MLX90614 default I2C address is 0x5A, and it seems that i2cdetect has detected the corresponding address. Is your I2cDevice parameter set correctly?
Oh, I see. MLX90614 is an SMBus device. Try this to set up your Raspberry Pi https://www.raspberrypi.org/forums/viewtopic.php?f=44&t=15840&sid=8c2fff5ce4395676800b4587f2a71b4e&start=25 I mentioned this in #452
Hi @ZhangGaoxing, sorry for the late reply. Thank you for your explanation. I will check the provided links and test with my RaspPis.
Cheers Thomas
Hi @ZhangGaoxing,
I looked into the post and also at my RaspPi:
To me it seems that my Raspi has a bcm2835, which does not understand the mentioned combine file.
Is my understanding of #452 correct that with the bcm2835 I don't need to configure anything specific, and the sensor binding should work out of the box? I ask because I still sporadically receive these errors.
Sorry, maybe I am confused by the mentioned tickets.
Thanks, Thomas
I have seen this error now also once, while reading from the I2C bus (from an ADS1115, to be precise) at a high frequency. Looks a bit like this error can happen if reading very quickly from the bus. I don't have a way of forcibly reproducing it yet, but I might try later.
Hi Patrick,
thanks for the info. At the moment I am reading every 5 seconds. At which frequency are you reading from the bus?
cheers Thomas
When it happened, I was reading a short value at a rate of at least 1 kHz. But it happened only once, even though I was also trying even higher read rates.
interesting...
@pgrawehr, @tshaug any chance you can try running it e.g. in a loop outside/inside of the using statement and see if there is something you can do to repro this?
Perhaps there are some docs which can explain when this can fail, or we could add some diagnostics to help us figure out whether this is something we do incorrectly or e.g. a driver issue.
Are you guys up to date with kernel updates for your Raspbian? (Check the version before and after you update, and perhaps try reproing again.)
Will do, but I'll probably only find time on Saturday. It's of course possible that this error 121 just happens occasionally when a transmission error occurs, since the exact same error happens if you just disconnect one of the I2C wires.
@pgrawehr that's possible - no hurry since this doesn't seem to be consistently broken and rather occasionally fails. If you find that this is just transmission error we need to figure out what happens with corrupted frame and if this is something we should be retrying on our side (only if it's safe to retry) or let the error bubble up and let the user decide
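As a small aid for reading the logs in this thread, here is a sketch that maps the two error numbers seen here to their Linux errno names. The names and numbers are the standard Linux ones (121 is EREMOTEIO, 110 is ETIMEDOUT); the hint texts and the helper describe_i2c_error are only an illustration, not something the library provides.

```python
# Linux errno values that show up in this thread. The numbers are
# Linux-specific, so they are hard-coded here instead of taken from
# the errno module of whatever machine runs this script.
I2C_ERRNO_HINTS = {
    121: ("EREMOTEIO", "remote I/O error; in this thread also seen when a wire is disconnected"),
    110: ("ETIMEDOUT", "the transfer timed out"),
}


def describe_i2c_error(code):
    """Return a short human-readable description for an I2C transfer error code."""
    name, hint = I2C_ERRNO_HINTS.get(code, ("unknown", "not covered by this table"))
    return f"errno {code} ({name}): {hint}"


print(describe_i2c_error(121))
print(describe_i2c_error(110))
```

Whether either of these is safe to retry is exactly the open question above; the table only names the errors.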
Hi everybody, sorry for the late reply (I have been busy demonstrating my sensor prototype and code at a .NET conference ;-)). I have found one issue with how my software accesses the Mlx90614 temperature values:
protected override Mlx90614SensorData GetSensorDataInternal()
{
    Iot.Units.Temperature irtemperature = sensor.ReadObjectTemperature();
    Iot.Units.Temperature ambientTemperature = sensor.ReadAmbientTemperature();
    logger.Debug($"Reading Mlx90614 temperatures done");
    return new Mlx90614SensorData(SensorDataQuality.Good, irtemperature.Celsius,
        ambientTemperature.Celsius);
}
I discovered that it was always the ReadAmbientTemperature() call that failed. So to me it seems that this is a kind of high-frequency call like the one @pgrawehr described, because it is the second call to the sensor immediately after the first. I have now changed the code to have a 200 ms delay between the two calls to the sensor.
This at least eased the situation: during the test session an exception occurred only two times. This is not perfect, but OK for me.
If you like we can close the topic.
Cheers and thanks again Thomas
@tshaug
I discovered that it was always the ReadAmbientTemperature() call that failed
Do you mean it failed with error 121 on the transmission level? I'm wondering if this is sensor specific (i.e. still processing previous requests and not giving ACK on the line) or something else. Considering the delay is fixing the issue this sounds like sensor specific problem... If that's the case I'm voting this is Mlx90614 bug (vs I2cDevice bug as title suggests)
When I tried again last week, I ran I2C transfer operations for several hours at a very high rate (basically a ReadValue in an infinite, untimed loop). I was not able to reproduce the problem. So it may really be a sensor-specific issue with a missing ACK or something and so is an intermittent issue.
The question is whether we should internally handle this problem with a few retries or let the user handle it?
@tshaug If you are able to reproduce the problem consistently, can you try whether a retry works or whether the bus is in some undefined state after this exception?
@pgrawehr we should start by digging into the spec to see if there is something we can do to handle this gracefully. If there is nothing in there, I suggest we dig more into why the tiny delay makes the reading more reliable; perhaps there is some delay which can get us close to 100% correctness. If we still can't get there, then retries are OK I guess...
@pgrawehr: in my opinion a retry should work (at least I am able to "re-use" the bus/sensor 5 seconds later). I will test with the following code:
protected override Mlx90614SensorData GetSensorDataInternal()
{
    Iot.Units.Temperature irtemperature = sensor.ReadObjectTemperature();
    // I sometimes receive 0 (zero) values, so add some delay before reading the ambient temperature
    Task.Delay(TimeSpan.FromMilliseconds(200)).Wait();
    Iot.Units.Temperature ambientTemperature;
    try
    {
        ambientTemperature = sensor.ReadAmbientTemperature();
    }
    catch (IOException ioException)
    {
        logger.Debug($"Reading Mlx90614 temperatures retry: {ioException.Message}");
        // see: https://github.com/dotnet/iot/issues/832
        Task.Delay(TimeSpan.FromMilliseconds(100)).Wait();
        ambientTemperature = sensor.ReadAmbientTemperature();
    }
    logger.Debug($"Reading Mlx90614 temperatures done");
    return new Mlx90614SensorData(SensorDataQuality.Good, irtemperature.Celsius,
        ambientTemperature.Celsius);
}
But only this evening. I will let you know about the results afterwards
Cheers Thomas
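The same idea (wait a bit, then try the read again) can be written as a small generic helper. This is only a sketch in Python, not part of any binding: read_with_retry, its parameters and the default values are assumptions chosen for illustration, and the sleep function is passed in so the helper can be exercised without real hardware.

```python
import time


def read_with_retry(read, attempts=2, delay_s=0.1, sleep=time.sleep):
    """Call read(); on OSError wait delay_s and try again, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            sleep(delay_s)


# Stand-in for a sensor read that fails once and then succeeds.
_calls = {"n": 0}


def _flaky_read():
    _calls["n"] += 1
    if _calls["n"] == 1:
        raise OSError(121, "Remote I/O error")
    return 21.5


print(read_with_retry(_flaky_read, attempts=2, delay_s=0.1))  # 21.5
```

With attempts=2 this behaves like the C# snippet above: one retry, and a second failure still reaches the caller.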
Yesterday I did some extensive testing. As mentioned, I implemented a simple retry mechanism in my sensor client. I ran the sensor app and its client for more than 8 hours: (object temperature and ambient temperature; the peaks are when I held my hand right in front of the Mlx90614 sensor)
So everything works quite well. Later on I analysed the logs which I write on the Rasp Pi. There I discovered 3 errors (sorry, the time and date of the Rasp Pi are wrong; I didn't realize that when I started the test session):
1) One time my retry failed (= 2 consecutive errors):
2019-12-01 20:17:37.839 +01:00 [DBG] Start using Mlx90614Reader
2019-12-01 20:17:39.086 +01:00 [DBG] Reading Mlx90614 ambient temperature retry: Error 110 performing I2C data transfer.
2019-12-01 20:17:39.193 +01:00 [ERR] Reading Mlx90614 ambient temperature retry also failed
2) The Mlx90614 sensor failed one time while reading the object temperature (which I have not seen so far; no retry is implemented for it at the moment):
2019-12-01 20:17:45.292 +01:00 [INF] Error while using Mlx90614Reader: Error 121 performing I2C data transfer.
System.IO.IOException: Error 121 performing I2C data transfer.
at System.Device.I2c.UnixI2cDevice.ReadWriteInterfaceTransfer(Byte writeBuffer, Byte readBuffer, Int32 writeBufferLength, Int32 readBufferLength)
at System.Device.I2c.UnixI2cDevice.Transfer(Byte writeBuffer, Byte readBuffer, Int32 writeBufferLength, Int32 readBufferLength)
at System.Device.I2c.UnixI2cDevice.WriteRead(ReadOnlySpan`1 writeBuffer, Span`1 readBuffer)
at Iot.Device.Mlx90614.Mlx90614.ReadTemperature(Byte register)
at Iot.Device.Mlx90614.Mlx90614.ReadObjectTemperature()
at Herzonaut.ObservingConditions.Raspi.Sensor.Mlx90614Reader.GetSensorDataInternal() in C:\d\dn\Herzonaut\git\master\Herzonaut.ObservingConditions.Raspi.Sensor\Mlx90614Reader.cs:line 27
at Herzonaut.ObservingConditions.Raspi.Sensor.AbstractI2CSensorReader`1.GetSensorData() in C:\d\dn\Herzonaut\git\master\Herzonaut.ObservingConditions.Raspi.Sensor\AbstractI2CSensorReader.cs:line 38
3) The BME280 (which I use to measure temperature (as well), pressure and humidity) also failed one time with the same error:
2019-12-01 20:17:40.209 +01:00 [INF] Error while using Bme280Reader: Error 121 performing I2C data transfer.
System.IO.IOException: Error 121 performing I2C data transfer.
at System.Device.I2c.UnixI2cDevice.ReadWriteInterfaceTransfer(Byte writeBuffer, Byte readBuffer, Int32 writeBufferLength, Int32 readBufferLength)
at System.Device.I2c.UnixI2cDevice.Transfer(Byte writeBuffer, Byte readBuffer, Int32 writeBufferLength, Int32 readBufferLength)
at System.Device.I2c.UnixI2cDevice.WriteByte(Byte value)
at Iot.Device.Bmxx80.Bmxx80Base.Read8BitsFromRegister(Byte register)
at Iot.Device.Bmxx80.Bmxx80Base.SetTemperatureSampling(Sampling sampling)
at Herzonaut.ObservingConditions.Raspi.Sensor.Bme280Reader.GetSensorDataInternal() in C:\d\dn\Herzonaut\git\master\Herzonaut.ObservingConditions.Raspi.Sensor\Bme280Reader.cs:line 23
at Herzonaut.ObservingConditions.Raspi.Sensor.AbstractI2CSensorReader`1.GetSensorData() in C:\d\dn\Herzonaut\git\master\Herzonaut.ObservingConditions.Raspi.Sensor\AbstractI2CSensorReader.cs:line 38
To me the third error is the most interesting one. It seems that this Error 121 sometimes happens for the sensors I use. But the system still works afterwards, so it is not a big deal for me.
@tshaug, the BME280 error is in fact interesting. I have a couple lying around, one of them connected all the time, and haven't seen any issue so far.
Do you have both sensors connected to the same Pi at the same time? Is it possibly related to them reading/writing at the same time? I wonder whether this is some I2C threading issue which we should fix on our side.
I'd like to preface my comment with the fact that I am very new to i2c, raspbian and the MLX90614.
I have seen this issue as well, exactly as described above. When querying the sensor very fast, the 121 exception is thrown occasionally. I would like to add, however, that I can repro this in Python using the smbus library. I don't know how similar this is to System.Device.I2c.
This python script can repro the error. With a quick test it failed 43 out of 50 times, 7 times it worked and returned expected data:
import smbus
BUS = smbus.SMBus(1)
DEVICE_ADDRESS = 0x5a
temp = BUS.read_word_data(DEVICE_ADDRESS, 0x07) * 0.02 - 273.15
emis = BUS.read_word_data(DEVICE_ADDRESS, 0x24) / 65535
print(temp)
print(emis)
The error only ever occurs on the second read, and testing with a longer script over 20000 iterations the failure rate was 70 %.
>>> %Run mlx.py
Traceback (most recent call last):
  File "/home/pi/Documents/mlx.py", line 5, in <module>
    emis = BUS.read_word_data(DEVICE_ADDRESS, 0x24) / 65535
OSError: [Errno 121] Remote I/O error
With only a print statement in between the reads I didn't see a failure over many thousands of iterations:
import smbus
BUS = smbus.SMBus(1)
DEVICE_ADDRESS = 0x5a
i=0
while True:
    temp = BUS.read_word_data(DEVICE_ADDRESS, 0x07) * 0.02 - 273.15
    print(i)
    emis = BUS.read_word_data(DEVICE_ADDRESS, 0x24) / 65535
    i += 1
My setup is:
I am leaning towards the issue being with the sensor not being able to keep up, but my knowledge here is very limited.
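To put a number on how much a pause between the two reads helps, the experiment above could be wrapped in a small harness like this. It is a sketch under assumptions: measure_failure_rate and its parameters are made-up names, and the two reads are passed in as callables so the harness itself does not depend on smbus.

```python
import time


def measure_failure_rate(read_first, read_second, iterations, gap_s=0.0, sleep=time.sleep):
    """Run two consecutive reads `iterations` times and return the
    fraction of iterations in which the second read raised OSError."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    failures = 0
    for _ in range(iterations):
        read_first()
        if gap_s > 0:
            sleep(gap_s)  # optional pause between the two reads
        try:
            read_second()
        except OSError:
            failures += 1
    return failures / iterations
```

On the Pi, the two callables would be the reads from the script above, for example lambda: BUS.read_word_data(DEVICE_ADDRESS, 0x07) and lambda: BUS.read_word_data(DEVICE_ADDRESS, 0x24), run once with gap_s=0 and once with a small gap to compare the rates.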
OK, we're likely going to start seeing this in CI, so it might be worth at least wrapping the exception in something more convenient... Perhaps we should at least throw some exception type we could use for retry (e.g. I2cException or something)... Another option could perhaps be ProtocolException. Or do we leave it as is? cc: @joperezr
We could probably wrap the exception in order to provide a better message, but usually when you get one of these it means that the communication on the I2C bus can't find the sensor, so retrying won't really help at all. In CI, for example, I expect that once we see this on one of the devices, it will fail the test on every single run on that machine until we go to the lab and reconnect the sensor correctly.
@joperezr Unfortunately, it's not that easy. You are right that error 121 happens when the device does not answer, but as we've seen from several reports now (mine included), it can also happen intermittently. For reasons that are not exactly understood yet, the error sometimes pops up after the system has been running fine for minutes or even hours, and goes away again just as it came.
Just made an interesting observation. I was seeing quite a lot of these errors with a particular ADS1115. There are a total of 7 chips connected to this bus (two ADS1115, two MCP23017, a BMP280, a BME680 and an LCD display). The problematic ADS, which I'm reading at about 1 Hz, reported failures about every 2nd or 3rd attempt. The second ADS, which I'm reading at 5 Hz, reported failures about every 100th time. All the other sensors very rarely had errors, even though the display in particular is written to at high rates. Replacing the presumably broken ADS didn't help, but disabling it in the software (so only using all the other devices) fixed all problems, including the sporadic errors the other sensors had. So I assumed that the device might somehow interfere on the bus with the other devices. And the only way this can normally happen is if the devices don't use the correct device addresses.
The ADS1115 comes on a breakout board with a pull-down resistor on the ADDR line, which defines the address to use (0x48 - 0x4B for this chip). Leaving it externally open normally puts the address at 0x48, but apparently not reliably enough. It seems that noise can interfere with the input (likely because the pin can be connected to SDA or SCL to get a different address). Bottom line: don't leave the address pins of any I2C device open, even if they're equipped with pull-ups or pull-downs on a breakout. Hardwiring the line to ground fixed all the issues, or let's say improved it significantly; I haven't run it for long enough yet.
I'm not sure what to do about this issue other than to perhaps add optional retry logic...
Yea, I guess all one can do is add retries (this works fine for me, even though I still have plenty of these errors happening). We could consider an auto-retry feature.
In all the sensors I have, I always add retry and an overall catch mechanism, plus data cleaning. Nothing is perfect: errors can come "from the wires", from the sensor itself, or from software, and no measurement is ever exactly correct anyway. They are all approximations of the real world :-)
True... We could either add auto-retry or at least update the documentation (which one...?) to make clear that these exceptions can and will occasionally happen. It seems that the issue is mostly about getting such an exception after an application has run flawlessly for hours.
We could either add auto-retry
Some bindings where it happens often already have those in place. So I wouldn't overdo it.
at least update the documentation (which one...?)
Yes, this is clearly what we can do. I would say in the main binding page. I would say, right after the Binding Distribution section. something like good practices when working with embedded devices:
Since we consider errors something normal, perhaps we might want to consider adding TryWrite/TryRead methods to avoid try/catch blocks. I think the right place to put this would be a raspi-spi.md file, similar to the pwm version, once we have one.
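To illustrate what a TryRead-style call could look like from the caller's side, here is a minimal Python sketch. TryRead/TryWrite are only a proposal in this thread, not an existing API; try_read below is a hypothetical helper that wraps any read callable.

```python
def try_read(read):
    """Return (True, value) if read() succeeds, or (False, None) if it raises OSError."""
    try:
        return True, read()
    except OSError:
        return False, None


def _broken_read():
    # Stand-in for a sensor read that hits error 121.
    raise OSError(121, "Remote I/O error")


print(try_read(lambda: 42))    # (True, 42)
print(try_read(_broken_read))  # (False, None)
```

The caller then checks the flag instead of writing its own try/except around every read, which is the convenience the comment above is asking for.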
Will close this issue as we've added documentation on this behavior. We've also added specific documentation on how to enable SPI and I2C, and mentioned the retry mechanism. Feel free to reopen if needed.
Hi everybody,
I am using an Mlx90614 sensor and sporadically receive an exception when running on a RaspPi. This issue seems to be similar to #163
My code looks like this:
I am creating the I2C device object every 5 seconds, because I have to create another I2C device object with different settings (for BME280) and I don't know if it is OK to have multiple at the same time.
Expected behavior: no error
Actual behavior: sporadically I receive the following exception:
2019-10-30 22:33:14.869 +01:00 [INF] Error while using Mlx90614Reader: Error 121 performing I2C data transfer.
System.IO.IOException: Error 121 performing I2C data transfer.
at System.Device.I2c.UnixI2cDevice.ReadWriteInterfaceTransfer(Byte writeBuffer, Byte readBuffer, Int32 writeBufferLength, Int32 readBufferLength)
at System.Device.I2c.UnixI2cDevice.Transfer(Byte writeBuffer, Byte readBuffer, Int32 writeBufferLength, Int32 readBufferLength)
at System.Device.I2c.UnixI2cDevice.WriteRead(ReadOnlySpan`1 writeBuffer, Span`1 readBuffer)
at Iot.Device.Mlx90614.Mlx90614.ReadTemperature(Byte register)
at Iot.Device.Mlx90614.Mlx90614.ReadAmbientTemperature()
at Herzonaut.ObservingConditions.Raspi.Sensor.Mlx90614Reader.GetSensorDataInternal(I2cDevice i2cDevice) in C:\d\dn\Herzonaut\git\master\Herzonaut.ObservingConditions.Raspi.Sensor\Mlx90614Reader.cs:line 30
at Herzonaut.ObservingConditions.Raspi.Sensor.AbstractI2CSensorReader`1.GetSensorData() in C:\d\dn\Herzonaut\git\master\Herzonaut.ObservingConditions.Raspi.Sensor\AbstractI2CSensorReader.cs:line 39
Versions used
System.Device.Gpio 1.0.0
Iot.Device.Bindings 1.0.0
Add following information:
dotnet --info on the machine being used to build
.NET Core SDK (reflecting any global.json):
Version: 3.0.100
Commit: 04339c3a26
Runtime Environment:
OS Name: Windows
OS Version: 10.0.17763
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\3.0.100\
Host (useful for support):
Version: 3.0.0
Commit: 7d57652f33
.NET Core SDKs installed:
2.1.701 [C:\Program Files\dotnet\sdk]
2.1.801 [C:\Program Files\dotnet\sdk]
2.1.802 [C:\Program Files\dotnet\sdk]
2.2.301 [C:\Program Files\dotnet\sdk]
2.2.401 [C:\Program Files\dotnet\sdk]
2.2.402 [C:\Program Files\dotnet\sdk]
3.0.100 [C:\Program Files\dotnet\sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.All 2.1.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.1.13 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.1.13 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 3.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
dotnet --info on the machine where app is being run (not applicable for self-contained apps)
.NET Core SDK (reflecting any global.json):
Version: 3.0.100
Commit: 04339c3a26
Runtime Environment:
OS Name: raspbian
OS Version: 10
OS Platform: Linux
RID: linux-arm
Base Path: /home/pi/astro/dotnet3/sdk/3.0.100/
Host (useful for support):
Version: 3.0.0
Commit: 7d57652f33
.NET Core SDKs installed:
3.0.100 [/home/pi/astro/dotnet3/sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.App 3.0.0 [/home/pi/astro/dotnet3/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 3.0.0 [/home/pi/astro/dotnet3/shared/Microsoft.NETCore.App]
To install additional .NET Core runtimes or SDKs: https://aka.ms/dotnet-download
System.Device.Gpio: 1.0.0
Iot.Device.Bindings package 1.0.0