Closed michaellass closed 3 years ago
@darton Did I describe your issue correctly with the title of this bug report? I have never tested such a use-case. If I understand correctly, something like the following should reproduce your issue?
#!/usr/bin/env python3
import adafruit_dht
import board
import sys
import time
pin = board.D2 # adjust to correct PIN
dht_device = adafruit_dht.DHT22(pin)
# With Adafruit_CircuitPython_DHT 3.5.1 the CPU will now just idle until it's actually used
# During this sleep libgpiod_pulsein will start taking 100% CPU
time.sleep(12 * 3600) # Sleep for 12 hours
# ... and subsequent readings will fail:
try:
temperature = dht_device.temperature
humidity = dht_device.humidity
except RuntimeError:
print("Error while getting sensor data")
sys.exit(1)
print("Temp: {}, Humid: {}", temperature, humidity)
I read the sensor every 5 seconds. Problems begin after a few hours, but sometimes much earlier, e.g. 30 minutes, 1 hour.
The DHT22 sensor is connected to the Raspberry Pi using a 50 cm shielded cable
Thanks for the clarification. Unfortunately that means that I probably won't be able to reproduce your issue here. I am querying my sensor every minute for months now without any similar issue. Possible differences between our setups:
Are you willing to invest some effort into debugging this? If so, please follow the following steps and post the output created by the final gdb command:
apt install gdb git build-essential libgpiod-dev
git clone https://github.com/adafruit/libgpiod_pulsein.git
cd libgpiod_pulsein/src
# modify Makefile: Add " -g" at the end of line 2 (defining CFLAGS). Then run:
make
# While your sensor polling script is running, figure out where libgpiod_pulsein is installed:
ls -lad /proc/$(pidof libgpiod_pulsein)/exe
# result may be sth like /home/youruser/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/libgpiod_pulsein
# or something in /usr/local/...
# Stop your sensor polling script. Then:
cp libgpiod_pulsein WHEREVER_YOUR_libgpiod_pulsein_IS_LOCATED
# Start your polling script again. Wait until it starts to consume 100% CPU and stops returning any sensible information. Then run:
gdb attach $(pidof libgpiod_pulsein) -ex "thread apply all bt" -ex "detach" -ex "quit"
The output of the final gdb command should be very helpful in debugging this. Maybe run the last command multiple times to see if it gives you different outputs. Then all of those might be of interest.
sudo gdb attach $(pidof libgpiod_pulsein) -ex "thread apply all bt" -ex "detach" -ex "quit" GNU gdb (Raspbian 8.2.1-2) 8.2.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "arm-linux-gnueabihf". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type "help". Type "apropos word" to search for commands related to "word"... attach: No such file or directory. Attaching to process 19266 [New LWP 19267] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". __libc_msgrcv (msgflg=1, msgtyp=1, msgsz=4095, msgp=0x7ed47bbc, msqid=163840) at ../sysdeps/unix/sysv/linux/msgrcv.c:28 28 ../sysdeps/unix/sysv/linux/msgrcv.c: No such file or directory.
Thread 2 (Thread 0x76d93460 (LWP 19267)):
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 0x76f76450 (LWP 19266)):
Detaching from program: /home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/libgpiod_pulsein, process 19266 [Inferior 1 (process 19266) detached]
19266 pi 20 0 10324 428 316 R 100.0 0.0 81:55.23 libgpiod_pulsei
Thread 2 (Thread 0x76d93460 (LWP 19267)):
0 __lll_lock_wait (futex=futex@entry=0x24084 , private=0) at lowlevellock.c:46
1 0x76eeaf44 in __GI___pthread_mutex_lock (mutex=0x24084 ) at pthread_mutex_lock.c:80
2 0x000116dc in polling_thread_runner ()
Looks like some of the steps above did not really work out as we are not getting the exact code position in libgpiod_pulsein.c
here. However, It's very likely this one: https://github.com/adafruit/libgpiod_pulsein/blob/master/src/libgpiod_pulsein.c#L477. If not, it's https://github.com/adafruit/libgpiod_pulsein/blob/master/src/libgpiod_pulsein.c#L525.
3 0x76f77968 in __pointer_chk_guard () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 1 (Thread 0x76f76450 (LWP 19266)):
0 __libc_msgrcv (msgflg=1, msgtyp=1, msgsz=4095, msgp=0x7ed47bbc, msqid=163840) at ../sysdeps/unix/sysv/linux/msgrcv.c:28
1 __libc_msgrcv (msqid=163840, msgp=0x7ed47bbc, msgsz=4095, msgtyp=1, msgflg=0) at ../sysdeps/unix/sysv/linux/msgrcv.c:24
2 0x00010f28 in main ()
Apart from the missing line numbers this looks exactly how it should look like:
__libc_msgrcv
, waiting for a message via IPC from python. This should be a blocking call and therefore it should not consume any CPU. [Side note: I'm note sure why #0
shows the same call as #1
but with a different order of arguments and in particular with msgflg=1
. However, this is what I see here as well with 0% CPU use.]pthread_mutex_lock
which is blocking as well, i.e. it should consume no CPU. We are not in one of the pthread_mutex_trylock
here which would have explained the CPU usage.To be honest I'm lacking a good explanation for the CPU consumption you are seeing. It could be that the message queue becomes invalid, such that msgrcv
becomes non-blocking. I have to think about this and will try to come up with some debugging strategy...
To check whether the msgrcv
is the culprit here, could you please run the following commands whenever your script is misbehaving (i.e., showing 100% CPU)?
sudo apt install strace
strace -f -p $(pidof libgpiod_pulsein)
You can simply abort the strace
after a short amount of time with Ctrl+C. Normally, the output should look something like this:
strace: Process 731 attached with 2 threads
[pid 732] futex(0x24084, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 731] msgrcv(0,
I would assume that in your case you get lots of lines printed in a short amount of time. Can you please post your output here?
Hi. I got this: [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 4402] msgrcv(0, 0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) ^C0xbeef8bac, 4095, 1, 0) = -1 EINVAL (Invalid argument) strace: Process 4402 detached strace: Process 4403 detached
@Kraka23 Thanks, that helps a lot!
Your welcome!. You are helping us a lot too!
I still cannot provoke this behavior, so I need some help again. Could you please run the following commands while libgpiod_pulsein
is stuck at 100% CPU?
cat /proc/$(pidof libgpiod_pulsein)/cmdline
strace -f -p $(pidof libgpiod_pulsein) # exactly like in your last post but it is important to re-run it here
ipcs -q
lsipc
A couple of technical details on where we are right now:
msgrcv
returns an error and therefore becomes non-blockingEINVAL
, which according to msgop(2)
can happen in one of these cases:
msqid
was invalidmsgsz
was less than 0msgflg
specified MSG_COPY
, but not IPC_NOWAIT
msgflg
specified both MSG_COPY
and MSG_EXCEPT
msgflg
=0 and msgsz
=4095>=0, this basically only leaves the option that msqid
for some reason becomes invalid at some point in time.msqid
is and whether libgpiod_pulsein
tries to access the queue using the correct msqid
.No problem. Here we go:
_cat /proc/$(pidof libgpiod_pulsein)/cmdline_ ==> /home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/libgpiod_pulsein--pulses81--queue323-igpioc
_strace -f -p $(pidof libgpiodpulsein) ==> [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 7505] msgrcv(0, 0xbeb44bbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) ^Cstrace: Process 7505 detached strace: Process 7506 detached
ipcs -q ==> ------ Message Queues -------- key msqid owner perms used-bytes messages
lsipc ==> RESOURCE DESCRIPTION LIMIT USED USE% MSGMNI Number of message queues 32000 0 0.00% MSGMAX Max size of message (bytes) 8192 - - MSGMNB Default max size of queue (bytes) 16384 - - SHMMNI Shared memory segments 4096 0 0.00% SHMALL Shared memory pages 4278190079 0 0.00% SHMMAX Max size of shared memory segment (bytes) 4278190079 - - SHMMIN Min size of shared memory segment (bytes) 1 - - SEMMNI Number of semaphore identifiers 32000 0 0.00% SEMMNS Total number of semaphores 1024000000 0 0.00% SEMMSL Max semaphores per semaphore set. 32000 - - SEMOPM Max number of operations per semop(2) 500 - - SEMVMX Semaphore max value 32767 - -
Hope that help.
ipcs -q ==> ------ Message Queues -------- key msqid owner perms used-bytes messages
So the message queue is definitely gone. The best thing we can do in this case in libgpiod_pulsein
is to terminate because there is no way to re-establish the communication to the python module. I will send a pull request for this later.
What's still a mystery to me is why the message queue disappears without libgpiod_pulsein
being terminated by the python module. Could you describe how exactly you query the sensor? I.e., is it one long-running python script where you initiliaze the sensor once and then query regularly or is it some different setup?
From observing the work of the DHT22 sensor over the last few years, I would allow myself to conclude that the DHT22 is a poor sensor and does not like too long work, then it becomes very unstable.
From observing the work of the DHT22 sensor over the last few years, I would allow myself to conclude that the DHT22 is a poor sensor and does not like too long work, then it becomes very unstable.
The problem with the DHT11 and DHT22 is that they use a very timing sensitive protocol to communicate. That works pretty well with microcontrollers but it requires quite some effort to get it working reliably on a CPU with varying clock frequencies and an operating systems like Linux. However, as it currently looks like this bug is a pure software issue and actually not related to the sensor itself.
Have you tried the strace
and ipcs
commands as well? For now I just assumed that you and @Kraka23 are facing the exact same bug.
[pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) [pid 14193] msgrcv(8, 0x7ef9bbbc, 4095, 1, 0) = -1 EINVAL (Invalid argument) ^C[pid 14193] msgrcv(8, strace: Process 14193 detached <detached ...> strace: Process 14194 detached
cat /proc/$(pidof libgpiod_pulsein)/cmdline /home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/libgpiod_pulsein--pulses81--queue11753-igpiochip017
ipcs -q
------ Message Queues -------- key msqid owner perms used-bytes messages
pi@rpims-z2:~ $ lsipc RESOURCE DESCRIPTION LIMIT USED USE% MSGMNI Number of message queues 32000 0 0.00% MSGMAX Max size of message (bytes) 8192 - - MSGMNB Default max size of queue (bytes) 16384 - - SHMMNI Shared memory segments 4096 2 0.05% SHMALL Shared memory pages 4278190079 89 0.00% SHMMAX Max size of shared memory segment (bytes) 4278190079 - - SHMMIN Min size of shared memory segment (bytes) 1 - - SEMMNI Number of semaphore identifiers 32000 0 0.00% SEMMNS Total number of semaphores 1024000000 0 0.00% SEMMSL Max semaphores per semaphore set. 32000 - - SEMOPM Max number of operations per semop(2) 500 - - SEMVMX Semaphore max value 32767 - -
ipcs -q ==> ------ Message Queues -------- key msqid owner perms used-bytes messages
So the message queue is definitely gone. The best thing we can do in this case in
libgpiod_pulsein
is to terminate because there is no way to re-establish the communication to the python module. I will send a pull request for this later.What's still a mystery to me is why the message queue disappears without
libgpiod_pulsein
being terminated by the python module. Could you describe how exactly you query the sensor? I.e., is it one long-running python script where you initiliaze the sensor once and then query regularly or is it some different setup?
Hello again. The script run at start with systemctl. Here it is:
`
import time import board import adafruit_dht import time import sys import datetime from influxdb import InfluxDBClient
host = "0.0.0.0" # Configure InfluxDB connection variables port = 8086 # default port user = "" # the user/password created for the pi, with write access password = "" dbname = "metricas" # the database we created earlier interval = 120 # Sample period in seconds
client = InfluxDBClient(host, port, user, password, dbname) # Create the InfluxDB client object
sensor = adafruit_dht.DHT22(board.D24) # Enter the sensor details
measurement = "dht22" location = "living"
while True: try:
# humidity, temperature = Adafruit_DHT.read_retry(sensor, sensor_gpio) # Read the sensor using the configured driver and gpio
temperature = sensor.temperature
humidity = sensor.humidity
#iso = time.ctime()
iso = datetime.datetime.utcnow()
# Print for debugging, uncomment the below line
#print("[%s] Temp: %s, Humidity: %s" % (iso, temperature, humidity))
# Create the JSON data structure
data = [
{
"measurement": measurement,
"tags": {
"location": location,
},
"time": iso,
"fields": {
"temperatura" : temperature,
"humedad": humidity
}
}
]
# Send the JSON data to InfluxDB
client.write_points(data)
# Wait until it's time to query again...
time.sleep(interval)
#except KeyboardInterrupt:
except:
pass
`
And this is my first script that was working ok until stuck without notice and then update/change the modules and start with this problem (100% CPU).
` import time import sys import datetime import Adafruit_DHT from influxdb import InfluxDBClient
host = "0.0.0.0" # Configure InfluxDB connection variables port = 8086 # default port user = "" # the user/password created for the pi, with write access password = "" dbname = "metricas" # the database we created earlier interval = 120 # Sample period in seconds
client = InfluxDBClient(host, port, user, password, dbname) # Create the InfluxDB client object
sensor = Adafruit_DHT.DHT22 # Enter the sensor details sensor_gpio = 24
measurement = "dht22" location = "living"
while True: try:
humidity, temperature = Adafruit_DHT.read_retry(sensor, sensor_gpio)
#iso = time.ctime()
iso = datetime.datetime.utcnow()
# Print for debugging, uncomment the below line
#print("[%s] Temp: %s, Humidity: %s" % (iso, temperature, humidity))
# Create the JSON data structure
data = [
{
"measurement": measurement,
"tags": {
"location": location,
},
"time": iso,
"fields": {
"temperatura" : temperature,
"humedad": humidity
}
}
]
# Send the JSON data to InfluxDB
client.write_points(data)
# Wait until it's time to query again...
time.sleep(interval)
#except KeyboardInterrupt:
except:
pass
`
Hope that help. I can´t fix the format. Sorry.
What's still a mystery to me is why the message queue disappears without
libgpiod_pulsein
being terminated by the python module. Could you describe how exactly you query the sensor? I.e., is it one long-running python script where you initiliaze the sensor once and then query regularly or is it some different setup?Hello again. The script run at start with systemctl. Here it is: [...]
This explains why we do not see any errors from the Python side:
except: pass
,Could you please do the following?
import traceback
on the top of your script and replace the except: pass
by:
except KeyboardInterrupt:
break
except:
traceback.print_exc()
/usr/bin/python3 -u /path/to/your/script.py
. The -u
will make sure that any output from Python is immediately written to the journal.sudo systemctl daemon-reload
and restart your service.sudo journalctl --unit=name_of_your_service
.You will very likely be confronted with exceptions from time to time ("Checksum did not validate" or "A full buffer was not returned."). That is normal. Of interest is what the output looks like when the script starts to misbehave again, i.e., the very first errors shown before only errors occur.
Hope that help. I can´t fix the format. Sorry.
No worries, it's easy enough to reconstruct. For the future: You can surround multi-line code with three instead of a single backtick, then the format is kept.
Hello!. I have your info.
-- Logs begin at Mon 2020-10-26 01:56:38 -03, end at Mon 2020-10-26 21:41:51 -03. --
Oct 26 21:03:05 rpi4 systemd[1]: Started Recolecta datos de sensor dht22.
Oct 26 21:07:07 rpi4 python3[6407]: Traceback (most recent call last):
Oct 26 21:07:07 rpi4 python3[6407]: File "/mnt/squid-usb/scripts/dht22.py", line 37, in <module>
Oct 26 21:07:07 rpi4 python3[6407]: temperature = sensor.temperature
Oct 26 21:07:07 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 253, in temperature
Oct 26 21:07:07 rpi4 python3[6407]: self.measure()
Oct 26 21:07:07 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 198, in measure
Oct 26 21:07:07 rpi4 python3[6407]: pulses = self._get_pulses_pulseio()
Oct 26 21:07:07 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 129, in _get_pulses_pulseio
Oct 26 21:07:07 rpi4 python3[6407]: self.pulse_in.clear()
Oct 26 21:07:07 rpi4 python3[6407]: File "/home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/PulseIn.py", line 140, in clear
Oct 26 21:07:07 rpi4 python3[6407]: self._mq.send("c", True, type=1)
Oct 26 21:07:07 rpi4 python3[6407]: OSError: [Errno 22] Invalid argument
Oct 26 21:09:07 rpi4 python3[6407]: Traceback (most recent call last):
Oct 26 21:09:07 rpi4 python3[6407]: File "/mnt/squid-usb/scripts/dht22.py", line 37, in <module>
Oct 26 21:09:07 rpi4 python3[6407]: temperature = sensor.temperature
Oct 26 21:09:07 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 253, in temperature
Oct 26 21:09:07 rpi4 python3[6407]: self.measure()
Oct 26 21:09:07 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 198, in measure
Oct 26 21:09:07 rpi4 python3[6407]: pulses = self._get_pulses_pulseio()
Oct 26 21:09:07 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 129, in _get_pulses_pulseio
Oct 26 21:09:07 rpi4 python3[6407]: self.pulse_in.clear()
Oct 26 21:09:07 rpi4 python3[6407]: File "/home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/PulseIn.py", line 140, in clear
Oct 26 21:09:07 rpi4 python3[6407]: self._mq.send("c", True, type=1)
Oct 26 21:09:07 rpi4 python3[6407]: OSError: [Errno 22] Invalid argument
Oct 26 21:11:10 rpi4 python3[6407]: Traceback (most recent call last):
Oct 26 21:11:10 rpi4 python3[6407]: File "/mnt/squid-usb/scripts/dht22.py", line 37, in <module>
Oct 26 21:11:10 rpi4 python3[6407]: temperature = sensor.temperature
Oct 26 21:11:10 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 253, in temperature
Oct 26 21:11:10 rpi4 python3[6407]: self.measure()
Oct 26 21:11:10 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 198, in measure
Oct 26 21:11:10 rpi4 python3[6407]: pulses = self._get_pulses_pulseio()
Oct 26 21:11:10 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 129, in _get_pulses_pulseio
Oct 26 21:11:10 rpi4 python3[6407]: self.pulse_in.clear()
Oct 26 21:11:10 rpi4 python3[6407]: File "/home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/PulseIn.py", line 140, in clear
Oct 26 21:11:10 rpi4 python3[6407]: self._mq.send("c", True, type=1)
Oct 26 21:11:10 rpi4 python3[6407]: OSError: [Errno 22] Invalid argument
Oct 26 21:13:10 rpi4 python3[6407]: Traceback (most recent call last):
Oct 26 21:13:10 rpi4 python3[6407]: File "/mnt/squid-usb/scripts/dht22.py", line 37, in <module>
Oct 26 21:13:10 rpi4 python3[6407]: temperature = sensor.temperature
Oct 26 21:13:10 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 253, in temperature
Oct 26 21:13:10 rpi4 python3[6407]: self.measure()
Oct 26 21:13:10 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 198, in measure
Oct 26 21:13:10 rpi4 python3[6407]: pulses = self._get_pulses_pulseio()
Oct 26 21:13:10 rpi4 python3[6407]: File "/usr/local/lib/python3.7/dist-packages/adafruit_dht.py", line 129, in _get_pulses_pulseio
Oct 26 21:13:10 rpi4 python3[6407]: self.pulse_in.clear()
Oct 26 21:13:10 rpi4 python3[6407]: File "/home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/PulseIn.py", line 140, in clear
Oct 26 21:13:10 rpi4 python3[6407]: self._mq.send("c", True, type=1)
Oct 26 21:13:10 rpi4 python3[6407]: OSError: [Errno 22] Invalid argument
I hope that help!. Thanks for the format tips and thank you again for your effort.
Thank you for sticking with me here and providing all these information. I think I now know what is going on:
OSError: [Errno 22] Invalid argument
. However, due to the except:
clause, all exceptions are ignored and the script continues to run.libgpiod_pulsein
continues to run, consuming 100% CPU.What I will do:
libgpiod_pulsein
in this case terminates instead of hogging the CPU. But this cannot fix the underlying issue.What you can do:
adduser --system
) and use that for your daemon. logind won't cleanup message queues that belong to system users. Also you typically don't login and logout of them./etc/systemd/logind.conf
, set RemoveIPC=no
and remove the #
in the corresponding line.RuntimeError
(thrown at every failed reading) but no other exceptions. This way you immediately notice when things are going wrong.@Kraka23 and @darton : Can you confirm that the described setup matches your case and that either using a different user or modifying the logind.conf
fixes the issue for you?
Great @michaellass . Thank you so much. I apply the first option and create a system user to run the script. So far, so good!.
Exception in thread <function get_dht_data at 0x766998a0>:
Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "./rpims.py", line 221, in get_dht_data temperature = dhtDevice.temperature File "/home/pi/.local/lib/python3.7/site-packages/adafruit_dht.py", line 253, in temperature self.measure() File "/home/pi/.local/lib/python3.7/site-packages/adafruit_dht.py", line 198, in measure pulses = self._get_pulses_pulseio() File "/home/pi/.local/lib/python3.7/site-packages/adafruit_dht.py", line 137, in _get_pulses_pulseio pulses.append(self.pulse_in.popleft()) OverflowError: unsigned short is greater than maximum
Thanks all for clarification this issue, especially @michaellass I try to use sensorReporter project and got this 100% CPU issue also. And yes, I was lazy and run systemctl service also as user pi %) maybe you can add some diagnostic messages about (not recommended) running as user that have interactive shell as user shell configured, for example
There is actually a pull request open to print an error message and quit libgpiod_pulsein
in this case instead of just hogging the CPU: https://github.com/adafruit/libgpiod_pulsein/pull/2
Hopefully people will find this report and the solution quickly when searching for that error message.
What you can do:
Either one of the following to fix the issue:
[...]
- Modify
/etc/systemd/logind.conf
, setRemoveIPC=no
and remove the#
in the corresponding line.
I run into the same problem but the above workaround doesn't work for me.
I let my script start automatically on system boot via cron by creating a tmux session which runs this script. This works fine, however after some time (and I guess after some logins with the same user that runs the script) this known problem appears. libgpiod_pulseio uses 100% CPU and the following exception occurs when calling dht_device.temperature
:
File "/home/pi/.local/lib/python3.7/site-packages/adafruit_blinka/microcontroller/bcm283x/pulseio/PulseIn.py", line 143, in clear
self._mq.send("c", True, type=1)
OSError: [Errno 22] Invalid argument
Unfortunately it looks like the next read of dht_device.temperature
returns a cached value. So when catching and ignoring the above exception everything looks fine and the device still returns temperature and humidity values, however they just come from the cache.
In my view the more clean solution is the first one, i.e.
Create a system user for your daemon (adduser --system) and use that for your daemon. logind won't cleanup message queues that belong to system users. Also you typically don't login and logout of them.
However, the workaround of setting RemoveIPC=no
in logind.conf
should work as well. After that change you probably need to either reboot the system or issue systemctl restart systemd-logind
to restart logind.
That dht_device.temperature
gives you a cached value is not unexpected. In general, you should not ignore Exceptions that you do not really expect. In this case I suggest ignoring RuntimeError
but nothing else.
In my view the more clean solution is the first one, i.e.
Create a system user for your daemon (adduser --system) and use that for your daemon. logind won't cleanup message queues that belong to system users. Also you typically don't login and logout of them.
Yeah, I guess so.
However, the workaround of setting
RemoveIPC=no
inlogind.conf
should work as well. After that change you probably need to either reboot the system or issuesystemctl restart systemd-logind
to restart logind.
That's what I did (a reboot).
That
dht_device.temperature
gives you a cached value is not unexpected. In general, you should not ignore Exceptions that you do not really expect. In this case I suggest ignoringRuntimeError
but nothing else.
Agreed.
I'll close this issue. The phenomenon is caused by systemd/logind killing user's message queues when they log off. This is expected behavior. Making this obvious to users and avoiding the 100% CPU hog would have been addressed by https://github.com/adafruit/libgpiod_pulsein/pull/2 which never was approved.
"makermelissa merged commit a91d387 into adafruit:master"
Reported by @darton in https://github.com/adafruit/Adafruit_CircuitPython_DHT/pull/47#issuecomment-689438687 and https://github.com/adafruit/Adafruit_Blinka/issues/210#issuecomment-689427753: