Kreeblah closed this issue 5 years ago.
@Kreeblah
Many thanks for the report 👍
There were a ton of getcwd() and file not found errors during the installation
Interesting, ok. Appears to be a failed installation at basic file level.
I'll try to replicate
Testing:
getcwd errors confirmed.
Beta:
root@DietPi:~# G_PROGRAM_NAME=test G_INIT
root@DietPi:/tmp/test#
shell-init: error retrieving current directory: getcwd:
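To reproduce this outside DietPi: the transcript above suggests that once a shell's working directory has been removed, any new bash process started from it cannot determine its current directory at startup. A minimal sketch, assuming bash on Linux; repro_getcwd_error is a hypothetical helper, not part of DietPi:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DietPi): reproduce the "shell-init ... getcwd"
# error by starting a child bash inside a directory that has been removed.
repro_getcwd_error() {
	(
		# Subshell, so the caller's working directory is left untouched.
		dir=$(mktemp -d /tmp/getcwd-demo.XXXXXX) || exit 1
		cd "$dir" || exit 1
		rmdir "$dir"    # the cwd now refers to an unlinked directory
		# The child bash looks up its cwd while initialising and fails.
		# Print only the child's stderr (stdout is discarded).
		bash -c ':' 2>&1 > /dev/null
	)
}

repro_getcwd_error
```

It should print a shell-init line like the ones in the logs below; the exact wording comes from bash and may vary between versions.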
Using a fresh image that wasn't used for previous dev testing:
🈯️ Fresh VB image with 2.1GB RAM
🈴 failed on 2nd test
dietpi-software install 93
#fail
🈴 Fresh VB image with 1GB RAM
tmpfs /tmp tmpfs defaults,size=1023M,noatime,nodev,nosuid,mode=1777 0 0
root@DietPi:~# free -m
total used free shared buff/cache available
Mem: 996 23 915 5 57 871
Swap: 1051 0 1051
root@DietPi:~# ls -lha /tmp
total 4.0K
drwxrwxrwt 7 root root 140 Nov 12 17:53 .
drwxr-xr-x 22 root root 4.0K Sep 20 13:08 ..
drwxrwxrwt 2 root root 40 Nov 12 17:52 .font-unix
drwxrwxrwt 2 root root 40 Nov 12 17:52 .ICE-unix
drwxrwxrwt 2 root root 40 Nov 12 17:52 .Test-unix
drwxrwxrwt 2 root root 40 Nov 12 17:52 .X11-unix
drwxrwxrwt 2 root root 40 Nov 12 17:52 .XIM-unix
root@DietPi:~# umount /tmp
root@DietPi:~# ls -lha /tmp
total 40K
drwxrwxrwt 9 root root 4.0K Sep 20 13:04 .
drwxr-xr-x 22 root root 4.0K Sep 20 13:08 ..
drwxr-xr-x 2 root root 4.0K Aug 16 15:32 DietPi-Drive_Manager
drwxr-xr-x 2 root root 4.0K Aug 16 15:31 DietPi-PREP
drwxrwxrwt 2 root root 4.0K Aug 16 15:27 .font-unix
-rw-r--r-- 1 root root 579 Aug 16 15:32 G_ERROR_HANDLER_COMMAND
drwxrwxrwt 2 root root 4.0K Aug 16 15:27 .ICE-unix
drwxrwxrwt 2 root root 4.0K Aug 16 15:27 .Test-unix
drwxrwxrwt 2 root root 4.0K Aug 16 15:27 .X11-unix
drwxrwxrwt 2 root root 4.0K Aug 16 15:27 .XIM-unix
root@DietPi:~# G_DEBUG=1 dietpi-software install 93
[ OK ] DietPi-Software | Root access verified.
[ OK ] DietPi-Software | RootFS R/W access verified.
DietPi-Software
─────────────────────────────────────────────────────
Mode: Running G_INIT()
[ INFO ] DietPi-Software | Entered scripts working directory: /tmp/DietPi-Software
[ OK ] DietPi-Software | Initialized database
[ .... ] DietPi-Software | Reading database, please wait.../tmp/DietPi-Software
total 8.0K
drwxr-xr-x 2 root root 4.0K Nov 12 18:18 .
drwxrwxrwt 8 root root 4.0K Nov 12 18:18 ..
[ INFO ] DietPi-Software | Navigated to /tmp
[ INFO ] DietPi-Software | Removed scripts working directory: /tmp/DietPi-Software
[ OK ] DietPi-Software | Reading database completed
DietPi-Software
─────────────────────────────────────────────────────
Mode: Automated install
[ OK ] DietPi-Software | Installing Pi-hole: block adverts for any device on your network
[ OK ] DietPi-Software | Free space check: path=/ | available=6899 MB | required=500 MB
[ OK ] DietPi-Software | DietPi-Userdata validation: /mnt/dietpi_userdata
[ OK ] DietPi-Software | Connection test: https://deb.debian.org/debian/
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ SUB1 ] DietPi-Run_ntpd > Running G_INIT()
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ INFO ] DietPi-Run_ntpd | Entered scripts working directory: /tmp/DietPi-Run_ntpd
[ OK ] NTPD: time sync | Completed
/tmp/DietPi-Run_ntpd
total 8.0K
drwxr-xr-x 2 root root 4.0K Nov 12 18:18 .
drwxrwxrwt 8 root root 4.0K Nov 12 18:18 ..
[ INFO ] DietPi-Run_ntpd | Navigated to /tmp
[ INFO ] DietPi-Run_ntpd | Removed scripts working directory: /tmp/DietPi-Run_ntpd
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ SUB1 ] DietPi-Services > Running G_INIT()
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ INFO ] DietPi-Services | Entered scripts working directory: /tmp/DietPi-Services
[ SUB1 ] DietPi-Services > unmask
[ OK ] DietPi-Services | unmask all: cron
/tmp/DietPi-Services
total 8.0K
drwxr-xr-x 2 root root 4.0K Nov 12 18:18 .
drwxrwxrwt 8 root root 4.0K Nov 12 18:18 ..
[ INFO ] DietPi-Services | Navigated to /tmp
[ INFO ] DietPi-Services | Removed scripts working directory: /tmp/DietPi-Services
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ SUB1 ] DietPi-Services > Running G_INIT()
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ INFO ] DietPi-Services | Entered scripts working directory: /tmp/DietPi-Services
[ SUB1 ] DietPi-Services > stop
[ OK ] DietPi-Services | stop : cron
/tmp/DietPi-Services
total 8.0K
drwxr-xr-x 2 root root 4.0K Nov 12 18:18 .
drwxrwxrwt 8 root root 4.0K Nov 12 18:18 ..
[ INFO ] DietPi-Services | Navigated to /tmp
[ INFO ] DietPi-Services | Removed scripts working directory: /tmp/DietPi-Services
/DietPi/dietpi/dietpi-software: line 14354: cd: /tmp/DietPi-Software: No such file or directory
DietPi-Software
─────────────────────────────────────────────────────
Mode: Update & upgrade APT
^C
Bug with current VB image. Will cause update to also fail. 🈯️
[ .... ] DietPi-Software | Reading database, please wait.../tmp/DietPi-Software
total 8.0K
drwxr-xr-x 2 root root 4.0K Nov 12 18:18 .
drwxrwxrwt 8 root root 4.0K Nov 12 18:18 ..
[ INFO ] DietPi-Software | Navigated to /tmp
[ INFO ] DietPi-Software | Removed scripts working directory: /tmp/DietPi-Software
####
G_INIT_ALLOW_CONCURRENT=1 dietpi-software install 93
Image needs redoing.
G_DEBUG=1 dietpi-software install 93, yet /tmp is clear at all times.
Will also check VMware image:
[ INFO ] DietPi-Software | Entered scripts working directory: /tmp/DietPi-Software
[ OK ] DietPi-Software | Initialized database
[ .... ] DietPi-Software | Reading database, please wait... <<<<< the cause of G_EXIT trigger
G_EXIT for DietPi-Software
PWD=/tmp/DietPi-Software
[ OK ] DietPi-Software | Reading database completed
[ INFO ] DietPi-Software | Navigated to /tmp
Triggers G_EXIT:
G_DIETPI-NOTIFY -2 'Reading database'
G_DIETPI-NOTIFY 0 'Reading database' # << triggers the G_EXIT
🈯️ Setting sync at start of G_DIETPI-NOTIFY resolves issue...
root@DietPi:~# G_DEBUG=1 dietpi-software install 93 | grep EXIT
/tmp/dietpi-process.pid PID KILL (EXIT)
/tmp/dietpi-process.pid PID KILL (EXIT)
DietPi-Software Running G_EXIT()
/tmp/dietpi-process.pid PID KILL (EXIT)
DietPi-Software Running G_EXIT()
🈯️ sync before checking for /tmp/dietpi-process.pid
root@DietPi:~# G_DEBUG=1 dietpi-software install 93 | grep EXIT
/tmp/dietpi-process.pid PID KILL (EXIT)
[. ] /tmp/dietpi-process.pid PID KILL (EXIT)
DietPi-Software Running G_EXIT()
🈯️ sync
set -o noclobber
if { > /tmp/dietpi-process.pid; } &> /dev/null; then
	set +o noclobber
	{ Print_Process_Animation & echo $! > /tmp/dietpi-process.pid; disown; } 2> /dev/null
	echo -e "$G_PROGRAM_NAME | $! > /tmp/dietpi-process.pid (EXIT)"
	sync
else
	set +o noclobber
fi
Left = before fix Right = after fix
@MichaIng
Reason why this only occurs on VMs? Unsure, perhaps a cache/disk/delay issue with VMs under noclobber mode?
Another solution is to pre-create the blank file beforehand, then wait for the value to be valid:
set -o noclobber
> /tmp/dietpi-process.pid
if { > /tmp/dietpi-process.pid; } &> /dev/null; then
set +o noclobber
{ Print_Process_Animation & echo $! > /tmp/dietpi-process.pid; disown; } 2> /dev/null
#sync
else
rm /tmp/dietpi-process.pid
set +o noclobber
fi
Clean_Process_Animation(){
	while [[ -f /tmp/dietpi-process.pid ]]
	do
		local pid=$(</tmp/dietpi-process.pid)
		if [[ -t 0 ]] && [[ $pid ]]; then
			kill $pid &> /dev/null
			rm /tmp/dietpi-process.pid &> /dev/null
			# In case the output took more than one line, clean from cursor (animation position) until end of terminal.
			tput ed
			break
		fi
		echo -e "G_EXIT sleeping, file but no value"
		sleep 1
	done
	output_string+='\r\e[K'
}
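Both snippets above rely on set -o noclobber so that only one process can create /tmp/dietpi-process.pid. A minimal sketch of that idea as two hypothetical helpers (pidfile_acquire and pidfile_release are not DietPi functions). With noclobber, bash's > refuses to overwrite an existing regular file, and to my understanding bash creates new files with O_EXCL in this mode, so creation should be race-free. Writing the PID in the same redirection shortens, but does not fully close, the window in which a reader sees an empty file, which is the case the "file but no value" loop above waits out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not DietPi code): atomically create a PID file.
# Under noclobber, ">" fails if the file already exists, so only one caller wins.

pidfile_acquire() {
	local file=$1 pid=$2
	# Subshell, so noclobber does not leak into the caller's shell;
	# the "cannot overwrite existing file" message is discarded.
	( set -o noclobber; echo "$pid" > "$file" ) 2> /dev/null
}

pidfile_release() {
	rm -f "$1"
}
```

A second pidfile_acquire on the same path returns non-zero until pidfile_release has removed the file.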
Needs more testing. sync may resolve it, but so does sleep 0.1. sync may simply create enough time for bash/PIDs/the system to catch up.
@Fourdee Strange, I don't understand the reason yet.
G_INIT_ALLOW_CONCURRENT=1 dietpi-software install 93
This was just a test, right? This option is very dangerous and should never be used in production or within release code. If ever, then better implement an option that skips /tmp/$G_PROGRAM_NAME creation as well, e.g. G_INIT_NO_TMPDIR or something like this, if the script does not need to create any tmp files?
Could also be realized via an argument, e.g. G_INIT 0 (no /tmp dir).
But this breaks the concurrency check, which in my opinion should always be done. Unpredictable issues might occur, depending on the script, when it runs twice concurrently...
root@DietPi:~# umount /tmp
root@DietPi:~# ls -lha /tmp
We should clean /tmp before mounting tmpfs there during PREP or whenever done within our scripts 😉.
And just to be sure, this issue does not only occur on first run install then, right?
About file system sync:
Does async even occur when writing to RAM? That does not make any sense to me, since the feature's intention is to reduce disk I/O by not writing a full 4k block to disk for every small file. There is no reason to keep a write in cache (RAM) when it is to be written to RAM anyway ~and there is no minimum block size, of course...~ (€: tmpfs HAS a block size of 4k! Just tested, so async might even have some value there.) But I could not find an explicit statement about async on tmpfs during a web search.
But now, if somehow it's true and /tmp/dietpi-process.pid is not created in the file system but kept in cache:
[[ -w /tmp/dietpi-process.pid ]] && echo -ne "\r$bracket_l${aprocess_string[i]}$bracket_r " || return
disown is called? { Print_Process_Animation & echo $! > /tmp/dietpi-process.pid; disown; } 2> /dev/null
echo call takes, disown might not be done yet; now somehow an EXIT trap call occurs on the parent shell, even though it is not exited by that?
The problem with using a PID variable is that it is only available in the current shell (and subshells opened afterwards, if exported), but there is no way to have it available (or change it) in the parent script/shell. So the only way to kill the process animation would be to actively kill it from the very same shell/script.
G_DIETPI-NOTIFY -2 is called. The PS1 prompt command can also check/unset the variable and kill the animation, in case a G_* function from the login shell called the animation.
So before changing the whole system, I will do some tests first about EXIT traps and background/sub processes.
disown before echo solves it: { Print_Process_Animation & disown; echo $! > /tmp/dietpi-process.pid; } 2> /dev/null
Also, if async is really done on tmpfs as well, should we mount it with the sync option to override it? Although (see above), I just found it has a 4k block size. Hmm, the defaults should have some reason, so better not mess with this.
@MichaIng
This was just a test, right?
Yep, all of the above was debug testing, trying to find the cause of the unexpected G_EXIT call.
We should clean /tmp before mounting tmpfs there during PREP or whenever done within our scripts
yep 👍
And just to be sure, this issue does not only occur on first run install then, right?
Occurs after 1st run installation, and during boot (briefly on VM):
Add program_name to G_INIT
~Following disables errors, checking that G_PROGRAM_NAME exists.~ Worked once...
if [[ $G_PROGRAM_NAME && -d /tmp/$G_PROGRAM_NAME ]]; then
@MichaIng
🈯️ Fixed the boot issue: use pwd instead of the variable $PWD to get the current directory:
if [[ $(pwd) == /tmp/$G_PROGRAM_NAME* ]]; then
So either:
$PWD is outdated when we call it,
$PWD becomes corrupt when the var is being updated at the moment we call it,
pwd calls a sync, or
pwd allows enough time for the current directory call to be valid.
Regarding traps:
http://redsymbol.net/articles/bash-exit-traps/
If some error causes the script to exit prematurely, ~the scratch directory and its contents don't get deleted.~ This is a resource leak?
Hmm, unsure, but dietpi-software
issues. Disabling this resolves.
G_DIETPI-NOTIFY -2 'Reading database'
@MichaIng
🈯️ I think I've fixed it:
G_DEBUG=1 dietpi-software install 93
as before, no errors!
kill -9 $(</tmp/dietpi-process.pid) &> /dev/null
Would indicate the process is not terminated fast enough using SIGTERM.
SIGTERM: But all this does is trigger the while loop and sleep every time, allowing time for it to complete.
while kill -15 $(</tmp/dietpi-process.pid) &> /dev/null
do
echo -e "waiting for process to terminate"
sleep 0.1
done
This is definitely the cause of the dietpi-software issues. Disabling this resolves.
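An alternative to re-sending SIGTERM in a loop is to send it once and then wait for the job. A minimal sketch, assuming the background job is a direct child of the script and has not been disowned (as far as I know, bash's wait cannot wait for a disowned job, which may be why the thread fell back to a kill loop and later to kill -9). start_job, stop_job and job_pid are hypothetical names, and the placeholder loop stands in for Print_Process_Animation:

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not DietPi code): stop a background job and block until
# it has really exited, instead of re-sending SIGTERM in a polling loop.

start_job() {
	# Placeholder worker standing in for Print_Process_Animation.
	( while :; do sleep 0.1; done ) &
	job_pid=$!
}

stop_job() {
	local pid=$1
	kill -TERM "$pid" 2> /dev/null
	# wait only works for (non-disowned) children of this shell. It returns
	# once the child has exited and been reaped; its status after SIGTERM
	# (143 = 128 + 15) is ignored here.
	wait "$pid" 2> /dev/null
	return 0
}
```

Because wait returns only after the child is gone, no sleep/retry loop is needed; the trade-off is giving up disown for the animation process.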
I don't get the connection yet. Is the AMI instance in the guide's example our animation process? If the dietpi-software exit trap does not terminate it, it will run forever. Yep, makes sense; for this reason, the PS1 prompt command currently terminates the animation as a last resort. However, it makes sense to do this within the exit trap as well. The PS1 prompt command will then only do this if some G_* command from the terminal calls the animation and is cancelled or fails.
About $PWD:
cd into it and G_EXIT is called very fast afterwards, leading to $PWD not yet being updated accordingly, it will lead to
cd back into /tmp is not done, and the working dir removal leads to the getcwd error. Makes sense.
cd should not even have been possible. So async is not the problem then, just the outdated $PWD variable? Then $(pwd) indeed is a good solution to ensure the correct current dir is checked.
Perhaps it is easiest/fastest to simply always cd /tmp before removing the working dir and skip the test completely?
But now about the actual debug log:
DietPi-Software
─────────────────────────────────────────────────────
Mode: Running G_INIT()
[ INFO ] DietPi-Software | Entered scripts working directory: /tmp/DietPi-Software
[ OK ] DietPi-Software | Initialized database
[ .... ] DietPi-Software | Reading database, please wait.../tmp/DietPi-Software
total 8.0K
drwxr-xr-x 2 root root 4.0K Nov 12 18:18 .
drwxrwxrwt 8 root root 4.0K Nov 12 18:18 ..
[ INFO ] DietPi-Software | Navigated to /tmp
[ INFO ] DietPi-Software | Removed scripts working directory: /tmp/DietPi-Software
[ OK ] DietPi-Software | Reading database completed
[[ -d /tmp/DietPi-Software ]] returned true, since: [ INFO ] DietPi-Software | Removed scripts working directory: /tmp/DietPi-Software
[[ $PWD == /tmp/$G_PROGRAM_NAME* ]] returned true as well, so $PWD obviously was correct, since: [ INFO ] DietPi-Software | Navigated to /tmp
Thinking about this now, I believe it is due to /DietPi/dietpi/func/dietpi-set_dphys-swapfile remounting /tmp, which leads to content and cwd loss.
This is called during preboot 1st run setup, after which the error appeared, and during Pi-hole install, after which the error was reported. 🈯️ Makes total sense!
We need to:
/tmp content
cd out of /tmp
umount /tmp
/tmp content
mount /tmp, which will automatically mount according to the fstab entry AFAIR (as far as I read 😉), but needs testing
cd /tmp/$G_PROGRAM_NAME, which should be the swapfile script's working dir.
[x] Verify that the parent script's pwd is still correct, even though /tmp was remounted in between.
But how does pwd now fix the issue? Perhaps it also corrects the current dir if it is non-existent, re-creates it, or sets it to the parent dir or $HOME or such?
The last question is still why the dietpi-software EXIT trap was actually called during database reading above. The /tmp remount and a wrong $PWD cannot be the issue, since on 1st run setup, dietpi-software is definitely called long after preboot resets the /tmp mount, and database reading is done before the Pi-hole install code remounts it.
Since the script goes on, no EXIT call was done, obviously.
Phew, looks to me like not all of these issues are related to each other. ~The real issue, which is the cause for the TO and your replicated 1st boot/install error, is definitely swapfile creation remounting /tmp, leading to content being cleared.~
@MichaIng
Thinking now about this, I believe it is due to: /DietPi/dietpi/func/dietpi-set_dphys-swapfile remounting /tmp, which leads to content and cwd loss. This is called during preboot 1st run setup, after which the error appeared and during Pi-hole install, after which error was reported. 🈯️ Makes totally sense!
The boot issue occurs at all times https://github.com/Fourdee/DietPi/issues/2237#issuecomment-438303603, even after 1st run setup. At that point, /tmp is mounted before all dietpi scripts.
73ms dietpi-ramlog.service
72ms dropbear.service
63ms systemd-tmpfiles-setup-dev.service
60ms systemd-remount-fs.service
49ms systemd-timesyncd.service
47ms systemd-sysctl.service
45ms systemd-journal-flush.service
40ms systemd-tmpfiles-setup.service
38ms systemd-modules-load.service
38ms dev-hugepages.mount
37ms console-setup.service
37ms systemd-update-utmp.service
36ms dev-mqueue.mount
35ms tmp.mount
35ms sys-kernel-debug.mount
32ms systemd-tmpfiles-clean.service
28ms systemd-random-seed.service
25ms ssh.service
20ms var-log.mount
20ms kmod-static-nodes.service
12ms DietPi.mount
12ms systemd-update-utmp-runlevel.service
11ms systemd-user-sessions.service
@Fourdee
Yep, I also found that swap file creation, including mount -o remount,size=753M tmpfs /tmp, did not clear the /tmp dir. The current $PWD and even contained files are preserved. It seems the content backup is somehow done automatically 👍. So no issue with this!
Neither $PWD nor pwd checks/recognizes that the directory does not exist:
root@VM-Stretch:/tmp/testdir# rm -R /tmp/testdir
root@VM-Stretch:/tmp/testdir# echo $PWD
/tmp/testdir
root@VM-Stretch:/tmp/testdir# pwd
/tmp/testdir
root@VM-Stretch:/tmp/testdir# l
total 0
root@VM-Stretch:/tmp/testdir# cd ..
root@VM-Stretch:/tmp# l
total 0
drwxrwxrwt 2 root root 40 Nov 13 18:35 .font-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .ICE-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .Test-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .X11-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .XIM-unix
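The transcript above can be condensed into a small self-check. A sketch, assuming bash; demo_stale_pwd and cwd_still_exists are hypothetical names. It shows $PWD still naming the removed directory, [[ -d $PWD ]] detecting that it is gone, and an unconditional cd /tmp getting the shell out of it:

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not DietPi code): after the current directory is
# removed, $PWD still holds the old path, while a directory test on it fails.

cwd_still_exists() {
	[[ -d $PWD ]]
}

demo_stale_pwd() {
	(
		dir=$(mktemp -d /tmp/pwd-demo.XXXXXX) || exit 1
		cd "$dir" || exit 1
		rmdir "$dir"
		# $PWD is only updated by cd, so it still names the removed directory.
		[[ $PWD == "$dir" ]] && echo "PWD unchanged: $PWD"
		cwd_still_exists || echo "directory is gone"
		# Leaving via an absolute path works even though the old cwd no longer exists.
		cd /tmp && echo "now in $PWD"
	)
}

demo_stale_pwd
```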
cd /tmp without any cwd test is best then.
[[ -d /tmp/$G_PROGRAM_NAME ]], otherwise in case of a removed dir, it will not navigate out of the invalid dir:
root@VM-Stretch:/tmp# mkdir testdir
root@VM-Stretch:/tmp# cd testdir
root@VM-Stretch:/tmp/testdir# rm -R /tmp/testdir
root@VM-Stretch:/tmp/testdir# [[ -d /tmp/testdir ]] || echo removed
removed
root@VM-Stretch:/tmp/testdir# cd ..
root@VM-Stretch:/tmp# l
total 0
drwxrwxrwt 2 root root 40 Nov 13 18:35 .font-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .ICE-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .Test-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .X11-unix
drwxrwxrwt 2 root root 40 Nov 13 18:35 .XIM-unix
cd /tmp at all.
Tested exit traps up and down: with/without disown, terminating immediately and after a while, from inside and outside the background job, within/outside { ... } &> ... Couldn't create any situation in which exiting the background job calls the parent script's exit trap 🤔.
@MichaIng
Tested exit traps up and down with/without disown, immediate and after a while terminating, from inside and outside the background job, within/outside { ... } &> ...
[ INFO ] DietPi-Software | Entered scripts working directory: /tmp/DietPi-Software
[. ] /DietPi/dietpi/func/dietpi-globals: line 270: 2455 Terminated Print_Process_Animation
[ OK ] DietPi-Software | Initialized database
[ INFO ] DietPi-Software | Navigated to /tmp
[ OK ] DietPi-Software | Reading database completed
[ INFO ] DietPi-Software | Removed scripts working directory: /tmp/DietPi-Software
/DietPi/dietpi/dietpi-software: line 1: 2461 Terminated Print_Process_Animation
@Fourdee
Would indicate the process is not terminated fast enough using SIGTERM
But why is this an issue? Even if the kill command does its job in the background while the script goes on, this would at worst lead to a parallel animation until kill has finished. But this should not lead to any folder deletion or affect cwd?
Quick validation about tmpfs remount: http://man7.org/linux/man-pages/man5/tmpfs.5.html
* During a remount operation (mount -o remount), the filesystem size can be changed (without losing the existing contents of the filesystem).
👍
@MichaIng
But why this is an issue? Even if the kill command does it's job in the background, while script goes on, this would at worst lead to a parallel animation, until kill has finished. But this should not lead to any folder deletion or affect cwd?
Unsure at the moment. Maybe SIGTERM allows a process/memory leak to occur in Print_Process_Animation when terminated.
For example:
G_DIETPI-NOTIFY -2 'Reading database'
is fine, but the next call triggers the G_EXIT: G_DIETPI-NOTIFY 0 'Reading database'
Quick validation about tmpfs remount:
Nice 👍
Interesting, this works: tput ed on exit:
[[ -w /tmp/dietpi-process.pid ]] && echo -ne "\r$bracket_l${aprocess_string[i]}$bracket_r " || tput ed && return
Then simply rm /tmp/dietpi-process.pid in the process clean-up.
So the issue is leakage when terminating Print_Process_Animation using SIGTERM?
Breaks animation lol
Putting tput ed back after Clean_Process_Animation resolved it.
Simply not killing the process is the fix, I believe, preventing leakage and allowing a graceful exit.
And this should be exit, as we don't want to return any value/info?
[[ -w /tmp/dietpi-process.pid ]] && echo -ne "\r$bracket_l${aprocess_string[i]}$bracket_r " || exit 0
$PWD issue still occurs with the above.
Removing the code and simply using the following also causes the error, so cd /tmp is what is failing here?
if cd /tmp; then
	[[ $G_DEBUG == 1 ]] && G_DIETPI-NOTIFY 2 'Navigated to /tmp'
else
	[[ $G_DEBUG == 1 ]] && G_DIETPI-NOTIFY 2 "Failed to navigate out of /tmp/$G_PROGRAM_NAME"
fi
if (( ! G_INIT_ALLOW_CONCURRENT )); then
	if rm -R "/tmp/$G_PROGRAM_NAME"; then
		[[ $G_DEBUG == 1 ]] && G_DIETPI-NOTIFY 2 "Removed scripts working directory: /tmp/$G_PROGRAM_NAME"
	else
		[[ $G_DEBUG == 1 ]] && G_DIETPI-NOTIFY 2 "Failed to remove scripts working directory: /tmp/$G_PROGRAM_NAME"
	fi
fi
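The same clean-up step can be wired into an EXIT trap, following the approach the thread settles on later: always cd /tmp, with no $PWD or [[ -d ]] check, then remove the working directory. A minimal sketch; init_workdir, cleanup_workdir and WORKDIR are hypothetical names, not DietPi's G_INIT/G_EXIT:

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not DietPi's G_INIT/G_EXIT): create a per-script working
# directory, enter it, and clean it up from an EXIT trap.

init_workdir() {
	WORKDIR=$(mktemp -d "/tmp/${1:-script}.XXXXXX") || return 1
	trap cleanup_workdir EXIT
	cd "$WORKDIR" || return 1
}

cleanup_workdir() {
	# Leave unconditionally: no $PWD or [[ -d ]] check, since $PWD may be
	# stale and the directory may already be gone.
	cd /tmp || cd /
	if [[ -n $WORKDIR ]]; then
		rm -rf -- "$WORKDIR"
	fi
}
```

mktemp -d avoids name clashes, but if a fixed /tmp/$G_PROGRAM_NAME also serves as a concurrency check (as the G_INIT_ALLOW_CONCURRENT discussion suggests), this is not a drop-in replacement.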
@MichaIng
If you need VNC access for the boot issue, let me know and I'll set it up.
@Fourdee
But the above is already live code? Only tput ed was added, but what is its influence?
Ah, it's not required in Clean_Process_Animation then. But what I don't like about it:
If tput fails for another reason, the animation does not stop. Okay, but a simple solution is || { tput ed; return; }
tput ed would possibly remove this other output, and/or we would like to still see the last processing message if it was terminated unexpectedly? I like to have this at defined positions: when G_DIETPI-NOTIFY wants to overprint, or when script exit and the PS1 prompt want to terminate and clean the output.
I still don't get it 🤔. I did more research and more testing and am still unable to find any way to make a child termination call the parent exit trap... And even if something goes wrong with the PID, e.g. the parent PID somehow gets saved to the PID file, then the parent script would exit as well, which it does not...
So even though we have some solutions with cd /tmp and $(pwd) that seem to solve the issue, I would like to understand how the dietpi-software exit trap can be called when a child background process is terminated, without dietpi-software actually exiting. So it receives an EXIT signal but goes on working?? I have a headache now 🤣.
€: Yep, since I can't replicate the issue at all on my Stretch VM (reset to use $PWD, re-enable fs resize service, echo -1 > .install_stage, reboot), VNC access would be good so I can play around myself. Which client do you use/recommend for Windows? The first one I found was RealVNC Viewer. The built-in Windows remote desktop client does not work, right?
@MichaIng
TightVNC works well, only install the viewer. Setting it up now.
@MichaIng
82.7.94.230 same pw as webserver.
Okay, long testing session:
/DietPi/dietpi/preboot is actually fine. It looks like it calls the exit trap before it's done, but it's not. All remaining tasks are sent to the background, and exit traps obviously do not wait for background children to finish, which is good to know. It took a long time of testing and investigating until I realized 🤣.
cd /tmp will be done during the exit trap without any $PWD/$(pwd) or [[ -d /tmp/$G_PROGRAM_NAME ]] checks. This prevents issues due to a possible async and/or $PWD update delay, and if /tmp/$G_PROGRAM_NAME was removed for another reason.
dietpi-software install 93 still needs investigation. However, the process animation will be killed via kill -9 (SIGKILL vs SIGTERM), just in case, and it's faster.
@MichaIng
Thanks Micha, indeed a very long debugging session 😄 Thanks for your help 👍
Some notes my end:
dietpi-software install 93 still needs investigation. However process animation will be killed via kill -9 (SIGKILL vs SIGTERM), just in case and it's faster.
Removing the kill command and allowing the bg process to terminate on its own when the .pid file is removed also worked.
Any pipe with the preboot script and multiple + threads in the script was triggering the getcwd errors during boot.
Removal of kill command and allowing bg process to terminate on its own when .pid file is removed, also worked.
The only issue with this is the max 0.15 seconds delay between removal and termination:
This only works if the animation process checks the content of the PID file, verifying that it's still its own PID: [[ $(</tmp/dietpi-process.pid) == $BASHPID ]]. But that decreases performance 🤔.
In my opinion, actively terminating the process before removing the PID file (thus allowing new animations) is the better deal then.
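For comparison, a sketch of the self-terminating variant, combining the pre-created blank PID file with the $BASHPID ownership check. spinner_loop, spinner_start, spinner_stop and spinner_pid are hypothetical names, not DietPi's Print_Process_Animation. In a background subshell, $BASHPID equals the $! seen by the parent; the loop keeps running only while the file exists and is either still empty (PID not written yet) or holds its own PID:

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not DietPi code): an animation loop that stops by itself
# once its PID file is removed or taken over by a newer animation.

spinner_loop() {
	local pidfile=$1 frames='|/-\' i=0 pid
	while [[ -f $pidfile ]]; do
		pid=
		# Ignore errors if the file vanishes between the test and the read.
		{ read -r pid < "$pidfile"; } 2> /dev/null || true
		# Empty: parent has not written the PID yet. Different PID: stop.
		[[ -n $pid && $pid != "$BASHPID" ]] && break
		printf '\r[%s]' "${frames:i:1}" >&2
		i=$(( (i + 1) % 4 ))
		sleep 0.05
	done
	printf '\r\033[K' >&2    # clear the animation line
}

spinner_start() {
	: > "$1"                 # pre-create the blank PID file
	spinner_loop "$1" &
	spinner_pid=$!
	echo "$spinner_pid" > "$1"
}

spinner_stop() {
	rm -f "$1"               # the loop exits at its next check
	wait "$spinner_pid" 2> /dev/null
}
```

The cost is the delay mentioned above: after the file is removed, the loop may run for up to one more sleep interval (0.05 s here) before it notices.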
Any Pipe with preboot script and multiple + threads in the script, was triggering the getcwd errors during boot.
Still not sure where exactly the getcwd error came from, but it did not appear any more after the forced cd /tmp, right? At least the multiple bg processes were not related.
I made a start: https://github.com/Fourdee/DietPi/pull/2248
@MichaIng
Still not sure, where exactly the getcwd error came from, but they did not appear after forced cd /tmp any more, right? At least the multiple bg processes were not related.
Still occurring on my tests.
I believe you were right about a /tmp mount issue, as the following does not create a log:
ExecStart=/bin/bash -c '/DietPi/dietpi/preboot &>> /tmp/dietpi-preboot.log'
🈯️ However, this does:
ExecStart=/bin/bash -c '/DietPi/dietpi/preboot &>> /root/dietpi-preboot.log'
~Interesting:~
/tmp briefly just after login, but a few seconds later /tmp is reset with:~
root@DietPi:~# ls -lha /tmp
total 4.0K
drwxrwxrwt 7 root root 140 Nov 15 09:17 .
drwxr-xr-x 22 root root 4.0K Nov 12 20:33 ..
drwxrwxrwt 2 root root 40 Nov 15 09:15 .font-unix
drwxrwxrwt 2 root root 40 Nov 15 09:15 .ICE-unix
drwxrwxrwt 2 root root 40 Nov 15 09:15 .Test-unix
drwxrwxrwt 2 root root 40 Nov 15 09:15 .X11-unix
drwxrwxrwt 2 root root 40 Nov 15 09:15 .XIM-unix
~Something is playing with /tmp~
dietpi-logclear, still occurs
mv /tmp/dietpi-*.log /var/tmp/dietpi/logs/ & in postboot
&>> /tmp/dietpi-preboot.log
root@DietPi:~# cat /var/tmp/dietpi/logs/dietpi-preboot.log
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ OK ] Root access verified.
[ OK ] Root access verified.
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
Notice: CPU Governors are not available for VM.
[ SUB1 ] DietPi-LED_Control > Applying LED triggers
[ OK ] DietPi-LED_Control | input0::capslock: kbd-capslock
[ OK ] DietPi-LED_Control | input0::numlock: kbd-numlock
[ OK ] DietPi-LED_Control | input0::scrolllock: kbd-scrolllock
Disable StandardOutput=tty and all threads in preboot, blob?
@MichaIng
Ok, still unsure of cause. Although:
preboot resolves the issue.
So for now, I believe we should roll out a workaround fix to disable threading, until more time is available to debug this further?
Also tried G_THREAD_START instead of &, same errors in output.
Redo images:
Using the dev branch, switch dietpi.txt and .version afterwards.
G_CONFIG_INJECT 'DEV_GITBRANCH=' 'DEV_GITBRANCH=master' /boot/dietpi.txt
G_CONFIG_INJECT 'G_GITBRANCH=' 'G_GITBRANCH=master' /boot/dietpi/.version
G_CONFIG_INJECT 'G_DIETPI_VERSION_RC=' 'G_DIETPI_VERSION_RC=20' /boot/dietpi/.version
🈯️ VB Stretch
🈯️ Entry dupes https://github.com/Fourdee/DietPi/commit/285965190562a77f8ccf6f03b5208b3af8edd979
for i in /etc/bashrc.d/*.sh; do [ -r "$i" ] for i in /etc/bashrc.d/*.sh; do [ -r "$i" ] && . $i; donefor i in /etc/bashrc.d/*.sh; do [ -r "$i" ] &
VB Buster
🈯️ VMware Stretch
VMware Buster
~Stretch images done, uploaded to testing folder, however, updates to master branch (with the current issues) during 1st run due to RC lower version.~ Redone.
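The duplicated line above shows the same bashrc.d loop ending up in a file more than once. The linked commit is the actual fix; as an illustration only, one common idempotent pattern is to append a line only if an identical line is not already present. append_once is a hypothetical helper, not DietPi's G_CONFIG_INJECT:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not DietPi's G_CONFIG_INJECT): append a line to a file
# only if an identical line is not already there, so re-runs add no duplicates.
append_once() {
	local line=$1 file=$2
	# -q quiet, -x whole-line match, -F fixed string (no regex), -- ends options.
	# A missing file counts as "not found", and >> then creates it.
	grep -qxF -- "$line" "$file" 2> /dev/null || printf '%s\n' "$line" >> "$file"
}
```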
@Fourdee
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
[ OK ] Root access verified.
[ OK ] Root access verified.
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd /tmp seemed to be done successfully (already prior to the change), but the background jobs then seem to fail getting the current directory.
/tmp/DietPi-PreBoot, as this was the cwd when they were initiated. But the parent script immediately changes to /tmp and removes the cwd of the background jobs.
Does it solve the issue to actively do cd /tmp before initiating background jobs?
@MichaIng
Does it solve to actively do cd /tmp before initiating background jobs?
Nope, tried cd /tmp; command &, still occurs.
Best would be to let the trap wait for those to finish: https://stackoverflow.com/a/356154
wait.
exit call at end of preboot? Should not be required; also, we do not pass any exit code.
So, test schedule:
exit command from end of preboot
wait at end of script (before exit).
cd /tmp before initiating background jobs, although this means we need to keep that in mind in case we add tmp file creation later.
Lowering priority level, as both Stretch VM images have been updated with the fix for the getcwd issues:
@Kreeblah
Please re-download the required VM image from the link above. It resolves the issue you experienced.
Okay, went on with testing:
set_cpu and set_led being bg jobs again, and added G_DEBUG=1 to the top of preboot.
getcwd errors, but not consistently. One time they showed up for only one of set_cpu/led, another time for both.
shell-init (init of bg job) and on chdir (when G_INIT wants to cd into the subscript's working dir).
As mentioned above, after the bg jobs are initiated, preboot does not wait for them to finish with the EXIT trap: cd /tmp and rm -R /tmp/DietPi-PreBoot. Most probably the bg job init (+ cd /tmp/$G_PROGRAM_NAME) and the EXIT trap steps are too close together, sometimes breaking each other.
🈺 Removing exit from the end of the script does not make the EXIT trap wait for bg jobs. Also, the error still shows up at times.
🈯️ The following works:
aPID=()
...
/DietPi/dietpi/func/dietpi-set_cpu & aPID+=( $! )
/DietPi/dietpi/func/dietpi-led_control & aPID+=( $! )
...
for i in "${aPID[@]}"
do
	wait "$i"
done
EXIT trap is then delayed until all bg jobs finished. Did ~10 reboots, no getcwd error shown any more.
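A self-contained sketch of the pattern above, assuming bash: background jobs are started from inside a working directory, their PIDs are collected in aPID, and the EXIT trap waits for all of them before leaving and removing the directory. run_with_bg_jobs and cleanup are hypothetical names, and the two sleeping subshells stand in for set_cpu and led_control:

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not DietPi's preboot): the EXIT trap waits for all
# recorded background jobs before removing the shared working directory.

run_with_bg_jobs() {
	local log=$1
	(
		workdir=$(mktemp -d /tmp/bgdemo.XXXXXX) || exit 1
		cd "$workdir" || exit 1
		aPID=()

		cleanup() {
			local pid
			# Block until every recorded background job has finished ...
			for pid in "${aPID[@]}"; do
				wait "$pid"
			done
			# ... and only then leave and remove the working directory.
			cd /tmp || cd /
			rm -rf -- "$workdir"
		}
		trap cleanup EXIT

		# Stand-ins for set_cpu / led_control, started with $workdir as cwd.
		( sleep 0.2; echo "job1 done" >> "$log" ) & aPID+=( "$!" )
		( sleep 0.3; echo "job2 done" >> "$log" ) & aPID+=( "$!" )
		exit 0
	)
}
```

Because the trap waits, run_with_bg_jobs only returns after both jobs have written to the log. A bare wait with no arguments would also wait for all background children; the explicit PID list just makes clear which jobs the trap depends on.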
The question is if the benefit of using bg jobs is really big enough compared to all this PID handling effort 🤔.
Another idea to reduce boot time is to check for actually existing config files before calling the set_cpu/led functions (and by this skipping all globals/INIT tasks). Ideally, it should then also be possible to reset settings to system defaults and remove the config files. E.g. on headless systems, you don't care about the LEDs, and on VMs, no CPU handling is available anyway.
@MichaIng
Excellent debugging + fix 👍
Maybe we could try G_THREAD_START on these and G_THREAD_WAIT before exit; I'll run some tests.
@MichaIng
🈯️ G_THREAD_*
@Fourdee Ah lol yep, totally forgot that we already have the function set for this 👍.
Yep, it should totally work with this. However, thread-internal output is missing then, and the question remains whether there is really a boot time benefit with this. Needs to be tested on non-VM, I guess, where set_cpu/led really apply changes.
Btw, to get more boot output, remove quiet from the /etc/default/grub boot line + update-grub 😉. Just to ensure that the rm -R /tmp/DietPi-PreBoot is really done after set_cpu/led have finished.
Testing:
root@DietPi:~# cat /var/tmp/dietpi/logs/dietpi-preboot.log
[ INFO ] DietPi-PreBoot | G_THREAD_START_0 | /DietPi/dietpi/func/dietpi-set_cpu
[ INFO ] DietPi-PreBoot | G_THREAD_START_1 | /DietPi/dietpi/func/dietpi-led_control 1
[ INFO ] DietPi-PreBoot | G_THREAD_WAIT_0 | /DietPi/dietpi/func/dietpi-set_cpu
[ INFO ] DietPi-PreBoot | G_THREAD_WAIT_1 | /DietPi/dietpi/func/dietpi-led_control 1
[ OK ] DietPi-PreBoot | G_THREAD: All threads finished
Marking as completed. Issue is now resolved by waiting for background threads to finish, before script exit.