ExtremeFiretop / MerlinAutoUpdate-Router

Merlin(A)uto(U)pdate is a Merlin router script which allows you to remotely identify a stable firmware update for an ASUS Merlin router, and automatically download and update via an unattended method directly from the router.
https://www.snbforums.com/threads/merlinau-v1-2-7-the-ultimate-firmware-auto-updater-amtm-addon.91326/
GNU General Public License v3.0
15 stars, 1 fork

Minor change to the "Help" output #271

Closed: Martinski4GitHub closed this 2 months ago

Martinski4GitHub commented 2 months ago

Added the ".sh" file extension to the listed script commands in the "Help" output screen to make it clear that the extension is definitely needed as part of the script filename.

BEFORE: [screenshot: MerlinAU_HelpCmds_BEFORE]

AFTER: [screenshot: MerlinAU_HelpCmds_AFTER]
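
As a minimal sketch of the idea (not MerlinAU's actual code), help lines can be printed with the full script filename, extension included, so that users copy a command that actually works. The `SCRIPT_FILE_NAME` variable, the `print_help_cmds` function, and the command names in the usage line are hypothetical placeholders, not MerlinAU's real command list.

```sh
#!/bin/sh
# Hypothetical sketch: print help lines that always include the ".sh" extension.
# SCRIPT_FILE_NAME and the command names below are placeholders.
SCRIPT_FILE_NAME="MerlinAU.sh"

print_help_cmds() {
    # One line per command, each prefixed with the full script filename.
    for cmd in "$@"; do
        printf '  %s %s\n' "$SCRIPT_FILE_NAME" "$cmd"
    done
}

# Usage with placeholder command names:
print_help_cmds example_cmd_one example_cmd_two
```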

ExtremeFiretop commented 2 months ago

Welcome back Martinski! I see you noticed that development has resumed :P It's goooddd to be back. Hehe, nice little pause there for a bit.

Nice catch btw, I like it!

Martinski4GitHub commented 2 months ago

It was nice to have a little "pause" for a bit, although I was busy for a few days working with Viktor on BackupMon and then addressing the "missing vertical scrollbar" issue found in some add-ons' webGUIs when loaded on the latest 386.14 F/W release.

It's a very minor detail but there's always a novice user who may try the commands without the ".sh" file extension and then complain that "it doesn't work." :>)

ExtremeFiretop commented 2 months ago

Agreed, and I heard the rumbles of the development with you and Viktor, but only from arm's length; by the time I came in to poke around, you had already fixed it haha!

Viktor also had some other work for me to pick up for RTRMON, so the help goes all around lol! I figured we should be safe to start development again, considering it's been over 2 weekends and we had a major firmware release last weekend, so I expect people to be reading the forums and updating their add-ons at the same time.

Yeah, very true. It's a minor detail, but I've learned that some users really need the instructions to be clear (which is fair). I have an update coming up shortly in a PR with some clarifications as well.

I wanted to discuss with you a situation I noticed with the eject USB command; I think it's time to re-evaluate that command. I was willing to ignore the warning when it runs without a USB drive inserted, but it doesn't seem to be "aggressive" enough when the USB drive is still in use.

I saw some logs from a user in the forums who did his update over Tailscale. That feature worked and kept his connection alive, but because of that I believe the USB drive stayed in use, and the eject USB command in the system logs just kept failing over and over again, endlessly.

(I'll try to find the logs again, but it was something along the lines of "The USB device is busy, waiting to try again.") It would just queue itself over and over again until the user finally rebooted manually, which, according to the logs, released the USB drive and actually let the update go through and finish before the reboot happened. I guess the script caught the release of the USB drive before the router fully shut down.
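
One generic way to keep an unmount attempt from retrying forever is to cap the number of attempts and hand the failure back to the caller, which can then decide what to do next. This is only a hypothetical sketch: `retry_limited` is not a MerlinAU function, and the real eject in these logs is performed by the firmware's `ejusb` helper, whose own retry behavior this does not change.

```sh
#!/bin/sh
# Hypothetical helper: run a command up to MAX_TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt failed.
retry_limited() {
    max_tries="$1"; delay="$2"; shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $try/$max_tries failed: $*" >&2
        try=$((try + 1))
        # Only wait if another attempt is still coming.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (assumption: /tmp/mnt/MYDRIVE is the USB mount point):
# retry_limited 10 1 umount /tmp/mnt/MYDRIVE || echo "USB drive still busy; giving up."
```

Returning a failure instead of looping leaves the real decision (give up, force the unmount, or proceed anyway) to the calling code.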

Martinski4GitHub commented 2 months ago

Yeah, I've seen all the messages about that, but a lot of it went over my head because I have no real context of the situation (I've never used RTRMON and don't have a BE-class router). But it looks like you pretty much took care of it.

Fully agree. With the latest two F/W versions (386.x & 388.x) being released recently, users will be updating their routers within the next few days, or weeks, depending on their cron schedule & postponement days.

Yeah, it's a tricky situation trying to keep TailScale "alive" when eventually the USB-attached drive must be ejected/removed for the F/W flash to be completed successfully. So I agree that the scenario must be re-evaluated & likely some changes will be required to handle things more "gracefully."

I've never used TailScale myself so I am not familiar at all with how it works; all I know is that the binaries are Entware packages.

Yeah, it may help to review the user logs more closely.

ExtremeFiretop commented 2 months ago

I will need to message the user again, since the Dropbox link to his logs has expired.

But I did copy and paste part of it into our chat history, so I have this part, which I think is the part of value:

Jul 22 12:20:48 [MerlinAU.sh] 25376: The email notification was sent successfully [START_FW_UPDATE_STATUS].
Jul 22 12:20:48 [MerlinAU.sh] 25376: Looking for Entware services...
Jul 22 12:20:48 [MerlinAU.sh] 25376: Skipping S06tailscaled stop call...
Jul 22 12:20:48 [MerlinAU.sh] 25376: Skipped S06tailscaled stop call.
Jul 22 12:20:48 [MerlinAU.sh] 25376: Post-update email notification hook already exists in '/jffs/scripts/services-start' script.
Jul 22 12:20:48 [MerlinAU.sh] 25376: Flashing RT-AX86U_PRO_3004_388.8_0_nand_squashfs.pkgtb... Please wait for reboot in about 4 minutes or less.
Jul 22 12:20:49 ejusb[9275]: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:20:49 avahi-daemon[3301]: Files changed, reloading.
Jul 22 12:20:49 avahi-daemon[3301]: Loading service file /tmp/avahi/services/mt-daap.service.
Jul 22 12:20:49 avahi-daemon[3301]: Alias name RT-AX86U_Pro: avahi_server_add_cname failure: The requested operation is invalid because redundant
Jul 22 12:20:49 iTunes: daemon is stopped
Jul 22 12:20:49 FTP_Server: daemon is stopped
Jul 22 12:20:49 Samba_Server: smb daemon is stopped
Jul 22 12:20:50 avahi-daemon[3301]: Got SIGTERM, quitting.
Jul 22 12:20:50 avahi-daemon[3301]: Leaving mDNS multicast group on interface br2.IPv4 with address 192.168.102.1.
Jul 22 12:20:50 avahi-daemon[3301]: Leaving mDNS multicast group on interface br1.IPv4 with address 192.168.101.1.
Jul 22 12:20:50 avahi-daemon[3301]: Leaving mDNS multicast group on interface br0.IPv4 with address 192.168.47.1.
Jul 22 12:20:50 avahi-daemon[3301]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Jul 22 12:20:50 avahi-daemon[3301]: avahi-daemon 0.8 exiting.
Jul 22 12:20:51 rc_service: skip the event: restart_httpd.
Jul 22 12:20:54 Timemachine: daemon is stopped
Jul 22 12:20:54 avahi-daemon[9364]: Found user 'nobody' (UID 65534) and group 'nobody' (GID 65534).
Jul 22 12:20:54 avahi-daemon[9364]: Successfully dropped root privileges.
Jul 22 12:20:54 avahi-daemon[9364]: avahi-daemon 0.8 starting up.
Jul 22 12:20:54 avahi-daemon[9364]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
Jul 22 12:20:54 avahi-daemon[9364]: Loading service file /tmp/avahi/services/alexa.service.
Jul 22 12:20:54 avahi-daemon[9364]: Loading new alias name RT-AX86U_Pro.
Jul 22 12:20:54 avahi-daemon[9364]: Joining mDNS multicast group on interface br2.IPv4 with address 192.168.102.1.
Jul 22 12:20:54 avahi-daemon[9364]: New relevant interface br2.IPv4 for mDNS.
Jul 22 12:20:54 avahi-daemon[9364]: Joining mDNS multicast group on interface br1.IPv4 with address 192.168.101.1.
Jul 22 12:20:54 avahi-daemon[9364]: New relevant interface br1.IPv4 for mDNS.
Jul 22 12:20:54 avahi-daemon[9364]: Joining mDNS multicast group on interface br0.IPv4 with address 192.168.47.1.
Jul 22 12:20:54 avahi-daemon[9364]: New relevant interface br0.IPv4 for mDNS.
Jul 22 12:20:54 avahi-daemon[9364]: Joining mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Jul 22 12:20:54 avahi-daemon[9364]: New relevant interface lo.IPv4 for mDNS.
Jul 22 12:20:54 avahi-daemon[9364]: Network interface enumeration completed.
Jul 22 12:20:54 avahi-daemon[9364]: Registering new address record for 192.168.102.1 on br2.IPv4.
Jul 22 12:20:54 avahi-daemon[9364]: Registering new address record for 192.168.101.1 on br1.IPv4.
Jul 22 12:20:54 avahi-daemon[9364]: Registering new address record for 192.168.47.1 on br0.IPv4.
Jul 22 12:20:54 avahi-daemon[9364]: Registering new address record for 127.0.1.1 on lo.IPv4.
Jul 22 12:20:54 avahi-daemon[9364]: Registering new address record for 127.0.0.1 on lo.IPv4.
Jul 22 12:20:54 avahi-daemon[9364]: Server startup complete. Host name is RT-AX86U_Pro.local. Local service cookie is 3097265168.
Jul 22 12:20:54 avahi-daemon[9364]: Alias name RT-AX86U_Pro: avahi_server_add_cname failure: The requested operation is invalid because redundant
Jul 22 12:20:55 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:20:55 ddns: IP address, server and hostname have not changed since the last update.
Jul 22 12:20:55 avahi-daemon[9364]: Service "RT-AX86U_Pro" (/tmp/avahi/services/alexa.service) successfully established.
Jul 22 12:21:17 rc_service: skip the event: restart_leds.
Jul 22 12:21:18 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:19 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:19 rc_service: service 9901:notify_rc restart_leds
Jul 22 12:21:19 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:20 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:21 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:22 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:23 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:24 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:25 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:26 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:27 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:28 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:29 ejusb: USB partition busy - will unmount ASAP from /tmp/mnt/AUBREYEXT4
Jul 22 12:21:29 rc_service: ejusb 9275:notify_rc restart_nasapps
Jul 22 12:21:29 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:29 kernel: usb 1-2: USB disconnect, device number 2
Jul 22 12:21:34 rc_service: skip the event: restart_leds.
Jul 22 12:21:36 rc_service: service 10168:notify_rc restart_leds
Jul 22 12:21:36 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:51 rc_service: skip the event: restart_leds.
Jul 22 12:21:53 rc_service: service 10423:notify_rc restart_leds
Jul 22 12:21:53 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:59 rc_service: skip the event: restart_nasapps.
Jul 22 12:21:59 rc_service: service 10515:notify_rc restart_leds
Jul 22 12:21:59 rc_service: waitting "restart_leds" via ...
Jul 22 12:22:08 rc_service: skip the event: restart_leds.
Jul 22 12:22:14 rc_service: skip the event: restart_leds.
Jul 22 12:22:16 rc_service: httpd 2475:notify_rc stop_upgrade
Jul 22 12:22:16 rc_service: waitting "restart_leds" via ...
Jul 22 12:22:31 rc_service: skip the event: stop_upgrade.
Jul 22 12:23:20 rc_service: httpd 2475:notify_rc start_upgrade
Jul 22 12:23:20 rc_service: waitting "restart_leds" via ...
Jul 22 12:24:20 rc_service: skip the event: start_upgrade.
Jul 22 12:24:20 rc_service: httpd 2475:notify_rc stop_upgrade
Jul 22 12:24:20 rc_service: waitting "restart_leds" via ...
Jul 22 12:24:30 rc_service: skip the event: stop_upgrade.
Jul 22 12:24:30 rc_service: httpd 2475:notify_rc start_upgrade
Jul 22 12:24:30 rc_service: waitting "restart_leds" via ...
Jul 22 12:25:30 rc_service: skip the event: start_upgrade.
Jul 22 12:25:30 rc_service: httpd 2475:notify_rc stop_upgrade
Jul 22 12:25:30 rc_service: waitting "restart_leds" via ...
Jul 22 12:25:40 rc_service: skip the event: stop_upgrade.
Jul 22 12:25:40 rc_service: httpd 2475:notify_rc start_upgrade
Jul 22 12:25:40 rc_service: waitting "restart_leds" via ...
Jul 22 12:26:20 rc_service: service 11714:notify_rc reboot
Jul 22 12:26:20 rc_service: waitting "restart_leds" via ...
Jul 22 12:26:35 rc_service: skip the event: reboot.
Jul 22 12:26:40 rc_service: skip the event: start_upgrade.
Jul 22 12:26:40 rc_service: httpd 2475:notify_rc stop_upgrade
Jul 22 12:26:40 rc_service: waitting "restart_leds" via ...
Jul 22 12:26:50 rc_service: skip the event: stop_upgrade.
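
The first lines of this log show the script walking the Entware services and skipping `S06tailscaled`. Below is a rough sketch of that kind of loop, assuming Entware's usual `/opt/etc/init.d/S*` init scripts that accept a `stop` argument. The function name and skip-list argument are hypothetical, and this is not MerlinAU's actual implementation.

```sh
#!/bin/sh
# Hypothetical sketch: stop Entware services in reverse start order, skipping any
# whose filename appears in the space-separated skip list.
stop_entware_services() {
    init_dir="$1"    # normally /opt/etc/init.d on an Entware install
    skip_list="$2"   # e.g. "S06tailscaled"
    # ls -r lists the S* scripts in reverse order, so higher numbers stop first.
    for svc in $(ls -r "$init_dir"/S* 2>/dev/null); do
        name="$(basename "$svc")"
        case " $skip_list " in
            *" $name "*) echo "Skipping $name stop call..."; continue ;;
        esac
        if [ -x "$svc" ]; then
            "$svc" stop
        fi
    done
}

# Usage: stop_entware_services /opt/etc/init.d "S06tailscaled"
```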

Martinski4GitHub commented 2 months ago

Can you also ask the user to send you the MerlinAU log file for that specific run? I'd like to see both logs and try to collate the data/events related to the F/W flash.

ExtremeFiretop commented 2 months ago

I actually already have those, but only as screenshots from the user. He used our "view log file" feature!

One momento

ExtremeFiretop commented 2 months ago

https://imgur.com/a/gxb04oc#nqR7rHe

Both screenshots are at that location.

Martinski4GitHub commented 2 months ago

OK, got it, thanks. Next time, can you request the actual log files? I like to review & work with the actual text files to search for specific "words" or "events." Screenshots of logs are not conducive to that kind of data collation.
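
This is where plain-text logs pay off. For example, assuming the syslog was saved as a text file (the filename in the usage line is hypothetical), a few standard commands can count the `ejusb` "Device or resource busy" failures and pull out the MerlinAU, `ejusb`, and upgrade-related lines in their original order for side-by-side review. This is a generic sketch, not a MerlinAU feature.

```sh
#!/bin/sh
# Sketch: summarize eject/flash-related events from a saved syslog text file.
summarize_log() {
    log_file="$1"
    # Count failed unmount attempts reported by ejusb.
    busy_count=$(grep -c 'ejusb.*Device or resource busy' "$log_file")
    echo "ejusb busy failures: $busy_count"
    # Show MerlinAU lines plus ejusb and start/stop_upgrade events, in file order.
    grep -E '\[MerlinAU\.sh\]|ejusb|_upgrade' "$log_file"
}

# Usage: summarize_log ./syslog.txt
```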

ExtremeFiretop commented 2 months ago

Yeah, that's no problem. Unfortunately, it's all I have to work with for now, though.

His reported issue was that he would run MerlinAU over Tailscale, it would reach the flashing steps, and then "terminate" without actually updating. He would then launch everything all over again and retry, etc. He would never disconnect, but it wouldn't update.

Here are the screenshots of his "terminated" screen: https://imgur.com/a/g0lJZN2#4GE19OD

I think it's either the eject USB command not being aggressive enough when we need it to be, or we might need to consider stopping Tailscale once the flash has been started with nohup. Up until the nohup step, however, if he disconnects, the script terminates "early", which is why we implemented the Tailscale skip in the first place (so users could reach the nohup step).
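
For context on the nohup step: a command started with `nohup` ignores the hangup signal (SIGHUP) that is typically sent when the SSH session drops, so it keeps running after the terminal goes away. Below is a minimal, generic illustration. The `start_detached` function and the log path in the usage line are hypothetical, not MerlinAU's code.

```sh
#!/bin/sh
# Start a shell command in the background, immune to SIGHUP, with its output going
# to a file instead of the (possibly soon-to-disappear) terminal.
start_detached() {
    out_file="$1"; shift
    # nohup runs an external command, so shell code is wrapped in "sh -c".
    nohup sh -c "$1" >"$out_file" 2>&1 &
    echo "$!"    # print the PID so the caller can monitor it
}

# Usage (hypothetical log path and placeholder command):
# start_detached /tmp/flash_step.log 'sleep 5; echo "flash step finished"'
```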

Martinski4GitHub commented 2 months ago

@ExtremeFiretop,

My "Higher Power" just summoned me :>)

Her laptop PC has been acting "funny" lately, and now she needs me to look at something... At some point, I'll likely need to reformat the main drive and re-install the OS.

Anyway, Hold my Beer!! :>)

ExtremeFiretop commented 2 months ago

Hahaha, enjoy the troubleshooting!

Windows amiright? As a Workstation Engineer for a good few years I know all the fun of the famous "acting funny" report 😁

ExtremeFiretop commented 2 months ago

This beer be cold, don't be gone too long or you won't find it when you come back! 😜

Martinski4GitHub commented 2 months ago

Yep, it's a ~4-year-old Windows 10 HP laptop. It has been working well & a solid performer all these years; it's only during the last week that the "funny" issues started. I ran a disk check and found several bad sectors. I'll just take out the disk drive, do a full disk check, reformat & re-install the OS. Might as well do that to have a "clean" drive, start with a fresh OS, and re-install all the latest device drivers.

The good thing is that all her important personal documents/files are stored on one of the NAS drives, and the PCs are backed up once a week, so there's really nothing for me to copy over. In the meantime, she borrowed one of my spare laptops.

Martinski4GitHub commented 2 months ago

OK, just to be crystal clear about the scenario. The user has a "LOCAL" ASUS router and a "REMOTE" ASUS router, with both routers (with their respective LAN subnets) being part of the TailScale "network."

For the F/W flash, the user has a local PC (connected to LOCAL router) and using TailScale connects to the REMOTE router via SSH terminal session. The user starts MerlinAU on the REMOTE router to interactively perform the F/W flash onto this REMOTE router. The initial problem was that before the actual flash was started, the script was, by design, stopping all Entware services (including TailScale) which effectively was terminating the connection to the REMOTE router, and this resulted in the script never sending the actual curl command to flash the F/W.

Given the above, the current solution is not to stop the TailScale service on the REMOTE router.

Do I understand the scenario correctly? I just want to make sure I'm not missing any important factor here.

Yes, I agree. To successfully eject the USB-attached drive and perform the F/W flash, all Entware services must be terminated. I don't think there's a way around that, but we can delay stopping TailScale until the very last second, right after the curl command is sent. However, there might be a "race condition" between the F/W flash starting and the TailScale service being stopped immediately after, so this needs to be tested well to see if it works IRL.

ExtremeFiretop commented 2 months ago

Found a weird "bug" in the _Set_FW_UpdateZIPDirectoryPath function:

[screenshot]

It wasn't working for me at all:

[screenshot]

Where is the "The directory path for the F/W ZIP file was updated successfully." message?

I entered a valid destination, different from the one originally set, and it just threw me back to the main menu over and over again.

I decided to add a 20-second pause between these 2 lines:

      fi
   done

   if [ "$newZIP_BaseDirPath" != "$FW_ZIP_BASE_DIR" ] && [ -d "$newZIP_BaseDirPath" ]
   then

At about line 1258.

On the next run, it correctly showed the message, but then crashed with this error:

[screenshot]
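
The condition quoted above only proceeds when the new path differs from the current `FW_ZIP_BASE_DIR` and already exists as a directory; if that `if` has no `else` branch (the full function isn't shown here), any other input would fall through without a message. A small stand-alone sketch that reports which check failed can make this kind of "it just returned to the menu" behavior easier to diagnose. The `check_new_zip_dir` function and its messages are hypothetical.

```sh
#!/bin/sh
# Hypothetical diagnostic version of the directory-path check quoted above.
check_new_zip_dir() {
    new_dir="$1"; current_dir="$2"
    if [ "$new_dir" = "$current_dir" ]; then
        echo "UNCHANGED: '$new_dir' is already the current path."
        return 1
    fi
    if [ ! -d "$new_dir" ]; then
        echo "MISSING: '$new_dir' is not an existing directory."
        return 1
    fi
    echo "OK: path would be updated to '$new_dir'."
    return 0
}

# Usage: check_new_zip_dir "$newZIP_BaseDirPath" "$FW_ZIP_BASE_DIR"
```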

ExtremeFiretop commented 2 months ago

This is all correct.

I saw 3 possible solutions to this problem:

  1. Delay stopping Tailscale until the nohup step has started; we could potentially add a timer in the nohup step so it doesn't race the stopping of Tailscale.
  2. Find a more aggressive method to unmount the USB drive before the flash and force whatever may be running from Entware at the time to get booted; maybe return to your function to do the unmount.
  3. Just forget about unmounting the USB drive when using this feature: skip the USB eject command and let the flash start with the USB drive connected and the service running. It would be a higher-"risk" move, but it could serve as a last resort if the other 2 don't work.
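
Option 1 could look roughly like this: launch the flash step detached with `nohup`, give it a short head start, and only then stop Tailscale. Everything here is an assumption for illustration: the `S06tailscaled` path comes from the log above, while `flash_then_stop`, its arguments, and the delay value are hypothetical. A fixed delay only narrows the race window rather than eliminating it.

```sh
#!/bin/sh
# Hypothetical sketch of "option 1": start the flash step first, then stop Tailscale.
flash_then_stop() {
    flash_cmd="$1"      # shell command string for the flash step (hypothetical)
    stop_cmd="$2"       # e.g. "/opt/etc/init.d/S06tailscaled stop"
    head_start="$3"     # seconds to wait before stopping the service
    log_file="$4"

    # Detached from the SSH session so it survives the connection dropping.
    nohup sh -c "$flash_cmd" >"$log_file" 2>&1 &
    echo "Flash step started (PID $!); stopping Tailscale in ${head_start}s..."

    sleep "$head_start"
    sh -c "$stop_cmd"
}
```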

ExtremeFiretop commented 2 months ago

How is the Set_FW_UpdateZIP_DirectoryPath function updating the $FW_ZIP_BASE_DIR value in the same run? I only see that value being set once, at the very top of the script.

ExtremeFiretop commented 2 months ago

I answered my own question: it gets updated in the Update_Custom_Settings function when we update the file. Maybe this error was unrelated to MerlinAU and just happened to stop MerlinAU in its tracks when it came up.
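
This is standard shell behavior: unless a variable is declared `local` (a common extension, not POSIX), an assignment inside a function changes the script-wide variable, so a later check in the same run sees the new value immediately. Below is a toy illustration. The function body, the `ZIP_DIR_PATH` key, and the paths are hypothetical; only the names `Update_Custom_Settings` and `FW_ZIP_BASE_DIR` are borrowed from the discussion.

```sh
#!/bin/sh
FW_ZIP_BASE_DIR="/home/root"   # set once at the top of the script (placeholder path)

# Hypothetical stand-in for Update_Custom_Settings: besides persisting a setting,
# it also refreshes the in-memory global variable.
Update_Custom_Settings() {
    key="$1"; value="$2"
    if [ "$key" = "ZIP_DIR_PATH" ]; then   # hypothetical key name
        FW_ZIP_BASE_DIR="$value"           # no "local": updates the global variable
    fi
}

Update_Custom_Settings ZIP_DIR_PATH /tmp/mnt/USB1
echo "$FW_ZIP_BASE_DIR"
```

One caveat: if such a function were called inside `$( )`, it would run in a subshell and the change would be lost when the subshell exits.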

Martinski4GitHub commented 2 months ago

I have a solution that should address the issue of stopping TailScale and ejecting the USB-attached drive before sending the actual curl command to flash the F/W image.

While I was busy rebuilding my wife's laptop main drive & reinstalling all device drivers & apps, I had some time to think about this problem and run through the various scenarios in my head, and a simpler solution occurred to me: allow the script to continue executing in the background even after the remote connection has been terminated. This addresses the actual root cause. See PR #273.
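
PR #273 isn't reproduced here, but a common general technique for letting a shell script keep running after its SSH session disconnects is to ignore SIGHUP, which is typically delivered to the session's processes when the connection drops, and to redirect further output away from the dead terminal. A minimal sketch of that idea, not necessarily how PR #273 implements it; the `detach_from_terminal` function and the log path are hypothetical.

```sh
#!/bin/sh
# Sketch: let the rest of a script survive the SSH session disconnecting.
detach_from_terminal() {
    log_file="$1"
    # Ignore SIGHUP, which is typically delivered when the terminal/SSH connection goes away.
    trap '' HUP
    # After the terminal is gone, writes to it fail, so send all further
    # stdout/stderr of this shell to a log file instead.
    exec >>"$log_file" 2>&1
}

# Usage (hypothetical log path), placed just before the steps that may outlive the session:
# detach_from_terminal /tmp/MerlinAU_flash.log
# ... stop Entware services, eject USB, start the flash ...
```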

ExtremeFiretop commented 2 months ago

I hope the wife's laptop was a victory as well!

ExtremeFiretop commented 2 months ago

Synced with Gnuton in commit: https://github.com/ExtremeFiretop/MerlinAutoUpdate-Router/pull/186/commits/7afe01fba3920742b6eb6ee8a7f95cbc3bcf2930