Small Improvements to Changelog Verification

ExtremeFiretop / MerlinAutoUpdate-Router

Merlin(A)uto(U)pdate is a Merlin router script which allows you to remotely identify a stable firmware update for an ASUS Merlin router, and automatically download and update via an unattended method directly from the router.

https://www.snbforums.com/threads/merlinau-v1-2-7-the-ultimate-firmware-auto-updater-amtm-addon.91326/

GNU General Public License v3.0

15 stars 1 forks source link

Small Improvements to Changelog Verification #301

Closed ExtremeFiretop closed 2 weeks ago

ExtremeFiretop commented 3 weeks ago

Description of the Bug/Issue:

When doing an offline firmware update this morning on my ASUS GT-BE98 Pro to the newest alpha build version 3006.102.2_alpha1 using MerlinAU, I noticed that it failed to find the correct changelog file even though the changelog file is present and valid in the directories as found below:

The source of the issue is this line in the changelog verification function:

release_version="$(Get_Custom_Setting "FW_New_Update_Notification_Vers")"

This line is what MerlinAU uses to determine which changelog to look for. As seen by the above screenshot it's looking for the NG file instead of the 3006 file on my 3006 router.

Possible Resolutions:

1. Different logic for offline changelog selection than production. (Based on installed firmware version instead of update firmware version)

My thoughts:
I dislike this idea., because some routers may some day move from the NG changelog to the 3006 changelog and the current production logic allows for this to work.

2. I could simply update the entry in the file, or disable the changelog verification.

My Thoughts:
I dislike this idea, I shouldn't ever need to update that settings file manually outside of testing, if I'm a new user, that entry will be TBD as seen above, now I highly doubt a new user would ever use the offline function as their first move, but we should still correctly account for this situation, as well as I shouldn't need to disable changelog verification to allow the flash to happen. Especially if a changelog is present and valid.

3. Update the offline logic to update the production file entries if it's used to do a firmware update. This would allow the script to find the correct changelog file without impacting production logic.

My thoughts:
I like this idea the best, so I've opted to update the file entries if a user triggers an offline update.

With the noticed solution implemented, re-running gave the expected results as seen below:

ExtremeFiretop commented 3 weeks ago

I decided to re-implement the post-flash status email for the offline feature. We removed it as apart of the fix for PR: https://github.com/ExtremeFiretop/MerlinAutoUpdate-Router/pull/283

At the time, I considered exporting the variable, but it opted to just remove the post-flash email since it was easier:

I'll have to adjust the logic, in testing I realize that the email won't have access to that variable sadly.

I'll have to export it? Or just not send the email. What do you think? @Martinski4GitHub

But considering now I'm exporting it anyways for the changelog above, I may as well re-implement the post-flash email.

Martinski4GitHub commented 2 weeks ago

...

The source of the issue is this line in the changelog verification function:

release_version="$(Get_Custom_Setting "FW_New_Update_Notification_Vers")"

This line is what MerlinAU uses to determine which changelog to look for. As seen by the above screenshot it's looking for the NG file instead of the 3006 file on my 3006 router.

...

3. Update the offline logic to update the production file entries if it's used to do a firmware update. This would allow the script to find the correct changelog file without impacting production logic.
* **My thoughts:**

* I like this idea the best, so I've opted to update the file entries if a user triggers an offline update.
Yeah, when doing an offline update with Alpha/Beta versions, there is no notification from the router itself (i.e. executing the webs_update.sh script) so the corresponding NVRAM vars are not yet set and, therefore, the config entries may be unset as well. Setting those config entries during the offline update looks like a good idea to work around the problem without affecting F/W updates for production releases (as you said).

Martinski4GitHub commented 2 weeks ago

I decided to re-implement the post-flash status email for the offline feature. We removed it as apart of the fix for PR: #283

At the time, I considered exporting the variable, but it opted to just remove the post-email flash since it was easier:

I'll have to adjust the logic, in testing I realize that the email won't have access to that variable sadly. I'll have to export it? Or just not send the email. What do you think? @Martinski4GitHub

But considering now I'm exporting it anyways for the changelog above, I may as well re-implement the post-flash email.

OK, understood. The post-reboot email notification serves as a confirmation when the F/W update has been completed successfully.

Martinski4GitHub commented 2 weeks ago

@ExtremeFiretop,

BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?

killall: afpd: no process killed
killall: cnid_metad: no process killed

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

ExtremeFiretop commented 2 weeks ago

@ExtremeFiretop,

BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?
killall: afpd: no process killed
killall: cnid_metad: no process killed
I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.

Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...

Not exactly sure if it's a bad thing, but I've noticed it as well.

It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

Martinski4GitHub commented 2 weeks ago

@ExtremeFiretop, BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?
killall: afpd: no process killed
killall: cnid_metad: no process killed
I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...
No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.

Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...

Not exactly sure if it's a bad thing, but I've noticed it as well.

It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:
/sbin/ejusb -1 0 -u 1 >/dev/null
The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:
/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

ExtremeFiretop commented 2 weeks ago

@ExtremeFiretop, BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?
killall: afpd: no process killed
killall: cnid_metad: no process killed
I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...
No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions. Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc... Not exactly sure if it's a bad thing, but I've noticed it as well. It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...
I'd suggest to try a simpler solution first:
/sbin/ejusb -1 0 -u 1 >/dev/null
The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:
/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1
As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.

It's just much more chatty about what it's doing now. (Or not doing..)

I'm fine with that suggestion of suppression!

Martinski4GitHub commented 2 weeks ago

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions. Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc... Not exactly sure if it's a bad thing, but I've noticed it as well. It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:
/sbin/ejusb -1 0 -u 1 >/dev/null
The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:
/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1
As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.
As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.

It's just much more chatty about what it's doing now. (Or not doing..)

I'm fine with that suggestion of suppression!

I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

ExtremeFiretop commented 2 weeks ago

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions. Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc... Not exactly sure if it's a bad thing, but I've noticed it as well. It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:
/sbin/ejusb -1 0 -u 1 >/dev/null
The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:
/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1
As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.
As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router. It's just much more chatty about what it's doing now. (Or not doing..) I'm fine with that suggestion of suppression!
I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

This is the most likely explanation. It makes me think of the weird behavior we saw with a Gnuton test router once...

It's been so long now that I don't remember the specifics, I'd need to go back through the Gnuton PR to see if I can spot the error again, but it felt like something was holding onto the USB still when we were trying to eject it.

It was then around the first time I suggested maybe going back to an inhouse solution. I'll see if I can dig up more info again

Martinski4GitHub commented 2 weeks ago

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions. Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc... Not exactly sure if it's a bad thing, but I've noticed it as well. It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:
/sbin/ejusb -1 0 -u 1 >/dev/null
The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:
/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1
As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.
As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router. It's just much more chatty about what it's doing now. (Or not doing..) I'm fine with that suggestion of suppression!
I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...
This is the most likely explanation. It makes me think of the weird behavior we saw with a Gnuton test router once...

It's been so long now that I don't remember the specifics, I'd need to go back through the Gnuton PR to see if I can spot the error again, but it felt like something was holding onto the USB still when we were trying to eject it.

It was then around the first time I suggested maybe going back to an inhouse solution. I'll see if I can dig up more info again

Yeah, it would be good to have more information as to when the msgs are triggered. It's not really an issue at all (so far) since the command has been working for our purposes, but I'm trying to understand what's happening to avoid it (if possible).

Martinski4GitHub commented 2 weeks ago

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router. It's just much more chatty about what it's doing now. (Or not doing..) I'm fine with that suggestion of suppression!

I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

This is the most likely explanation. It makes me think of the weird behavior we saw with a Gnuton test router once...

It's been so long now that I don't remember the specifics, I'd need to go back through the Gnuton PR to see if I can spot the error again, but it felt like something was holding onto the USB still when we were trying to eject it.

Hey, bud. Apologies for the "radio silence." During my first week after coming back from vacation, I've been extremely busy with work, family & house stuff. In any case, I was thinking about the messages coming from the "/sbin/ejusb" command that were possibly triggered by a process accessing the USB-attached drive filesystem right at the moment when our script attempts to eject the drive, and one solution that could minimize the chances of such occurrence would be to remove all existing CRON jobs from 3rd-party add-ons immediately after stopping all Entware services, but before ejecting the USB drive. I've already written a function to do that and have been testing it this evening to make sure it works as intended. The idea is to stop processes from starting to execute around the time we're trying to eject the drive and initiate the actual F/W update process.

What do you think of such an approach?

ExtremeFiretop commented 2 weeks ago

Hey, bud. Apologies for the "radio silence." During my first week after coming back from vacation, I've been extremely busy with work, family & house stuff.

No worries, I've been busy as well :)

In any case, I was thinking about the messages coming from the "/sbin/ejusb" command that were possibly triggered by a process accessing the USB-attached drive filesystem right at the moment when our script attempts to eject the drive, and one solution that could minimize the chances of such occurrence would be to remove all existing CRON jobs from 3rd-party add-ons immediately after stopping all Entware services, but before ejecting the USB drive. I've already written a function to do that and have been testing it this evening to make sure it works as intended. The idea is to stop processes from starting to execute around the time we're trying to eject the drive and initiate the actual F/W update process.

What do you think of such an approach?

Is your idea to export the cron jobs, remove them, and reimport them afterwards? Or just confirm no cron jobs are executing around the time? I'm not sure how I feel about that approach, I've need to see what you mean to better judge the solution.

BTW: I know you had some contact with Jack in the past, any ideas when the dev versions will become main versions? Becoming tired of seeing this every weekend hahaha:

Martinski4GitHub commented 2 weeks ago

Hey, bud. Apologies for the "radio silence." During my first week after coming back from vacation, I've been extremely busy with work, family & house stuff.

No worries, I've been busy as well :)

In any case, I was thinking about the messages coming from the "/sbin/ejusb" command that were possibly triggered by a process accessing the USB-attached drive filesystem right at the moment when our script attempts to eject the drive, and one solution that could minimize the chances of such occurrence would be to remove all existing CRON jobs from 3rd-party add-ons immediately after stopping all Entware services, but before ejecting the USB drive. I've already written a function to do that and have been testing it this evening to make sure it works as intended. The idea is to stop processes from starting to execute around the time we're trying to eject the drive and initiate the actual F/W update process. What do you think of such an approach?

Is your idea to export the cron jobs, remove them, and reimport them afterwards? Or just confirm no cron jobs are executing around the time? I'm not sure how I feel about that approach, I've need to see what you mean to better judge the solution.

The idea is to remove all cron jobs created by 3rd-party addons from crontab a few seconds before the USB drive is ejected & the actual F/W update process begins. At that point, we want to eliminate the possibility of a cron job starting to execute and possibly accessing th USB drive filesystem when were about to eject the drive and begin the F/W update. The cron jobs will get restore after the router reboots so there's no need to do anything else (no "re-import").

I will submit a PR later in the evening so you can see what I mean and can test how it works. Right now, my wife & I are about to go out to do some shopping (groceries for the week and other stuff) and then will stop by my parents' home. I'll be offline until later today in the evening.

BTW: I know you had some contact with Jack in the past, any ideas when the dev versions will become main versions? Becoming tired of seeing this every weekend hahaha:

Jack Yaz has gone radio silent for almost a month now. Maybe he's on vacation (it's common in Europe to get a month vacation around this time). I'll ping him in a few days to check about the PRs.