NebraLtd / hm-pktfwd

Helium Miner Packet Forwarder
https://nebra.io/hnt
MIT License

2022.08.17.6 (083026e) breaks radio #108

Closed ifeign closed 1 year ago

ifeign commented 2 years ago

Updating to this release breaks the radio. Rolling back to 2243cf1 immediately fixed this

arduino43 commented 2 years ago

Same issue

VeniceInventors commented 1 year ago

Same here. How do you roll back? I can't find images of older releases

uros76 commented 1 year ago

Did you try here? https://hub.balena.io/organizations/nebraltd/fleets/helium-rak/releases Or do you not see the old releases on your fleet summary?

arduino43 commented 1 year ago

Yes. This is still an issue..

VeniceInventors commented 1 year ago

I can see the list of older releases at https://hub.balena.io/organizations/nebraltd/fleets/helium-rak/releases but there is no link on the list items to select or download any of them, just a "copy to clipboard" link next to the commit column which copies e.g. "54727f46da89d4df26cc7d308a9b4873" for the 0.0.1+rev36 release. But then I don't know what to do with that string.

Turb0n00by commented 1 year ago

Just a random thought as I am catching up on these.. What happens if..

You select your device by the check box

Then select Actions > Pin to release

From there I would be able to select previous image versions..

arduino43 commented 1 year ago

The original issue is about the most recent version. If someone doesn't know how to revert to an earlier version, that's a different issue.

Crash0v3r1de commented 1 year ago

@Turb0n00by Yes, selecting the device and using Actions > Pin to release is how you set a specific image version to be provisioned on your device. You can also do it for your whole fleet: under the version group box, click the drop-down and pick the version you want instead of the current release, and it'll push the changes.

VeniceInventors commented 1 year ago

It doesn't give me that option here, seemingly because I only recently forked the image and never had the previous releases.

Crash0v3r1de commented 1 year ago

Your specific situation will be different.

You'll have to fork the last version through GitHub on your computer: https://github.com/NebraLtd/helium-rak/commit/3dc3d7cee07ad5b9a45768fc9e1a072335b2e0cc

Then push it to your fleet with the balena CLI, or you can also do it with git. Either way, you'll have to fork and then push to your fleet.
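
As a rough illustration of the fork-and-push route, here is a minimal Python sketch that only assembles the commands instead of running them. It assumes git and the balena CLI are installed and logged in. The fleet name and working directory are hypothetical, and a plain clone of the upstream repo stands in for a personal fork.

```python
import shlex

# Commit linked in this thread as the last working helium-rak revision.
GOOD_COMMIT = "3dc3d7cee07ad5b9a45768fc9e1a072335b2e0cc"
REPO_URL = "https://github.com/NebraLtd/helium-rak.git"


def rollback_commands(fleet, workdir="helium-rak", commit=GOOD_COMMIT):
    """Return the commands (as argv lists) for clone, checkout, and push."""
    return [
        ["git", "clone", REPO_URL, workdir],
        # Check out the known-good commit inside the cloned directory.
        ["git", "-C", workdir, "checkout", commit],
        # Build the checked-out source and create a release on the fleet.
        ["balena", "push", fleet, "--source", workdir],
    ]


if __name__ == "__main__":
    # "my-org/my-rak-fleet" is a hypothetical fleet name; use your own.
    for argv in rollback_commands("my-org/my-rak-fleet"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before pasting them into a terminal one at a time.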

VeniceInventors commented 1 year ago

Thanks Crash0v3r1de for the pointers! Being new to git I couldn't figure out how to fork the 3dc3d7c commit only, but by using my modified fork (where I reverted the changes in docker-compose.yml), and creating a new release with balena CLI this time, I managed to get an image of the previous release, and the diagnostic page now shows "2022.08.17.6 (2243cf1)" with the radio working.

While rolling back is a helpful workaround that keeps the balena dashboard usable, it doesn't help solve the root cause of the issue. What would be the right place to go to track down the changes that caused it to break in the first place? The only significant change in docker-compose.yml is "image: nebraltd/hm-pktfwd:f46721a". Does that mean the issue could be found in nebraltd/hm-pktfwd?
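
One way to narrow this down is to list which image tags differ between two versions of docker-compose.yml. The sketch below is a minimal, assumption-laden example: it only reads `image:` lines (no YAML parser), the service name and the old tag `aaaaaaa` are made-up placeholders, and it would misread registries that include a port number.

```python
def image_tags(compose_text):
    """Map each image repository to its tag, using the `image:` lines only."""
    tags = {}
    for line in compose_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("image:"):
            continue
        ref = stripped[len("image:"):].strip().strip("'\"")
        repo, sep, tag = ref.rpartition(":")
        if not sep:  # no explicit tag, so Docker would use "latest"
            repo, tag = ref, "latest"
        tags[repo] = tag
    return tags


def changed_images(old_text, new_text):
    """Return {repository: (old_tag, new_tag)} for images whose tag differs."""
    old, new = image_tags(old_text), image_tags(new_text)
    return {
        repo: (old.get(repo), tag)
        for repo, tag in new.items()
        if old.get(repo) != tag
    }


# "aaaaaaa" is a placeholder for the previous tag; f46721a is from this thread.
OLD = "services:\n  packet-forwarder:\n    image: nebraltd/hm-pktfwd:aaaaaaa\n"
NEW = "services:\n  packet-forwarder:\n    image: nebraltd/hm-pktfwd:f46721a\n"
print(changed_images(OLD, NEW))
```

Any repository that shows up in the result points at the upstream project whose commits between the two tags are worth reviewing.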

Crash0v3r1de commented 1 year ago

Well, this would be one way to report the issue to the dev (Nebra), who would then dig into it, fix it, and push an update. However, I'm going to guess Nebra does not pay attention to this repo, since it's an automated repo for balena.

You'd have to go to the linked repos, like the pktfwd one you mentioned or possibly the miner repo on Nebra's main GitHub, open an issue ticket linked to this one, and hope they dig into it. Or you can wait until a new version is pushed to find out whether it fixes the RAK issues or is still broken.

Either way, you'll have to go directly to the main repos of Nebra's software to report the issue, as they seem to just not check this one, which makes sense.

@shawaj (figure it might be worth tagging Aaron to see if he'll tell you the same thing)

shawaj commented 1 year ago

@Crash0v3r1de thanks for the ping. Have moved the issue to the correct repo.

Ping @KevinWassermann94 @MuratUrsavas

Seems this broke when @posterzh updated the config files about a month ago in relation to #103

shawaj commented 1 year ago

Not 100% sure this is the cause, but "gps_i2c_path": "/dev/i2c-1" was added back into the config files for sx1302 and I don't know why.

My guess is this isn't a RAK-specific issue but is actually happening for all sx1302/sx1303 units.

If I remember correctly we removed the GPS stuff from the concentrator code as it isn't needed, but it's been a long time since I've looked at it TBH so might be wrong

If it's not this GPS thing my next guess would be that something in the configuration files is going beyond the limits of the sx1250 outputs or something.
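
For anyone who wants to check their own config for that key, here is a small Python sketch that searches nested JSON for `gps_i2c_path`. The sample config is a hypothetical excerpt, not the real file. It also assumes the config is plain JSON; files containing comments would need those stripped before `json.loads` can read them.

```python
import json


def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}" if path else name
            if name == key:
                yield child, value
            yield from find_key(value, key, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key(value, key, f"{path}[{index}]")


# Hypothetical excerpt; the real config layout may differ.
sample = json.loads(
    '{"SX130x_conf": {"com_type": "SPI", "gps_i2c_path": "/dev/i2c-1"}}'
)
hits = list(find_key(sample, "gps_i2c_path"))
print(hits)  # [('SX130x_conf.gps_i2c_path', '/dev/i2c-1')]
```

An empty result means the key is absent, which would match the state described above before it was added back.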

Crash0v3r1de commented 1 year ago

Thanks for migrating to the main repo. Not sure if logs and config files would help with this but I can send over whatever you'd want to look at.

shawaj commented 1 year ago

If you do have any logs from a unit with this issue, particularly from the packet forwarder container, that could be really helpful yes

shawaj commented 1 year ago

@VeniceInventors @Turb0n00by @ifeign @Crash0v3r1de @arduino43 @uros76

I managed to reproduce this issue, but the issue is actually that you are using a testnet version of the software that was never meant for production.

2022.08.17.6 is the latest production firmware.

2022.08.17.6-2 was never publicly released and had mangled config files. It only went to testnet.

2022.08.17.6-3 is the latest test version and works. But it hasn't been released to production yet either.

I've also just made a 2022.08.17.6-4 for testing which is identical to 2022.08.17.6-3 apart from the version number.
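
To make the naming above easier to check against what a diagnostics page reports, here is a minimal Python sketch that splits a version string into its base version and optional test revision. Treating a `-N` suffix as "test build" is an assumption inferred only from the versions listed in this comment, not a documented rule.

```python
import re

# Base version such as 2022.08.17.6, optionally followed by -N.
VERSION = re.compile(r"^(\d{4}\.\d{2}\.\d{2}\.\d+)(?:-(\d+))?$")


def parse_version(text):
    """Return (base_version, test_revision); test_revision is None if absent."""
    match = VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    base, revision = match.groups()
    return base, (int(revision) if revision else None)


def is_test_build(text):
    """Assumption: a -N suffix marks a test build rather than a production one."""
    return parse_version(text)[1] is not None


for version in ("2022.08.17.6", "2022.08.17.6-2", "2022.08.17.6-3"):
    print(version, "test build" if is_test_build(version) else "production")
```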

And the last update we pushed to the helium-rak fleet was on 9th September. You should only use the officially released one at https://hub.balena.io/organizations/nebraltd/fleets/helium-rak and onboard your device by clicking "Get Started", not "Fork this fleet".

Perhaps you have forked the fleet and are tracking a non-production version somewhere?

You can also see the latest official version at https://hub.balena.io/organizations/nebraltd/fleets/helium-rak/releases

Closing this issue as it's not an issue on our side.

shawaj commented 1 year ago

For your reference, the latest testnet compose files are always stored here on the master branch of the main repo - https://github.com/NebraLtd/helium-miner-software/tree/master/device-compose-files

The latest production ready compose files are on the production branch - https://github.com/NebraLtd/helium-miner-software/blob/production/device-compose-files

The helium-rak repo config file (https://github.com/NebraLtd/helium-rak) should normally mirror the testnet version shown above, but it does not always do so.

As I said, the best way is to just use the fleet as it is supposed to be used (via the "Get Started" button) rather than forking the fleet. Failing that, you should use the production device-compose-files from the above link.

shawaj commented 1 year ago

@MuratUrsavas @KevinWassermann94 - no actual issue here just FYI ^^

Crash0v3r1de commented 1 year ago

Haven't had time to jump to the problem commit to grab logs, but my hotspot did auto-update to your latest commit and the radio status looks good now.

We're pulling directly from the GitHub repo for the helium-rak balena fleet -> https://github.com/NebraLtd/helium-rak

shawaj commented 1 year ago

@Crash0v3r1de presumably you have forked the fleet and are updating it with your own workflows?

If you were on the balenaHub fleet, it would never have broken and wouldn't have just updated.

Crash0v3r1de commented 1 year ago

Yes, it's the only way for me to have control of my hotspot, unfortunately.

shawaj commented 1 year ago

I'm thinking that since the repo https://github.com/NebraLtd/helium-rak is meant for production devices I might change the action to pull in the production docker compose instead of the testnet one.

And perhaps I'll also create a testnet branch there which has the latest testnet one too?

What do you think @Crash0v3r1de ?

Crash0v3r1de commented 1 year ago

That would be super helpful for the folks who want the balena control over their devices.

shawaj commented 1 year ago

Ok will sort this out this week sometime.

I think that makes most sense and will avoid similar issues occurring in the future 👌

shawaj commented 1 year ago

Ok sorted that now @Crash0v3r1de - should be more stable now.

Master branch has the production config - https://github.com/NebraLtd/helium-rak/tree/master

But the testnet branch has the latest testing version - https://github.com/NebraLtd/helium-rak/tree/testnet

Haven't added any documentation about it but will do that soon

Crash0v3r1de commented 1 year ago

You are the best thank you!!!

shawaj commented 1 year ago

No problem at all. Thanks to you and the others for bringing this to our attention 🙂