DeskPi-Team / super6c

Super6c stands for Super 6 CM4 Cluster.
MIT License

Power up a single halted module (from full shutdown) #10

Closed avluis closed 1 year ago

avluis commented 1 year ago

Currently researching how possible this is; I am looking to be able to power a module back up without having to shut down all other modules first to do so. It seems GPIO18 is available via the 12 pin breakout connector next to each compute module (J2 for example next to slot # 1 and J15(B-E) for slots # 2 - # 6) so the dream would be to be able to remap GPIO18 to handle power on. This is typically done via GLOBAL_EN (pulled low) instead.

Would it be possible to make changes to power management to allow the following:

Please note! GLOBAL_EN cannot be pulled low while the module is awake, as that will power it off; RUN_PG must therefore be checked so that GLOBAL_EN is only allowed to be pulled low when RUN_PG is also low (CM4 IO board behavior).
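The gating rule above can be written down as a small piece of logic. This is only a sketch of the condition, not firmware for the board: `SlotState`, `may_pulse_global_en`, and `wake_requests` are hypothetical names, and how the two signals would actually be read or driven on the Super6C is not covered in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotState:
    """Snapshot of one slot (hypothetical structure, for illustration only)."""
    slot: int
    run_pg_high: bool  # True while the module is powered and running


def may_pulse_global_en(state: SlotState) -> bool:
    """Allow a GLOBAL_EN low pulse only when RUN_PG is low (module halted)."""
    return not state.run_pg_high


def wake_requests(states: list[SlotState], wanted: set[int]) -> list[int]:
    """Return the requested slots that are halted and therefore safe to wake."""
    return [s.slot for s in states if s.slot in wanted and may_pulse_global_en(s)]
```

With slot 1 running and slots 2 and 3 halted, a request to wake slots 1 and 2 would only act on slot 2, so a running module is never powered off by accident.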

Related ref: https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#configuration-properties


Random tidbit; it's pretty neat that we're able to use the compute module in slot 1 to update the bootloader for the rest of the modules by swapping to the respective USB header; if this board receives a future revision, please implement a USB switch that hard wires this instead and allows selecting which compute module we are talking to (via software or onboard switch).

sgreiner commented 1 year ago

I noticed shorting the 5V_CM4 Pin to GND resets the CM4 (does not matter whether it was running or not). No idea if this is intended, I thought it is a 5V fan header, maybe we should not short it, but it works :-/ Edit: I need to correct myself, this only works if the nPWR_LED is shining red (it was powered on at least once)

Some official response would be nice on this topic @yoyojacky

avluis commented 1 year ago

I noticed shorting the 5V_CM4 Pin to GND resets the CM4 (does not matter whether it was running or not). No idea if this is intended, I thought it is a 5V fan header, maybe we should not short it, but it works :-/

Checks out; that would be the primary power delivery to that board... -- can you confirm it only resets that specific slot and not the others, for science? "Correct" method to perform this test is to make use of a current sinking resistor but your method works as well 🤣

Edit: I need to correct myself, this only works if the nPWR_LED is shining red (it was powered on at least once)

Ah yes, further confirmation that power management is still in charge there so I have some hope!

sgreiner commented 1 year ago

Checks out; that would be the primary power delivery to that board... -- can you confirm it only resets that specific slot and not the others, for science?

"Correct" method to perform this test is to make use of a current sinking resistor but your method works as well 🤣

It only draws all the power from one specific module and therefore it gets reset. The module's electronics "survived" it whenever I did it (I used it often; I was flashing different firmware from module 1 to the others over USB to get NVMe booting to work, and did not want to reset module 1 all the time)

Edit: I need to correct myself, this only works if the nPWR_LED is shining red (it was powered on at least once)

Ah yes, further confirmation that power management is still in charge there so I have some hope!

Without official comment we cannot consider this a safe method. I think it is a risky hardware glitch, creating a short circuit for a very short time (maybe I shorted it for 0.1s at most).

If you find out more I would be very interested. I am not so experienced with electronics/pcbs unfortunately.

avluis commented 1 year ago

It only draws all the power from one specific module and therefore it gets reset. The module's electronics "survived" it whenever I did it (i used it often, i was flashing different firmware from module 1 to the others by usb to get nvme booting to work, and did not want to reset module 1 all the time)

When I was working on nvme boot I ended up updating the bootloader (and making the needed changes to boot.conf) from an external host then setting up all the jumpers for the rest of the slots so it only took a single reset of the entire board (after shutting down slot 1)

Without official comment we cannot consider this a safe method. I think it is a risky hardware glitch, creating a short circuit for a very short time (maybe I shorted it for 0.1s at most).

Correct, unless you actually cut the power delivery circuit (which power management ICs/circuits should be able to do), any other method (including shorting any of the V+ pins to ground) is ill-advised, as something else will suffer for it (typically your power delivery circuit); for now, as you say, it is best to wait on an official response.

yoyojacky commented 1 year ago

I noticed shorting the 5V_CM4 Pin to GND resets the CM4 (does not matter whether it was running or not). No idea if this is intended, I thought it is a 5V fan header, maybe we should not short it, but it works :-/ Edit: I need to correct myself, this only works if the nPWR_LED is shining red (it was powered on at least once)

Some official response would be nice on this topic @yoyojacky

Hi Bro, I don't think it is a good idea to short 5V and GND. I have another solution to control the CM4s: internally they are all connected to the same bus, like a switch. You just enable eth0 and set a subnet address such as 10.0.0.x/24 for each one, then install Ansible. Using Ansible you can control each one individually, and you can use other cluster management applications as well. You can make your #1 CM4 the master host that controls the other CM4s; there can be 5 nodes around your master host, which makes it easy to control each one of them.
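As a rough illustration of that setup, the sketch below generates an Ansible INI inventory for six modules on a 10.0.0.0/24 subnet. The hostnames (`cm4-1` to `cm4-6`), the starting host number 11, and the group names are assumptions made for the example; the thread only suggests the 10.0.0.x/24 range and using slot 1 as the master.

```python
import ipaddress


def build_inventory(subnet: str = "10.0.0.0/24", first_host: int = 11, nodes: int = 6) -> str:
    """Build an Ansible INI inventory: slot 1 as master, the rest as workers."""
    network = ipaddress.ip_network(subnet)
    lines = ["[master]"]
    for i in range(nodes):
        addr = network.network_address + first_host + i
        if addr not in network:
            raise ValueError(f"{addr} is outside {network}")
        if i == 1:
            # Everything after the first module goes into the worker group.
            lines += ["", "[workers]"]
        lines.append(f"cm4-{i + 1} ansible_host={addr}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_inventory())
```

Saved as `hosts.ini`, the result could be checked from the master with `ansible -i hosts.ini workers -m ping`, assuming SSH access to each node is already configured.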

yoyojacky commented 1 year ago

It only draws all the power from one specific module and therefore it gets reset. The module's electronics "survived" it whenever I did it (i used it often, i was flashing different firmware from module 1 to the others by usb to get nvme booting to work, and did not want to reset module 1 all the time)

When I was working on nvme boot I ended up updating the bootloader (and making the needed changes to boot.conf) from an external host then setting up all the jumpers for the rest of the slots so it only took a single reset of the entire board (after shutting down slot 1)

Without official comment we cannot consider this a safe method. I think it is a risky hardware glitch, creating a short circuit for a very short time (maybe I shorted it for 0.1s at most).

Correct, unless you actually cut the power delivery circuit (which power management ICs/circuits should be able to) any other methods (including shorting any of the V+ pins to ground) is considered ill-advised as something else will suffer for it (typically your power delivery circuit); for now as you say it is best to wait on an official response.

Yes, that's right: DO NOT LET 5V SHORT WITH GND! It may damage your device or cause a fire. So if you want to reset your CM4, why don't you use SSH to reset it safely?
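A minimal sketch of the SSH approach, assuming key-based login and a user that may run `sudo reboot` without a password. The `pi` username and the addresses are placeholders, not values taken from this thread.

```python
import shlex
import subprocess


def reboot_command(host: str, user: str = "pi") -> list[str]:
    """Build the ssh invocation that asks one node to reboot itself."""
    return ["ssh", "-o", "ConnectTimeout=5", f"{user}@{host}", "sudo", "reboot"]


def reboot_nodes(hosts: list[str], dry_run: bool = True) -> None:
    """Print (dry run) or execute the reboot command for each host."""
    for host in hosts:
        cmd = reboot_command(host)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # The connection usually drops as the node goes down, so a
            # non-zero exit status is not treated as an error here.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    reboot_nodes(["10.0.0.12", "10.0.0.13"])
```

This only reaches a module that is still running; a module that has been fully shut down has no SSH daemon left to answer.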

yoyojacky commented 1 year ago

If you want to power up a single halted module, you need to reboot all of them at the same time; the board cannot power up just a single halted module, sorry about that. You can just press the reset button or boot them with a physical reboot.

avluis commented 1 year ago

Before this gets off topic; do not short out any of your supply rails (5V, 3.3V; doesn't matter) -- not the focus here though.

Hi Bro, I don't think it is a good idea to short 5V and GND. I have another solution to control the CM4s: internally they are all connected to the same bus, like a switch. You just enable eth0 and set a subnet address such as 10.0.0.x/24 for each one, then install Ansible. Using Ansible you can control each one individually, and you can use other cluster management applications as well. You can make your #1 CM4 the master host that controls the other CM4s; there can be 5 nodes around your master host, which makes it easy to control each one of them.

Ansible is certainly a way to control multiple nodes; also not the focus here as you won't be powering up nodes that are fully shut down this way either (but this is really the way you want to go for this board; it screams Ansible honestly!) Something that did come to mind was WoL but unfortunately Raspberry Pi doesn't support this (and by extension Compute Modules do not either); only way to implement that would be if the switch itself did (by listening to the relevant packet and triggering a power on for the assigned module, but we would need to be able to manage the switch for this; not possible in its current mode).
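For reference, the "relevant packet" is the standard Wake-on-LAN magic packet: six `0xFF` bytes followed by the target MAC address repeated sixteen times, usually sent as a UDP broadcast. The sketch below builds and sends one; the MAC address is a placeholder, and, as noted above, nothing on this board currently acts on such a packet.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like 'dc:a6:32:00:00:01'."""
    cleaned = mac.replace(":", "").replace("-", "")
    if len(cleaned) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(cleaned)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

A switch or controller implementing the idea would have to match the repeated MAC against the module assigned to each slot and then trigger that slot's power-on.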

With that said; appreciate the input above -- what this issue/thread is asking for is not the obvious options (power everything off bro then turn them back on again 🤣 )

Should have some time this weekend to see if anything can be done with what is already there but leaning towards a few software side tricks I have a bit of experience with; nothing solid yet though 😢