Better Battery Messages

dakejahl commented 2 years ago

An attempt to establish comprehensive messages for smart battery data. Your input is highly valued!

dakejahl commented 2 years ago

Maximum number of cells? Updated to 24 ~- Status flags vs enums?~ flags are better
Any other identifying information required outside of a name string, 4 byte serial, ~and 1 byte numeric ID?~

~I'm actually not sure how the numeric ID would be assigned in a multi-battery system... maybe we use node ID instead to correlate the Continuous/Periodic messages?~ Node ID is used to associate data on the receiving side, for PX4 uORB ID is also uin8_t.

auturgy commented 2 years ago

It would be useful to align DroneCAN and MAVLink messages, as they need to partially map (DroneCAN is obviously internal, MAVLink external, but where information is common the field names and data types should match in my view).

PetervdPerk-NXP commented 2 years ago

Good start overall and nice initiative

Still I've got some feedback you might want to consider in the smart battery messages.

Status flags

I would like to have a flag to indicate that the battery is bad and should not be used (i.e. when overcharged or over discharged)
A battery cold indication, might be important flag as well.

For the SmartBatteryPeriodic messages, I think it might worthwhile to explore to expose more information of the battery itself, things that came to my mind would be

Manufacturer date
Weight
Battery technology LiPo/LiFe/NiMH etc

dakejahl commented 2 years ago

Andrew B 2 hours ago Is require service based off of a bms fault detection flag ? Also there are multiple temperature source reading capabilities in bms, some die and some external thermistor. Adding these may or may not be worth the overhead though.

hamishwillee commented 2 years ago

Yes. See https://mavlink.io/en/messages/common.html#SMART_BATTERY_INFO and https://github.com/mavlink/rfcs/pull/19 (in discussion)

dakejahl commented 2 years ago

Agreed about additional flags for bad battery, under temp, and manufacture date. I disagree on weight and battery technology though. Both these fields are static and can be derived from the name field (manufacturer_model). That data never changes and can be found with a quick google search for the datasheet of the product.

dakejahl commented 2 years ago

Alright I made all data types and name match mavlink with the exception of status_flags, see mavlink rfc for battery_status_v2

I also moved time_remaining into the continuous message since this is definitely a useful high rate piece of data.

Thorfinn-Thorsson commented 2 years ago

I am loving this rework on the smart battery message info. If I can add my thoughts on the matter.

Firstly, I agree 100% transmitting static information at a rate equal to or grater than 1Hz is useless. If it isn't used in the normal operation of a vehicle then why send it at 1Hz? I am not saying don't send it, just don't flood the bus with info that is hardly ever used. I think Manufacturing date should be separated out into more generic information that other nodes use.

Secondly, I think we need to look at the necessity/criticality of data that is being sent. How often is it sampled, how often is it used/needed. If there isn't sound justification for high rate transmission then I think it should go in the 1Hz periodic message. In my mind, this is things like time_remaining, battery_remaining and maybe temperature. Lets look something like the BQ40Zxx. This BMS chip "perform voltage, current, and temperature readings every 250 ms" but will only "performs protection and gauging calculations, updates SBS data, and makes status decisions at 1-s intervals.". I read that as time_remaining and battery_remaining are only calculated once every second which means we are not gaining anything by sending them at a 10Hz rate.

I am happy to add to or generate another PR to show my point of view

dakejahl commented 2 years ago

If it isn't used in the normal operation of a vehicle then why send it at 1Hz? I am not saying don't send it, just don't flood the bus with info that is hardly ever used.

100% agree. Some kind of request/response mechanism would be much better suited for this static information. I have not used "Services" before but perhaps we need to introduce a metadata file for CAN Nodes. This would be a one time transaction initiated by the FC and the file could be a JSON description of the value name, data type, and value. This would eliminate the Periodic message entirely in its current form. Another option would be to have a Service request a message be sent once. This would effectively convert the Periodic message into a Static message and the FC would have to request it.

I think the transmission rate of data is less important, I picked 10Hz and 1Hz arbitrarily. But killing off the periodic message in favor of a metadata file and keeping the "remaining" fields in the Continuous message might be the best optimization. Another benefit of a JSON file would be extensibility without breaking the wire format. Thoughts?

dakejahl commented 2 years ago

My personal thought is that cell voltages should get their own message.

I think this would make sense

Hopefully, those could be turned on/off as parameters

I think having a mechanism to request messages, whether that be a single shot or interval, would be super useful.

You should take a look again at the original and copy more from the current implementations

Which fields would you like to see? And could you explain the usefulness? I reviewed all of the existing messages and omitted many of the fields because I thought they were redundant or do not provide much value.

hamishwillee commented 2 years ago

@dakejahl Just as an FYI:

The mavlink RFC is still draft - we know that the BATTERY_STATUS is not scalable, but v2 has only had the smallest amount of discussion.
- Upshot, the proposal may change due to MAVLink feedback, and it can change in response to yours too. Please be very direct to me to make sure that happens. I'm a little scattered, and I'd hate to miss a point made by you guys.
SMART_BATTERY_INFO has been around a while. It is not clear if it is used anywhere, but we have to assume it is.
- So, easy enough to extend as is static, but changing types etc is difficult.
- MAVLink has a mechanism for requesting messages. So we might update docs to suggest it is explicitly requested and may not contain dynamic information.
- EDITED One reason against, is that without streaming (if only at low rate) there is no way to detect when a battery has been changed/replaced - at least in respect to the invariant parameters.

Cell voltages are the reason we are breaking BATTERY_STATUS we keep on getting batteries with more cells and we can't extend the message.

The thing to be aware of is that they are not needed for the "main use case" - finding out the battery voltage.
Mostly you only need them when your battery starts to fail and you want to debug exactly what's happening - i.e. its something a human needs for debugging not something that the GCS will use. At that point I have been told it is useful to have all the cell voltages (not just per cell fault info or delta information) and you want it from logs for a period.
Upshot, is that I don't think these should be streamed by default - rather they should be something that you enable following a fault or a feeling that your battery is rubbish. In mavlink you would do this using MAV_CMD_SET_MESSAGE_INTERVAL.

I'll update the MAVLink RFC with some feedback from developers later today. Again, we might not take it. Message design is a compromise.

dakejahl commented 2 years ago

EDITED One reason against, is that without streaming (if only at low rate) there is no way to detect when a battery has been changed/replaced - at least in respect to the invariant parameters.

I agree this could be an issue. Perhaps we can have GCSs request this as well have FCs send at a very low rate. FCs could also implement a mechanism to trigger sending the message when a battery has changed, but architecturally this doesn't exist and would be more difficult to implement than simply low rate streaming (0.2Hz or something)

100% agree about selectively enabling cell voltages, I think they belong in their own message. The question is what is the cutoff for maximum number of cells? An inefficient way would be to send each cell voltage/fault_flag as its own message. As far as logging cell voltages, we could still do this within each FC log file.

dakejahl commented 2 years ago

James P — 03/30/2022 Any reason the persistent data couldn’t be stored and read as parameters?

I think this is a good suggestion although does require nodes to implement parameters. It is more flexible and extensible which would allow implementations to provide only the data they deem necessary. A downside of this would be a lack of a standard for parameter names... so whoever does it first and provides handling in one or more flight stacks would effectively lock the spec, unless we continued to extend the handlers for new/different param names... double edged sword me thinks. Thoughts?

hamishwillee commented 2 years ago

In mavlink we are proposing a 12-cell message with battery id and index. If you have 24 cells, you send it twice. You might argue inefficient, but the fact is no one needs this message except for debugging - it can be sent at very low rate.

I don't know how UAVCAN "treats" parameters but in MAVLink they are essentially component specific - i.e. MAVLink can set them and expose them to users, but it knows nothing about them. So if you say something is a parameter you are saying that it is not standardized and you can't rely on it being something you use unless you know the system.

That makes sense for a lot of stuff, but I don't imagine battery properties fit into that category.

hendjoshsr71 commented 2 years ago

Here is my branch built from your current branch, https://github.com/hendjoshsr71/DSDL/tree/pr/battery_messages

I would love to compare anything we do with a comparable messages used in auto EVs or commercial aircraft? But I;m not well versed on the standards there.

I created a Cell1 message for just voltages and resistances for those who would want to monitor health fully. Presuming the rate of the message would default off.

The biggest changes is pulling in more comments and fields to define the meaning. We had a case of a vendor defining charge current opposite normal convention once.

Assuming we can figure out a way to do a "static" message type we should be pulling in more info. to giving others systems a fuller picture of the battery from a charging and discharge perspective. IE slow down you cant pull that much current. And the system slows itself down (ArduPilot can do this already).

One area I think we see this differently is the message is labeled "smart_battery". But really do we need other battery messages for batteries or systems with varying measurement and estimation capabilities?

IE some use the current message on a can node to convert analog voltage and current measurements to CAN to the autopilot. These have zero estimation capabilities and limited measurements.

Most TI ICs would have middle of the road estimation capabilities. And above that would be those that go to the full modeling effort and create an EKF for SOC and SOH.

dakejahl commented 2 years ago

I created a Cell1 message for just voltages and resistances for those who would want to monitor health fully. Presuming the rate of the message would default off.

Great idea.

The biggest changes is pulling in more comments and fields to define the meaning. We had a case of a vendor defining charge current opposite normal convention once.

Yeah I am not opposed to putting all kinds of info in the static/periodic message. If we can create a Message Request Service that would be ideal, and then we just request the message when we want it.

One area I think we see this differently is the message is labeled "smart_battery". But really do we need other battery messages for batteries or systems with varying measurement and estimation capabilities?

Yeah you're right, both of these messages are generic enough to be used for all batteries. The important piece is just the separation of high and low rate data. Let's rename to be more generic

Most TI ICs would have middle of the road estimation capabilities. And above that would be those that go to the full modeling effort and create an EKF for SOC and SOH.

Yeah I like the addition of the two capabilities flags.

dakejahl commented 2 years ago

Suggestions for the new message names?

SmartBatteryContinuous --> BatteryStatus SmartBatteryPeriodic --> BatteryInformation SmartBatteryCells1 --> BatteryCellVoltages

Or is having "static" or "continuous" in the name helpful to understand the usage?

And perhaps we add an index field to the SmartBatteryCells1 instead of having to create new messages to extend it. Index 0 would be first 24 cells, index 1 is the next 24 cells, etc

hamishwillee commented 2 years ago

Suggestions for the new message names?

SmartBatteryContinuous --> BatteryStatus SmartBatteryPeriodic --> BatteryInformation SmartBatteryCells1 --> BatteryCellVoltages

Or is having "static" or "continuous" in the name helpful to understand the usage?

The new proposed names are much better.

And perhaps we add an index field to the SmartBatteryCells1 instead of having to create new messages to extend it. Index 0 would be first 24 cells, index 1 is the next 24 cells, etc

Yes.

In MAVLink we used an index for the "SmartBatteryCells1" equivalent message for that reason. There are already batteries with far more than 24 cells. We used just 12 cells on the assumption that it is a soft spot where the vast majority of batteries will have less than that. We were originally 10 but noted that battery cells seem to increase in multiples of 6.

hendjoshsr71 commented 2 years ago

Yup. Adding an index makes a lot of sense to simplify things. Thanks!

Does that mean we would might want to add the battery id to make life easier for log analysis? I know it could be derived from battery_info and the NodeID.

I like the new names.

Maybe "BatteryCell" or something more generic "BatteryCellStatus"

hamishwillee commented 2 years ago

Maybe "BatteryCell" or something more generic "BatteryCellStatus"

BatteryCellVoltages is more precise if all you are sending is voltages. I haven't looked at the message - you might be sending per-cell fault information too.

hendjoshsr71 commented 2 years ago

Maybe "BatteryCell" or something more generic "BatteryCellStatus"

BatteryCellVoltages is more precise if all you are sending is voltages. I haven't looked at the message - you might be sending per-cell fault information too.

I've added in cell resistances since that plus cell voltages equal most of the extended info. someone health monitoring might want for long term monitoring. (This assumes the rates of the message or field is configurable of course....)

hamishwillee commented 2 years ago

Maybe "BatteryCell" or something more generic "BatteryCellStatus"

BatteryCellVoltages is more precise if all you are sending is voltages. I haven't looked at the message - you might be sending per-cell fault information too.

I've added in cell resistances since that plus cell voltages equal most of the extended info. someone health monitoring might want for long term monitoring. (This assumes the rates of the message or field is configurable of course....)

BatteryCellStatus might be a good name then.

For MAVLink the way messages work means that, if resistances are useful, we would do these as a separate message.

hendjoshsr71 commented 2 years ago

Yeah in the context of mavlink I might just wait till someone asks for it... I don't think I'd ever put those on the link with the limited space available. Even in a flight test scenario with high speed links I wouldn't add them to the link budget.

I forgot about the context of companions too with mavlink. There receiving resistances might present a limited value.

dakejahl commented 2 years ago

Questions about measurement/estimation capabilities @hendjoshsr71 I think it would be cool to know but idk how this data would be conveyed to a user/system in a meaningful/actionable way. https://github.com/dronecan/DSDL/pull/7/commits/bbc02ecccc9560197d69ed1ff0648aa2f5bd41ee#diff-a3324e7bf8cb7d932b895d9ac41eaab64ae8a9287cecb7902737c7046ab24fbbR5-R16

hamishwillee commented 2 years ago

@dakejahl Anything of the last lot need to be reflected in MAVLink proposal? I don't think so.

dakejahl commented 2 years ago

Adding a message for extra thermistors seems like a waste of CAN frames? But maybe that is better? 2-3 thermistors doesn't seem that out of the ordinary.

@hamishwillee nope just small tweaks

@hendjoshsr71 Agreed. I could get behind two temperatures.

float16 temperature_cells
float16 temperature_fets

What would a third one be?

float16 temperature_pcb
# or
float16 temperature_shunt
# or
float16 temperature_other

hendjoshsr71 commented 2 years ago

I would probably be more generic on the second name

float16 temperature_cells    // Internal battery pack temperature, (between cells)
float16 temperature_pcb     // Battery PCB temperature (likely main FET or current sense resistor)
float16 temperature_other   // Secondary (external battery pack temperature or sencodar cell temperature)

I could see a reason for not including a third and just making the second one generic? Bigger systems are likely to have more than one cell thermistor per pack safety/redundancy?

echoGee commented 2 years ago

Adding a message for extra thermistors seems like a waste of CAN frames? But maybe that is better? 2-3 thermistors doesn't seem that out of the ordinary.

@hamishwillee nope just small tweaks

@hendjoshsr71 Agreed. I could get behind two temperatures.
float16 temperature_cells
float16 temperature_fets
What would a third one be?
float16 temperature_pcb
# or
float16 temperature_shunt
# or
float16 temperature_other

In terms of the utility of the data, this is recommended. Regardless of how many thermistors are in the pack, the useful information to know is the min/max of pack and max of the board.

float16 max_temperature_board // Max temperature on BMS FET/shunt/board
float16 max_pack_temperature  // Maximum Battery pack temperature
float16 min_pack_temperature  // Minimum  Battery pack temperature

hendjoshsr71 commented 2 years ago

This is reasonable and an ok compromise. I've added some comments

// When there is only one thermistor for pack temperature measurement, fill in only the max_pack_temperature 
// Minimum or maximum temperature could be reported from different thermistors during use 
// Example: TempA_pack = 30C,  TempB_pack = 25C, TempC_pack = 15C
// max_pack_temperature = TempA_pack; 
// min_pack_temperature   = TempC_pack 
float16 max_temperature_board // Maximum temperature on BMS FET/shunt/board, ???: field not provided
float16 max_pack_temperature   // Maximum battery pack temperature, ???: field not provided
float16 min_pack_temperature   // Minimum battery pack temperature, ???: field not provided

The downside is then a user won't know which thermistor the measurement is coming from. If one sensor fails or gets "weird" ie never increases or decreases you can't know where this is occurring. But I guess the system I have in mind for this could just add a different message.

dakejahl commented 2 years ago

Regardless of how many thermistors are in the pack, the useful information to know is the min/max of pack and max of the board.

I disagree, because as @hendjoshsr71 states

The downside is then a user won't know which thermistor the measurement is coming from. If one sensor fails or gets "weird" ie never increases or decreases you can't know where this is occurring

Also it's really the FETs and the cells that we're worried about. Random min/max values don't tell you much other than "there is a problem". If your implementation uses multiple cell thermistors, then it is the responsibility of the implementation to report the most important value from those thermistors (highest or lowest if out of spec). The data needs to be actionable and useful, not require further debugging -- hence specifying which temp is associated with which component so we know where the problem lies.

echoGee commented 2 years ago

Regardless of how many thermistors are in the pack, the useful information to know is the min/max of pack and max of the board.

I disagree, because as @hendjoshsr71 states

The downside is then a user won't know which thermistor the measurement is coming from. If one sensor fails or gets "weird" ie never increases or decreases you can't know where this is occurring

Also it's really the FETs and the cells that we're worried about. Random min/max values don't tell you much other than "there is a problem". If your implementation uses multiple cell thermistors, then it is the responsibility of the implementation to report the most important value from those thermistors (highest or lowest if out of spec). The data needs to be actionable and useful, not require further debugging -- hence specifying which temp is associated with which component so we know where the problem lies.

Not sure if I used the wrong term. I meant max/min temperature of the cell, and the max temp of the FET. If you have one thermistor that isn't working, there is still information that maybe usable. But I dont think this protocol is designed for multiple failure conditions etc?

float16 max_temperature_board // Max temperature on BMS FET/shunt/board
float16 max_cell_temperature  // Maximum cell temperature
float16 min_cell_temperature  // Minimum cell temperature

AlexKlimaj commented 2 years ago

I think having min and max cell temp is overkill. maybe fet temperature and cell temperature. PCB temperature is technically there in my designs but I don't know how valuable it is.

dakejahl commented 2 years ago

Okay 3 temperatures now... not sure how useful the third is but at the very least if @echoGee wants pcb/min/max there are 3 fields and you could repurpose them. Cells/fets/shunt are probably the most important... and my preference is still to have concrete definitions, but this should be a good compromise.

float16 temperature_cells       # [C]      : Pack mounted thermistor (preferably installed between cells), NAN: field not provided
float16 temperature_pcb         # [C]      : Battery PCB temperature (likely output FET(s) or current sense resistor), NAN: field not provided
float16 temperature_other       # [C]      : Application dependent, NAN: field not provided

hendjoshsr71 commented 2 years ago

I'm good with the above Jake.

One reason why I think the "other" field is useful is for cold applications where you might have an external heater & thermistor. And that temperature could vary quite a bit in my mind from the internal temperature.

dakejahl commented 2 years ago

I am looking at it now and thinking we really don't need capacity_remaining since we would know full and consumed, as long as we redefined consumed to mean total consumed for the pack. This would be coloumb counting with an analog power monitor (consumed since boot) and it would be the fuel gauge for smart batteries (consumed since full). Thoughts? https://github.com/dronecan/DSDL/pull/7/files#diff-a276641f2405349cc2307f345b18ec6e35498c405f3849842f22840603a90a4aR11

hamishwillee commented 2 years ago

I am looking at it now and thinking we really don't need capacity_remaining since we would know full and consumed, as long as we redefined consumed to mean total consumed for the pack. This would be coloumb counting with an analog power monitor (consumed since boot) and it would be the fuel gauge for smart batteries (consumed since full). Thoughts?

I am assuming that under the hood a smart battery just does what a power module does? i.e.

it knows the initial charge and does a dynamic count of consumption to "nominal empty"?
it tracks consumption and charging to work out the expected charge remaining
Perhaps it can know when it is fully charged and therefore reset the counter in point above when actually fully charged.
The difference with a power module being that the "consumed count" is maintained through drone power cycle.

Is that right?

From a MAVLink perspective

End users are far more interested in capacity and percentage remaining than anything else. If we can supply that, it is the "interesting value" that GPS doesn't have to do anything else to process. Users are I guess a little interested in the amount consumed, but only to work out how much is remaining and how long a charge might take.
What would "redefined consumed to mean total consumed" mean for a power module?

If I was to remove this from MAVLink users would require two messages in order to estimate the remaining charge. That doesn't make sense for us - but it might might for UAVCAN.

PS MAVLink side of this discussion here: https://github.com/mavlink/rfcs/pull/19/files#r887389280 - I think we should keep both values and add a flag MAV_BATTERY_STATUS_FLAGS_CAPACITY_RELATIVE_TO_FULL to indicate what the consumed/remaining values are relative to (this is probably more clear than saying "this is a smart battery" or "this value can't be trusted").

dakejahl commented 2 years ago

A power module can only know how much capacity has been consumed since boot. It has no way of knowing the initial state of charge (other than making assumptions based on voltage, but these assumptions result in bad estimates since age / nominal voltage / chemistry are important)

End users are far more interested in capacity and percentage remaining than anything else

Agreed, but fundamentally a power module cannot tell you capacity remaining nor percentage remaining since it can't estimate state of charge. So when using a power module only "capacity_consumed" can be trusted to be correct. The percentage remaining could be supplied by estimating the initial state of charge from the voltage (where nominal voltage is a parameter on the autopilot), but "capacity_remaining" would/should always be left empty.

Does this make sense?

What would "redefined consumed to mean total consumed" mean for a power module?

What I am saying is the definition would change depending on whether it's a smart battery or a power module. For a power module "capacity_consumed" would be the capacity consumed since boot. For a smart battery it would be the capacity consumed since the full charge capacity

hamishwillee commented 2 years ago

@dakejahl See my "PS" to the last message/comments in MAVLink, that is what I ended up with in idea of "MAV_BATTERY_STATUS_FLAGS_CAPACITY_RELATIVE_TO_FULL "

I was hoping you'd actually answer my query about how smart batteries work starting "I am assuming that under the hood a smart battery just ". I know how a power monitor works :-)

dakejahl commented 2 years ago

@dakejahl See my "PS" to the last message/comments in MAVLink, that is what I ended up with in idea of "MAV_BATTERY_STATUS_FLAGS_CAPACITY_RELATIVE_TO_FULL "

Yup I saw it, I like it.

I was hoping you'd actually answer my query about how smart batteries work

Smart batteries consist of a fuel gauge and a protections system. Sometimes these are single ASIC solutions, other times they are multiple ASIC.

The protections part of the system is responsible for controlling the input/output FETs to ensure the pack does not exceed the voltage/current/temperature limits for the pack/cells (these are the "faults"). It also can track faults over the lifetime of the battery so you can see if the battery has ever gone thru some shit :slightly_smiling_face:

The fuel gauge is responsible for keeping track of the state of charge, state of health, and cycle count. It is able to do this because it goes through a calibration in the factory with the battery packs. This calibration consists of charge/discharge cycles and measuring the impedance of the cells. TI calls their algorithm in these chips that does this "Impedance Track"

state of charge : this is the capacity_remaining. The fuel gauge estimates this and keeps track of this, it never "forgets", even if the battery is powered off.
state of health : this is the full_charge_capacity. The fuel gauge estimates this and keeps track of this over time. As a battery ages the full_charge_capacity will reduce.
cycle_count : this is also provided by the chip, since there was a calibration performed the chip knows what a cycle is.

schwacff commented 2 years ago

Hey guys! My name is Cory Schwarzmiller - I'm an embedded systems engineer with Freefly Systems. We were one of the core contributors in the work group to develop the Pixhawk battery hardware standard. We are finalizing this here: https://docs.google.com/document/d/1xYBhhgjND_RS2W3Hq0Seyc-R661ge3zonk9DTIyHclc/

We have developed a battery system for our Astro drone that is compliant with the Pixhawk hardware spec and are currently using our own CAN comms standard called FF_CAN, which we have posted publicly here: https://docs.google.com/spreadsheets/d/15j90n-wyhvrg4vQjjYPZCsuBW2G2DhDCZ2K_Fx0Wgw8/

Originally we were considering using UAVCAN but couldn't figure out at the time (around 2 years ago) which version we should use and thus decided to get started with our own simple protocol. That said, we are excited to see progress being made on Dronecan and would like to add support in our batteries eventually. I have been chatting with people on the Auterion team and Jacob Dahl about this and am excited to help out and make sure these Dronecan messages will work with our batteries and are well designed to support future ones. Thanks for bringing us in!

Generally we would like to make sure that the info in our current CAN implementation (see the spreadsheet above) is covered in the Dronecan implementation.

I have a few notes that I would like to add:

The Pixhawk spec requires that the batteries be commanded to enter a mode called "airmode" prior to take off that bypasses the BMS's protections. We will need a dronecan message for this. I will propose one.
The Pixhawk spec also uses the concept of a slot ID. This is the physical location of the battery in the aircraft. I think this is already in the 20004 message. Is there a good way then for PX4 to link the Dronecan node ID with the slot ID?
We should put state of charge in the continuous message. I understand it may be represented by ratios of the capacity of the other values, but it is really the number one most important piece of data from the battery. Even if it is only to help people who know less about the battery then I think it is worth it.
We consider cell voltages to be very important, mainly for analysis after flights. In the process of testing batteries it can be important to see if you have cells that are coming unbalanced under load, etc. I strongly advocate periodically updating these. 1Hz sounds like a good rate.
Cell Temp is the most critical, followed by BMS temp. On the BMS, FETs are most important, the shunts and IC internal temps are not as important. We should have a provision for more cell temp sensors too. I would advocate for at least one extra, or possibly one additional extra as well, for a total of four. I would say at least one should be in the cells, and one should be on the BMS near the FETs. The additional two could be mfg specific?
We have discrete faults for over temp and under temp in discharge and charge, and specific for cells and BMS. We really prioritize detailed fault info and find it very helpful.
We currently use 25hz rate for updating critical data. I think having variable rate parameters would be very helpful for these messages.
Is there a way to make Dronecan send only the used number of cell voltages? I agree that 24 is a good max number, but most smart drone batteries are currently 6S or 12S.
I would like to see two current options: one integrated current from the coulomb counter, and one from an instantaneous reading, if the pack supports it. I think most would use the coulomb counter one by default.
We should add lifetime data for temps, currents and pack/cell voltages. I'll propose a suggestion.

Looking forward to hearing thoughts. I will discuss specifics with Jacob directly soon as well.

AlexKlimaj commented 2 years ago

I don't think you need a message from the vehicle to tell the battery when to disable protections. The battery should be able to determine that on its own.
The CAN ID and slot ID would need to be correlated on the vehicle side. Maybe by enumerating each slot one at a time somehow to discover which CAN ID is in which slot.
I think having full charge capacity and remaining capacity is fine. That is already what the fuel gauge is using to measure % remaining.
For cell voltages, you can determine the rate to send the BatteryCells message.
I think two temperature readings for a production system is enough information. Four different temperatures reported from one battery is overkill.
All of these messages are sent by the battery and the battery firmware can determine what rate to send each message at.
Dronecan only send the number of populated items in an array. So 6s will only send 6 items in the cell voltage array.
The capacity_consumed already gives you the integrated coulomb counter data you are wanting.

dakejahl commented 2 years ago

Cell Temp is the most critical, followed by BMS temp. On the BMS, FETs are most important, the shunts and IC internal temps are not as important. We should have a provision for more cell temp sensors too. I would advocate for at least one extra, or possibly one additional extra as well, for a total of four. I would say at least one should be in the cells, and one should be on the BMS near the FETs. The additional two could be mfg specific?

What if we used an array of temperatures? As Alex mentioned DroneCAN only sends as much data is populated in an array, we could define an ordering for the array based on importance (temps[0] = cell_temp, temps[1] = fet_temp, etc). This could allow a lot of temperatures to be sent if needed, with NAN used as a placeholder for unused indices. This would also make @echogee happy

float16[<=10] temperatures     # ordering as follows: cell_temp, fet_temp, internal_temp, max_temp, min_temp, etc,

Also now that I am thinking about it harder, does a 16 bytes serial number really make sense? I think a uint32 would probably be fine.

uint8[<=16] serial_number              # Serial number in ASCII characters, 0 terminated

Same thing for manufacturer date, we could encode it using 4 bytes like this Day + Month×32 + (Year–1980)×512

uint8[<=11] manufacture_date           # Manufacture date (DD/MM/YYYY) in ASCII characters, 0 terminated

dakejahl commented 2 years ago

The Pixhawk spec requires that the batteries be commanded to enter a mode called "airmode" prior to take off that bypasses the BMS's protections. We will need a dronecan message for this. I will propose one.

I don't think you need a message from the vehicle to tell the battery when to disable protections. The battery should be able to determine that on its own.

We could do both.

The Pixhawk spec also uses the concept of a slot ID. This is the physical location of the battery in the aircraft. I think this is already in the 20004 message. Is there a good way then for PX4 to link the Dronecan node ID with the slot ID?

I think the "slot" concept is a bit specific to FreeFly's design but I think it can be dual use for the Node ID. Since your implementation cares about where a battery is physically connected and you're using a signal to detect where a battery is located, you could have the battery assign itself a Node ID based on which slot it finds itself plugged into.

We should put state of charge in the continuous message. I understand it may be represented by ratios of the capacity of the other values, but it is really the number one most important piece of data from the battery.

The reason it was dropped is because you can derive SOC from the ratio of the other two values and it gives you better resolution. I don't see why we would want a redundant SOC as %?

We consider cell voltages to be very important, mainly for analysis after flights. In the process of testing batteries it can be important to see if you have cells that are coming unbalanced under load, etc. I strongly advocate periodically updating these. 1Hz sounds like a good rate

Details like this are left to the implementation. So if you want to publish BatteryCells at 1Hz then do it!

We have discrete faults for over temp and under temp in discharge and charge, and specific for cells and BMS. We really prioritize detailed fault info and find it very helpful.

In the Mavlink RFC we briefly discussed having a 32 bit extension for "custom" faults, we could add that here if people really wanted it. I am trying to avoid stuffing a bunch of debug fields into the "standard message" though, since after a system has gone through bringup these fields end up being dead weight. Also we want to draw a line somewhere between "information a user should know and can act on" and "information only the manufacturer understands". I think detailed fault information falls into the latter.

We should add lifetime data for temps, currents and pack/cell voltages. I'll propose a suggestion.

I think this should be in another message though.

schwacff commented 2 years ago

So regarding the disabling of protections, I agree that it would be nice to make the arming interlock optional for batteries that don't implement this. To support the interlock in mainline PX4 we will need to implement this. @cmic0 may be able to help with this? I think we can make it parameter enabled?

I think we can use the "battery_id" field within 20004.BatteryInfoAux.uavcan to be the "slot id".

Sounds good on the rates - we can control that on the battery side. Is there a method in Dronecan to allow the aircraft to request a new rate? That would be nice to have.

I'm onboard with having an array of additional temperature values. I agree that maybe users won't care, but it really helps for development. So with Dronecan a node can choose to not send a whole array length? Or does the array have to be at the end of a message in order for that to work? That works well for temperature if so!

Regarding the two current measurements - nevermind - what I was talking about was a bit different than capacity consumed but we have decided not to pursue it.

For serial number we use a uint32. Good with that!

For date we do uint8 for day, month, and year -2000. That is not the most compact form, but very easy to read.

I agree with not using the term Airmode anymore. Maybe we could change this to "Flight Mode"? Unprotected could work too maybe. But that means less to non battery people.

I just think SoC is so critical that it should have its own value. Are there perhaps some types of batteries may just use a super crude voltage to SoC representation? I suppose they wouldn't have a smart BMS then... I'm negotiable on this I suppose.

Detailed fault info is helpful to everyone, even end users, far after development. In the event that an aircraft has a problem it is extremely helpful to have data to look at to diagnose the problem. We can put our further details into a more specific message, but we should have some pretty granular ones at a minimum to encourage battery manufacturers to provide really helpful fault data.

schwacff commented 2 years ago

I do agree with splitting some of this stuff up into more messages too. That way rates can be different for different data and stuff. That is really one of the huge benefits of CAN! It's designed for granular bits of data.

schwacff commented 2 years ago

Also, I'm thinking about the fault / status flags. It might be helpful to break these out into two 32 bit flags?

There are a few that I think we need better definitions on too. I don't really understand what these mean? STATUS_FLAG_READY_TO_USE - no faults or potential faults? STATUS_FLAG_REQUIRES_SERVICE - needs it's oil changed!? STATUS_FLAG_BAD_BATTERY - is this a lifetime fail flag is set or something?

In place of STATUS_FLAG_FAULT_PROTECTION_SYSTEM I think we should have the charge / discharge allowed flags, which would basically just be the FET states.

We should add: STATUS_FLAG_DISCHARGING - partner flag to charging flag. current is above a meaningful threshold STATUS_FLAG_PROTECTIONS_ENABLED - this could indicate flight mode. Or maybe we say that more directly? STATUS_FLAG_MOUNTED_IN_SLOT - Determines if battery is confirmed to be mounted in a slot (ID pin is pulled down). The slot ID is read in the BatteryInfoAux message. STATUS_FLAG_FAULT_PCB_ERROR - The BMS PCB has an issue (comms, etc) STATUS_FLAG_DISCHARGE_ALLOWED - discharge FET turned off STATUS_FLAG_CHARGE_ALLOWED - charge FET turned off STATUS_FLAG_HOTSWAP_ALLOWED

hamishwillee commented 2 years ago

@schwacff @dakejahl Just my "two bits"

I think the "slot" concept is a bit specific to FreeFly's design but I think it can be dual use for the Node ID.

The concept of slot has come up a number of times. I'd be interested to understand where it adds value.

I'm more interested in discussions around making sure that we have an approach that provides enough information for an Autopilot to successfully report on multi-battery systems and set sensible failsafe levels. At the moment all the thinking is around single battery systems.

Also, I'm thinking about the fault / status flags. It might be helpful to break these out into two 32 bit flags?

Has been discussed and joined/merged a couple of times. The main reason currently for the joined up state is efficiency and extensibility. We have nowhere near 64 bits of flags in either faults or statuses. If we add them they won't be generic.

STATUS_FLAG_READY_TO_USE - no faults or potential faults?

This has been changed MAVLink side to STATUS_FLAG_"NOT"READY_TO_USE and should be changed here. It is an aggregate of the faults and other flags that mean the vehicle should not fly, as decided by the integrator. In other words, you'd use this as a check to indicate there is no reason not to take off.

STATUS_FLAG_REQUIRES_SERVICE - needs it's oil changed!?

Yes :-) This is for repairable batteries that likely need maintenance. The idea being that you might set this following a schedule of too many charge/recharge cycles to get it properly checked.

STATUS_FLAG_BAD_BATTERY - is this a lifetime fail flag is set or something?

The assumption was that the battery is broken and unrepairable, so you might send it to a recycler but don't call the manufacturer expecting a fix. I thought this sensible, but perhaps it is not.

In place of STATUS_FLAG_FAULT_PROTECTION_SYSTEM I think we should have the charge / discharge allowed flags, which would basically just be the FET states.

See discussion in MAVLink thread re this being like a state that indicates whether battery or autopilot is in charge of failsafes. I like that more "if it works" because it is more informative to a consumer, that has no control over whether the battery is in this mode or not. Either way, this would contribute to the not ready to use flag.

We should add:

STATUS_FLAG_DISCHARGING - partner flag to charging flag. current is above a meaningful threshold

You can tell this from the current, but I have no particular problem with that if others agree.

STATUS_FLAG_PROTECTIONS_ENABLED - this could indicate flight mode. Or maybe we say that more directly?

As discussed, we'll map here and mavlink for most things. Under discussion because we might be more generic.

STATUS_FLAG_MOUNTED_IN_SLOT - Determines if battery is confirmed to be mounted in a slot (ID pin is pulled down). The slot ID is read in the BatteryInfoAux message.

Slots need more discussion and this seems very specific. Though perhaps it is reasonable to remove the concept of "specific" slot - because you might reasonably have something to tell you that a battery is properly physically connected. @dakejahl ?

STATUS_FLAG_FAULT_PCB_ERROR - The BMS PCB has an issue (comms, etc)

IMO sure. No more weird than the ones for incompatible firmware.

STATUS_FLAG_DISCHARGE_ALLOWED - discharge FET turned off STATUS_FLAG_CHARGE_ALLOWED - charge FET turned off STATUS_FLAG_HOTSWAP_ALLOWED

Interested in @dakejahl comments. I like that they are very specific interlocks.

hendjoshsr71 commented 2 years ago

uint8[<=16] serial_number # Serial number in ASCII characters, 0 terminated
Same thing for manufacturer date, we could encode it using 4 bytes like this `Day + Month×32 + (Year–1980)×512`
uint8[<=11] manufacture_date # Manufacture date (DD/MM/YYYY) in ASCII characters, 0 terminated

These are ostensibly extremely low rate messages..... so savings here "should" be minimal?

Isn't SMBus original spec. 16 Bytes? I know of a major manufacturer bringing up dronecan currently that is using the model name field for a serial number due to the original serial number only being uint32....... This field is being filled with a unique 16 Byte ID from the factory. They say these 16 bytes come from the BMS chip. Not having the full schematics I can't 100% verify the requirement myself.

On the date I'm good with extra computation and smaller size.

cmic0 commented 2 years ago

I don't think you need a message from the vehicle to tell the battery when to disable protections. The battery should be able to determine that on its own.

Just to add more color to this: the autopilot has more "situational awareness" of what the current state of the vehicle is and thus can better judge if disabling the current protection on the batteries is safe. The battery can only judge if needs to disable the protection only based on the current drawn, which in case of a malfunction of the system could not be the best thing to do. However, i do agree that this is an extra layer that aerial systems could eventually require so having this provisioned in the protocol but not mandatory is a good tradeoff :-).

So regarding the disabling of protections, I agree that it would be nice to make the arming interlock optional for batteries that don't implement this. To support the interlock in mainline PX4 we will need to implement this. @cmic0 may be able to help with this? I think we can make it parameter enabled?

yes, in PX4 the interlock mechanism is anyway fully transparent: we currently have a time window after arming for which the failure detector is checking for any failure coming from the battery, if a failure occurs (e.g battery failed to transition) then the battery would just throw an error and failure detector would catch it.

hendjoshsr71 commented 2 years ago

@dakejahl you may want to check the DroneCAN discord for some info. from @bugobliterator regarding tail byte optimization.

Reposted the summary here for convenience. My question to Sid: "For a new message such as this

uint8 index                       # Cell block index, index 0 corresponds to cells 0 - 23, index 1 cells 24 - 48, etc
float16[<=24] voltages            # [Volt]

If only the first six cells are populated are the rest truncated? (via something similar to mavlink's tail byte optimization)

I vaguely remember you saying no during your CANFD talk? Unlike in previous dronecan without CANFD."

From Sid: "better to not put resizeable arrays at the end, we want to move away from TAO flip the above message so index is at the bottom we automatically take care of turning off TAO for CANFD, but are yet to implement how to do it on Standard CAN its possible TAO messages will not be extensible, so new messages better to not have resizeable arrays at the end"

it should be like this:

float16[<=24] voltages            # [Volt]
uint8 index                       # Cell block index, index 0 corresponds to cells 0 - 23, index 1 cells 24 - 48, etc

"basically if an resizeable array is at the end, we skip sending length of the array, its inferred using the message length.

It becomes as issue, if say we add extension to the message, in that case we can't infer it using message length anymore

so, its possible that we won't be able to extend messages that have resizeable arrays at the end."

dronecan / DSDL

Better Battery Messages #7