schrodingersket closed this pull request 3 years ago.
My understanding was that we would get up to 40 devices directly connected to the gateway. If we are moving down to 30, I don't see the benefit. Also, there is no feature inside the gateway to force devices to join via routers before the gateway itself.
@pipiche38 This pull request adds support for a different build configuration, which supports 30 devices connected directly to the gateway and up to 200 devices total allowed in the network when ZigBee routers are included in the network. It does this by utilizing the host to manage EEPROM, and it is adjacent to the default configuration which utilizes internal EEPROM; this build configuration is not meant to replace the current configuration, as that would require users to re-commission their network anyway to use the new memory management capability. This pull request simply adds support for a different build type to support networks which are comprised primarily of routers.
I can try to push the number of devices that can join directly to the coordinator in this configuration to 40, but the RAM limitations are currently a little tight for that. Pushing the total node count to 200 already took some finagling.
Just a comment: you have updated `PDM_RECORD_LENGTH` from 256 to 244. Any reason for that? @fairecasoimeme and I worked specifically to reach 256.
Sure - that was to keep the entire message length (with all fields: record ID, block ID, etc.) under `MAX_BUFFER_SIZE`, which sets the maximum serial message length. I may be misunderstanding, but I'd expect that if the message length is too high, bytes will be lost when saving/loading messages.
Could you also clarify the change on the PDMsave "Removed PDMSaveMessage confirmation loop in order to bring per". What do you mean ? What did you change ? Is that impacting the logic ?
The logic isn't impacted at all here - the current behavior is to wait for a message of type `0x8200` for each and every record and block that is to be saved. This caused a pretty significant performance hit and significantly increased the amount of data sent over the serial line for very little gain. This pull request removes the requirement that a `0x8200` message be received before sending the next block to the host to be saved.
As for `PDM_RECORD_LENGTH`, my understanding was that it was increased along with `MAX_BUFFER_SIZE`, so we also optimized the Save/Load part by reducing the number of iterations in the loops.
For the `0x8200`, that means the host has to remove it. However, I would suggest keeping it for when the last block is received and the record is completed. This will be consistent with the LoadPDM part.
> As for `PDM_RECORD_LENGTH`, my understanding was that it was increased along with `MAX_BUFFER_SIZE`, so we also optimized the Save/Load part by reducing the number of iterations in the loops.
Absolutely! `PDM_RECORD_LENGTH` should definitely be as high as possible for exactly the reason you mention - in fact, the original source code from NXP that I used to write `pdm_host.c` used `128` as the record length, so we're already increasing from that value. I think that `246` is the limit though (updating this PR for `246` instead of `244`; reasoning is below), as more than that causes the message body to exceed `MAX_BUFFER_SIZE`, which is `256`. Here's how I got the number:
| Bytes | Description | Size [bytes] |
|---|---|---|
| 0-1 | Record ID | 2 |
| 2-3 | Total Size | 2 |
| 4-5 | Total Number of Blocks | 2 |
| 6-7 | Current Block | 2 |
| 8-9 | Block Size | 2 |
| 10-255 | Record Data | 0-246 |
I think I dropped a 2 in my initial math - so `PDM_RECORD_LENGTH` should actually be `246` instead of the current `244`. Let me know if you agree with this logic. If not, please help me understand what I'm missing (:
> For the `0x8200`, that means the host has to remove it. However, I would suggest keeping it for when the last block is received and the record is completed. This will be consistent with the LoadPDM part.
That's fine by me. The main performance hit was in waiting for acknowledgement on each individual block, so an acknowledgement on the full record set shouldn't impact performance very much.
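A minimal sketch of the agreed-upon change, with hypothetical helper names (the real firmware- and host-side code differs):

```python
def send_record(port, blocks, wait_for_0x8200):
    """Send every block of a record back-to-back, then wait for a single
    0x8200 acknowledgement after the last block.

    Previously the sender called wait_for_0x8200() inside the loop, once
    per block, which is what caused the performance hit described above.
    """
    for block in blocks:
        port.write(block)
    wait_for_0x8200()  # one record-level confirmation, as suggested above
```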
On the `0x8200`, I think there is a potential issue. I need to look deeper, but to me the firmware needs to request each block; otherwise, there is a risk of flooding the HW.
Can you elaborate on the issue you're considering?
EDIT: Also, my Win10 virtual machine is currently bricked, so I'm going to be a little slow adding updates to this pull request until that's resolved.
EDIT 2: No longer bricked. I can actually be useful now!
I'm now remembering why `244` instead of `246` - the latter doesn't work when loading the PDM back in from a warm start.
On the `0x8200`, I might have mixed up Load and Save. So indeed, I agree with you: the `0x8200` from the host to the ZiGate is not needed, and removing it will indeed save some time. On the other side, there is no flow control at the HW/SW level to manage the serial line, and if we remove the `0x8200`, I suspect that after sending the first `0x0200` save recordId/blockId, we will get the next one, and so on. If the host is not processing at the right speed, there is a risk of saturation and loss of data. Lastly, if we have a CRC error on the host side while receiving a block, we won't have any way to tell the ZiGate that this block must be re-transmitted.
A short question, @schrodingersket: is your host based on a Python lib, or are you on native C/C++ code? What speed are you using for the serial line - 115200?
cc: @fairecasoimeme
> On the other side, there is no flow control at the HW/SW level to manage the serial line, and if we remove the `0x8200`, I suspect that after sending the first `0x0200` save recordId/blockId, we will get the next one, and so on. If the host is not processing at the right speed, there is a risk of saturation and loss of data. Lastly, if we have a CRC error on the host side while receiving a block, we won't have any way to tell the ZiGate that this block must be re-transmitted.
You are spot on with this assessment; if the host isn't able to keep up with the incoming PDM save messages, they are lost. That said, once the coordinator is up and running, the only tables that really change are the device tables, which are saved as full records fairly frequently - every ~10-20 s for normal networks, in my experience. Thus, a missed record will be re-transmitted upon the next save, so missing blocks seem to be more of a theoretical problem than a practical one.
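That recovery behavior can be sketched as follows. The XOR checksum and the helper names are assumptions for illustration (check the actual ZiGate framing); the point is only that a corrupt block is dropped and recovered by the next periodic full-record save rather than by a retransmit request:

```python
def xor_checksum(payload):
    # Simple XOR over all payload bytes; assumed for this sketch.
    c = 0
    for b in payload:
        c ^= b
    return c

def handle_block(payload, received_checksum, store):
    """Drop a corrupt block silently. Without per-block 0x8200 acks there
    is no way to request a retransmit, so recovery relies on the next
    periodic full-record save (every ~10-20 s on a normal network)."""
    if xor_checksum(payload) != received_checksum:
        return False
    store.append(payload)
    return True
```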
> A short question, @schrodingersket: is your host based on a Python lib, or are you on native C/C++ code? What speed are you using for the serial line - 115200?
I'm using `pyserial` at 115200 baud to keep compatibility with ZiGate defaults; speed hasn't been an issue at all so far. I've seen a few bytes get lost here and there, but it doesn't seem to impact overall operation for the reasons listed above.
Hi @schrodingersket, and thanks. Could you break this into smaller chunks so that @fairecasoimeme and I can review and modify if/when needed? Start by creating a PR for adding the new device only, as it is the most critical (and simplest) part.
Moreover: 1/ Did you create the new files yourself, or do they come from NXP? You put yourself as author but kept the copyright from NXP. 2/ Is the new device sending a new ZiGate API message? If not, and it is only attribute report/read/write, why do we need new device files? The firmware already deals with this regardless of the device type. Thx
> Hi @schrodingersket, and thanks. Could you break this into smaller chunks so that @fairecasoimeme and I can review and modify if/when needed? Start by creating a PR for adding the new device only, as it is the most critical (and simplest) part.
I assume you are referring to the Develco VOC cluster support here - is that correct? If so, I opened a pull request for that a while back, but branched off of that for this work. Each commit in this pull request should consist of a small, functional chunk of work that's easier to review.
> Moreover: 1/ Did you create the new files yourself, or do they come from NXP? You put yourself as author but kept the copyright from NXP.
It's a bit of both. I reached out to NXP for support on building this out initially; I started with the file from the JN-AN-1223 appnote, which contained support for Host PDM management for the old APIs, but modified that quite heavily to support the new PDM APIs utilized in JN-AN-1216. I wanted to update the author so that there wouldn't be confusion as to who to reach out to for questions that arise in that implementation.
> 2/ Is the new device sending a new ZiGate API message? If not, and it is only attribute report/read/write, why do we need new device files? The firmware already deals with this regardless of the device type. Thx
I don't think I added any new devices to this pull request. Which one(s) are you referring to? I did update the clusters on the coordinator, but again, that's due to this pull request being a branch off of another feature branch which added support for a couple of manufacturer-specific air quality clusters.
@badzz @pipiche38 - it looks like device-specific clusters are dropped in favor of propagating the `0x8002` message; since that's the case, will this code change be accepted if this pull request is closed and re-opened with just the large network/Host PDM modifications?
Closing in favor of implementing in https://github.com/nimbus9inc/ZiGate.
Tested with 200 physical devices. The configuration also allows commissioning of up to 30 end devices with no routers present.
I did have to disable registration of endpoints other than `ZLO` in order to keep the RAM usage within the limits of the 5168 chip; I left that change out of this pull request in anticipation of it being resolved by PR #315 and Issue #310.