OpenEtherCATsociety / SOES

Simple Open Source EtherCAT Slave

FoE with LAN9252 #65

Closed CoolNamesAllTaken closed 4 years ago

CoolNamesAllTaken commented 5 years ago

Hi! First of all, thank you @nakarlsson and associates for your fantastic work on SOES. I really appreciate that such industrial software can be made open-source and easily understandable.

I'm working on a bootstrap implementation using the Microchip LAN9252 and SOES running on an STM32F4. I have replaced the SPI interface functions in the lan9252 driver that was intended for Linux, and have also rewritten the bootstrap code from the rtl slave example to work with the STM32's internal flash memory. I am using the EEPROM configuration file provided for the LAN9252 by the linux lan9252 example, since the stock EEPROM file on the LAN9252-SPI evaluation board does not have the correct PDI settings etc, and has FoE disabled.

Currently, I am able to trigger the LAN9252 and SOES stack to enter into bootloader mode by running the firm_update example on SOEM. After the firm_update program sends its first FoE request, the slave prints the following:

soes: state 2
soes: state 1
soes: init_to_boot_hook (part of my bootstrap code)
soes: state 3
soes: boot_started: 14347, boot_watch_dog: 214347
soes: erased all firmware sectors (this is part of my bootstrap code)
soes: firmware flash sectors erased (this is part of my bootstrap code)
soes: FOE_OP_WRQ
soes: FOE_init
soes: FOE_write
soes: FOE_send_ack

It seems that the slave enters bootloader mode and tries to send an ack to SOEM, but on the SOEM side, the ecx_FPRD on line 1038 of ethercatmain.c returns a mailbox error (MBXEp->Detail = 1044). This causes the FoE transfer to fail without sending any of the actual firmware data that I am trying to upload.

I've been stuck on this bug for a while, and am wondering what probable causes I should look into. Any pointers or advice would be greatly appreciated!

nakarlsson commented 5 years ago

Post a wireshark of the transfer

CoolNamesAllTaken commented 5 years ago

Wireshark capture attached as a .zip.

The firmware update was done by jumping through the firm_update example with a few breakpoints that I set manually. When I run the firm_update example at full speed, the master and slave can become misaligned, in that the slave doesn't change states properly when it is commanded to, possibly due to some commands being lost/overwritten. My guess is that this is due to my low SPI clock speed (I am running the SPI connection between STM32 and LAN9252 with jumper wires, so SPI clock is set to ~256kHz). By running the firm_update example with breakpoints, I am able to get the slave to change state correctly.

firm_update.zip

nakarlsson commented 5 years ago

I don't often see errors at the mailbox layer. Have you run the CTT, e.g. to make sure your SOES port functions as expected?

CoolNamesAllTaken commented 5 years ago

Sorry, I am not familiar with that terminology. What is the CTT and how do I run it?

nakarlsson commented 5 years ago

The EtherCAT Conformance Test Tool. It is purchased from Beckhoff and requires a Vendor ID. It is the best way to test whether a slave works properly. Since you have your own port, it is good to validate that port.

I took a look at the Wireshark capture and the MBX response looks invalid. Start by validating the FoE WRQ ack: make sure the response seen in Wireshark is actually what SOES sends, using a byte-for-byte dump from debug code. Since the error occurs right away, it should be easy.
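
A byte-for-byte dump like the one suggested above could be produced with a small formatting helper. This is only a sketch: `mbx_hexdump` is a hypothetical name, not an SOES function, and where it gets called (for example on the buffer that FOE_send_ack() fills, just before it is handed to the mailbox write) is an assumption about the port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debug helper (not part of SOES): formats `len` bytes of `buf`
 * as space-separated lowercase hex into `out`. Stops early if `out` is too
 * small. Returns the number of characters written, excluding the NUL. */
size_t mbx_hexdump(const uint8_t *buf, size_t len, char *out, size_t out_size)
{
    size_t pos = 0;

    assert(buf != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; i < len; i++)
    {
        /* Worst case per byte: two hex digits, one space, one NUL. */
        if (out_size - pos < 4)
        {
            break;
        }
        pos += (size_t)snprintf(out + pos, out_size - pos, "%02x%s",
                                (unsigned)buf[i], (i + 1 < len) ? " " : "");
    }
    return pos;
}
```

Printing the resulting string next to the existing debug output makes it possible to compare the slave's outgoing bytes with the mailbox data in the Wireshark frame one byte at a time.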

CoolNamesAllTaken commented 5 years ago

Okay, I'm working my way through the code now. On the start of the transaction, in esc_foe.c, the OP_WRQ case calls FOE_Write(), which calls FOE_send_ack(). This is supposed to send back a mailbox message with FoE OpMode: ACK. However, the message that is received back has FoE OpMode: WRQ.

Is this the same issue that you are seeing? I only see two messages that pass the "ecat_mailbox" filter in wireshark, and I'm not sure if the second one is the response (which should contain OpMode: ACK) or an echo of the original message from SOEM (which contained OpMode: WRQ).

nakarlsson commented 5 years ago

I only see the request; the response isn't interpreted as an FoE frame. So you need to find out where the data gets corrupted. Note: the WRQ is seen in two Wireshark frames, one going out with Wkc == 0 and one coming back with Wkc == 1. Both are the same WRQ; Wkc == 1 indicates that it was written OK to the slave.

Filter address offset 0x1000 for Master->Slave, 0x1080 for Slave->Master.

CoolNamesAllTaken commented 5 years ago

Thanks for the explanation! I am going to dig a little deeper to see if the mailbox write functions are working correctly. It sounds like SOES is attempting to send a mailbox reply, but as you said it is not being sent.

Some of the abbreviations in the code are difficult for me to understand. Do you know what the following fields in the sm_cfg struct do? I assume this represents a SyncManager configuration; knowing what the variables represent will make it easier for me to debug.

typedef struct sm_cfg
{
   uint16_t cfg_sma;
   uint16_t cfg_sml;
   uint16_t cfg_sme;
   uint8_t cfg_smc;
   uint8_t cfg_smact;
}sm_cfg_t;

CoolNamesAllTaken commented 5 years ago

I think I've figured out the meanings:

cfg_sma = SyncManager Address
cfg_sml = SyncManager Length
cfg_sme = SyncManager End Index
cfg_smc = SyncManager Control
cfg_smact = SyncManager Active
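
To make those field meanings concrete, here is a commented sketch of the struct with a helper that fills it in. `sm_cfg_make` is hypothetical, and treating `cfg_sme` as the address of the last byte (start + length - 1) is an assumption rather than something confirmed in this thread. The example values are the mailbox SyncManagers from the slaveinfo output in this thread (SM0 at 0x1000 and SM1 at 0x1080, 128 bytes each), assuming the low byte of the printed flags is the control byte.

```c
#include <assert.h>
#include <stdint.h>

typedef struct sm_cfg
{
   uint16_t cfg_sma;   /* SyncManager start address in ESC memory */
   uint16_t cfg_sml;   /* SyncManager length in bytes */
   uint16_t cfg_sme;   /* SyncManager end (assumed: address of last byte) */
   uint8_t cfg_smc;    /* SyncManager control byte */
   uint8_t cfg_smact;  /* SyncManager activate byte */
} sm_cfg_t;

/* Hypothetical helper: builds a config from start address and length,
 * deriving the end address so the three values cannot disagree. */
sm_cfg_t sm_cfg_make(uint16_t start, uint16_t length,
                     uint8_t control, uint8_t activate)
{
   sm_cfg_t cfg;

   assert(length > 0);
   cfg.cfg_sma = start;
   cfg.cfg_sml = length;
   cfg.cfg_sme = (uint16_t)(start + length - 1u);
   cfg.cfg_smc = control;
   cfg.cfg_smact = activate;
   return cfg;
}
```

With these inputs, `sm_cfg_make(0x1000, 128, 0x26, 0x01)` yields an end address of 0x107F, which is directly below the start of the second mailbox at 0x1080.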

CoolNamesAllTaken commented 5 years ago

From what I can tell, the attempt to send the FoE ack manages to fill the outgoing mailbox with FOE_send_ack() and then ESC_writembx(), but the mailbox contents never gets sent. Do you know what part of the code is responsible for triggering a mailbox send?

One issue I've had in debugging this problem is that I have not found a way to write unit tests on the slave controller that will allow it to send mailbox messages on its own. Is it true that there is no way to configure the slave to send mailbox messages without remote configuration via an EtherCAT master application?

Thanks for your time!

nakarlsson commented 5 years ago

Yes, slaves never send frames; only the master does. When the mailbox is posted, a flag is set which tells SOEM to read the mailbox response. All of this can be seen in Wireshark. You need to figure out why the data read by SOEM is corrupt.

Does CoE work? Can you enter OP?

CoolNamesAllTaken commented 5 years ago

It looks like I can enter the SAFE_OP state but not OP. Output from simple_test

Simple test
Starting simple test
ec_init on \Device\NPF_{A9CC327B-B242-4767-ACCF-04415B4C465B} succeeded.
1 slaves found and configured.
Slaves mapped, state to SAFE_OP.
segments : 1 : 0 0 0 0
Request operational state for all slaves
Calculated workcounter 0
Not all slaves reached operational state.
Slave 1 State=0x12 StatusCode=0x001d : Invalid output configuration

Request init state for all slaves
End simple test, close socket
End program

It looks like this error (ALERR_INVALIDOUTPUTSM) is caused when SOES has trouble activating SyncManager 2 (output SyncManager). With some breakpoints I found that the error is being thrown in the function ESC_startinput() in esc.c, because one of the checks in ESC_checkSM23 was failing. It appears that in ESC_checkSM23, SM->length = 0, while ESCvar.SM2_sml = 2. Is a length of 2 a realistic length for SyncManager 2? Do you know why the LAN9252 might have a length of 0 configured for SyncManager 2? I've attached a wireshark capture of the exchange. simple_test.zip
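
The kind of comparison that fails here can be illustrated with a simplified stand-in. This is not the SOES implementation of ESC_checkSM23: the type and function names below are hypothetical, and only the address and length comparison is shown. The point is that the slave compares what it reads back from the ESC's SyncManager registers against what it expects from its own mapping, so a length of 0 on the read-back side fails the check even if the master wrote the right value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the SyncManager registers as read back from the ESC. */
typedef struct
{
    uint16_t start;   /* physical start address */
    uint16_t length;  /* length in bytes */
} sm_regs_t;

typedef enum { SM_OK = 0, SM_BAD_ADDRESS, SM_BAD_LENGTH } sm_check_t;

/* Compares the values read from the ESC with the values the slave expects. */
sm_check_t sm_check(const sm_regs_t *read_back,
                    uint16_t expected_start, uint16_t expected_length)
{
    assert(read_back != NULL);

    if (read_back->start != expected_start)
    {
        return SM_BAD_ADDRESS;
    }
    if (read_back->length != expected_length)
    {
        return SM_BAD_LENGTH;
    }
    return SM_OK;
}
```

With the values from this post (read-back length 0, expected length 2 at address 0x1100), the sketch reports a length mismatch. Whether the 0 comes from the master's configuration or from the slave's own register read is exactly what the capture and a driver-level test can distinguish.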

I haven't paid much attention to CoE yet, as I was trying to use this program as a lightweight EtherCAT bootloader, so I focused primarily on FoE and bootstrap first. Is FoE functionality dependent on CoE?

Something is definitely fishy here. I've written unit tests for all of my low-level drivers, so I'm fairly confident that I'm writing / reading from the LAN9252 correctly over SPI. Are there any other places I should look? It sounds like the SOES code itself is pretty well tested.

CoolNamesAllTaken commented 5 years ago

Wireshark and terminal output of slaveinfo slaveinfo.zip

SOEM (Simple Open EtherCAT Master)
Slaveinfo
Starting slaveinfo
ec_init on \Device\NPF_{A9CC327B-B242-4767-ACCF-04415B4C465B} succeeded.
Time:1565219757.304 MBX slave:1 error:0013 Unknown
1 slaves found and configured.
Calculated workcounter 0
Not all slaves reached safe operational state.
Slave 1 State=12 StatusCode=  1d : Invalid output configuration

Slave:1
 Name:evb9252_dig
 Output size: 0bits
 Input size: 0bits
 State: 18
 Delay: 0[ns]
 Has DC: 1
 DCParentport:0
 Activeports:1.0.0.0
 Configured address: 1001
 Man: 00001337 ID: 000004d2 Rev: 00000000
 SM0 A:1000 L: 128 F:00010026 Type:1
 SM1 A:1080 L: 128 F:00010022 Type:2
 SM2 A:1100 L:   0 F:00000024 Type:3
 SM3 A:1180 L:   0 F:00000020 Type:4
 FMMUfunc 0:1 1:2 2:0 3:0
 MBX length wr: 128 rd: 128 MBX protocols : 0c
 CoE details: 13 FoE details: 01 EoE details: 00 SoE details: 00
 Ebus current: 0[mA]
 only LRD/LWR:0
End slaveinfo, close socket
End program

SOES log

Starting main_run
soes: Slave stack init started
soes: FOE_init
soes: APP_safeoutput
Slave Initialized
=====Begin Testing esc_hw.c                            =====
=====End Testing esc_hw.c                              =====
=====Begin testing LAN9252 Registers                   =====
=====End testing LAN9252 Registers                     =====
=====Begin testing LAN9252 Mailboxes                   =====
=====End testing LAN9252 Mailboxes                     =====
soes: state 1
soes: state 1
soes: CoE string set value not supported.
soes: 1008:00 = 00
soes: CoE string set value not supported.
soes: 1009:00 = 00
soes: CoE string set value not supported.
soes: 100a:00 = 00
soes: 6000:01 = 00
soes: 7000:01 = 00
soes: 7000:02 = 00
soes: 8000:01 = 00
soes: state 2
soes: 7000:01 @ 0
soes: 7000:02 @ 8
soes: 6000:01 @ 0
soes: ESC_ALerror 0x1d
soes: state 12

nakarlsson commented 5 years ago

It seems that CoE has the same problem. It also seems that you can read CoE and FoE frames, so the trouble must be when writing data. Since you seem to be able to write ESC registers (ALStatus did update to BOOT, for example), look at writes to the PDRAM, starting at 0x1000.

Try to write a known value to an unused PDRAM address, outside SM0-3. Use the very same functions used by the mailbox write. Then use SOEM to read that address using FPRD. This would bypass SOES and verify the SPI/LAN9252 path in some way.
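
One way to set up such a test is to share a pattern generator and checker between the slave and the master. The sketch below is only that shared part. The function names are hypothetical, and the actual write on the slave (through the port's own PDRAM write function) and the FPRD read on the SOEM side are left to the caller, as is the choice of an unused address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills `buf` with a position-dependent pattern, so that shifted,
 * repeated or byte-swapped data shows up as a mismatch. */
void pdram_fill_pattern(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
    {
        buf[i] = (uint8_t)(0xA5u ^ (uint8_t)i);
    }
}

/* Checks `buf` against the same pattern. Returns -1 if every byte
 * matches, otherwise the index of the first mismatching byte. */
long pdram_check_pattern(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] != (uint8_t)(0xA5u ^ (uint8_t)i))
        {
            return (long)i;
        }
    }
    return -1;
}
```

On the slave, the filled buffer would be written with the same low-level function the mailbox write uses. On the master, the bytes returned by the FPRD would be passed to `pdram_check_pattern`. The index of the first mismatch gives a hint about where in the SPI transfer the data goes wrong.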

CoolNamesAllTaken commented 5 years ago

I haven't had a chance to work on this today, but will run this test and let you know the results tomorrow. Thanks for the suggestion!

CoolNamesAllTaken commented 5 years ago

Wahoo! I found a big bug with the way I implemented the low-level lan9252_read and lan9252_write drivers. I fixed that, and slaveinfo now returns the following:

Slaveinfo
Starting slaveinfo
ec_init on \Device\NPF_{A9CC327B-B242-4767-ACCF-04415B4C465B} succeeded.
1 slaves found and configured.
Calculated workcounter 3

Slave:1
 Name:evb9252_dig
 Output size: 16bits
 Input size: 8bits
 State: 4
 Delay: 0[ns]
 Has DC: 1
 DCParentport:0
 Activeports:1.0.0.0
 Configured address: 1001
 Man: 00001337 ID: 000004d2 Rev: 00000000
 SM0 A:1000 L: 128 F:00010026 Type:1
 SM1 A:1080 L: 128 F:00010022 Type:2
 SM2 A:1100 L:   2 F:00010024 Type:3
 SM3 A:1180 L:   1 F:00010020 Type:4
 FMMU0 Ls:00000000 Ll:   2 Lsb:0 Leb:7 Ps:1100 Psb:0 Ty:02 Act:01
 FMMU1 Ls:00000002 Ll:   1 Lsb:0 Leb:7 Ps:1180 Psb:0 Ty:01 Act:01
 FMMUfunc 0:1 1:2 2:0 3:0
 MBX length wr: 128 rd: 128 MBX protocols : 0c
 CoE details: 13 FoE details: 01 EoE details: 00 SoE details: 00
 Ebus current: 0[mA]
 only LRD/LWR:0
End slaveinfo, close socket
End program

Thank you so much for your help @nakarlsson !

The slave still has some trouble keeping up with the firm_update program (loses sync with some of the state change requests in the early stages of the program, and throws an Invalid State Change error). Could this be a function of my SPI connection between the STM32 and the LAN9252 being too slow? The upload seems to work when I step through the state change parts of the firm_update program manually.

nakarlsson commented 5 years ago

Might be, check the wireshark. 256kHz is very slow.

CoolNamesAllTaken commented 5 years ago

Ok, I will assume that that might be the case for now, and will debug further after I've created a physical board where I can run the SPI speeds faster without noise issues.

I was able to get a firmware file fully flashed to the STM32 yesterday over FoE, which was fantastic!

CoolNamesAllTaken commented 5 years ago

Currently, I am utilizing soes as a bootloader which can load other applications onto the STM32 host MCU. This soes bootloader is started when the full system is reset via the LAN9252, and should nominally jump into the main application. However, in the case of a firmware update, I would like the bootloader to not jump into the main application until it has updated the firmware. Do you know of a standard technique for setting a jump/don't jump flag on the LAN9252 that is persistent across a software reset? I have thought of a few techniques, but I am wondering if there is a standard way.

Some of my ideas:

  1. Find a NASR (Not Affected by Software Reset) register in the LAN9252, and write a flag to it using SOEM before resetting the MCU and LAN9252. The soes bootloader will read this value on startup to decide whether to jump or update. I have not yet found a register that suits this purpose.
  2. Have the soes bootloader wait for a fixed delay (e.g. 2 seconds) on startup. If a firmware update is not triggered during this time, it should jump to the main application. This technique would work, but seems sketchy.
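
The second idea can be reduced to a small decision function that the bootloader polls from its main loop. This is a sketch under assumptions: the names are hypothetical, the millisecond tick source is whatever the MCU provides, and "update requested" stands for however the port detects that the master has asked for the BOOT state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    BOOT_WAIT,           /* keep polling, window still open */
    BOOT_JUMP_TO_APP,    /* window expired without an update request */
    BOOT_STAY_IN_LOADER  /* update requested, do not jump */
} boot_decision_t;

typedef struct
{
    uint32_t start_ms;   /* tick value when the bootloader started */
    uint32_t window_ms;  /* how long to wait for an update request */
} boot_window_t;

/* Decides what the bootloader should do at time `now_ms`. The unsigned
 * subtraction keeps the elapsed time correct across tick wraparound. */
boot_decision_t boot_decide(const boot_window_t *w, uint32_t now_ms,
                            bool update_requested)
{
    assert(w != NULL);

    if (update_requested)
    {
        return BOOT_STAY_IN_LOADER;
    }
    if ((uint32_t)(now_ms - w->start_ms) >= w->window_ms)
    {
        return BOOT_JUMP_TO_APP;
    }
    return BOOT_WAIT;
}
```

Keeping the decision separate from the jump itself makes it easy to test on a PC. The update request is checked first, so a request that arrives in the same poll in which the window expires still keeps the device in the bootloader.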

I would love to hear your thoughts as an experienced developer--I'm sure that there's probably a standard method here that I'm missing!

nakarlsson commented 5 years ago

The EtherCAT Semiconductor Device Profile describes one way, see ETG5003_2_V1i0i0_S_D_FwUpdate.pdf.

I'm not aware of any LAN9252 or ESC NASR.

nakarlsson commented 5 years ago

Post a wireshark of the transfer and I’ll try to check what is going on.

CoolNamesAllTaken commented 5 years ago

I've attached a wireshark of a failed firmware transfer that was caused by the state change issue that I mentioned.

Thanks for the link to the ETG spec! It appears that there isn't a standardized way for indicating a boot jump before reset. I think that I'll try implementing a time-delay jump method.

failed_firm_update.zip

I'm out of the office next week but will get back to ironing out the final bugs as soon as I'm back. Thanks again for your assistance!

nakarlsson commented 5 years ago

In the failed Wireshark capture your slave never writes any response to SOEM. Does your console output indicate that it gets the request and writes a response?

CoolNamesAllTaken commented 4 years ago

Hello! My sincerest apologies for my very late reply--I forgot to respond to your last comment when I got back from vacation, and finally got around to working on the software again after building the first hardware prototype. I was able to reproduce the invalid state change issue with a SPI speed of 15 Mbps. The SPI self-tests I programmed at 15 Mbps have passed, so I don't think that data is being corrupted at this baud rate.

The slave does not respond in the failed wireshark due to some sort of delay issue; if I run the firmware update program slowly, hand-stepping between break points, the firmware update proceeds without an issue. Does the firmware update program have extremely short timeouts or something else that might be creating this timing issue?

nakarlsson commented 4 years ago

No, I use it for firmware update on a target that flashes every incoming buffer.

nakarlsson commented 4 years ago

It seems this is completed? Re-open and we'll take a look again if needed.