TheThingsIndustries / generic-node-se

Generic Node Sensor Edition
https://www.genericnode.com
Other
108 stars 31 forks source link

external flash operations fail after multiple initialization #173

Closed elsalahy closed 3 years ago

elsalahy commented 3 years ago

Summary:

The external flash operations fail when MX25R16_Init is invoked more than one time.

Steps to Reproduce:

  1. Call MX25R16_Init two times or more
  2. Attempt any flash operation just as read/write/erase

What do you see now?

The flash is functional when MX25R16_Init is invoked once. The flash is not functional when MX25R16_Init is invoked more than once.

What do you want to see instead?

Able to initialize the external flash more than once and have correct operation/functionality.

How do you propose to implement this?

By debugging what happens when the external flash is initialized twice and preventing the issue from occurring.

Environment:

Baremetal, OS

What can you do yourself and what do you need help with?

All

elsalahy commented 3 years ago

@marnixcro can you undertake this issue, this is a low priority issues, but a good one

mcserved commented 3 years ago

I found the cause of the issue, see these gpg commands:

After first init

(gdb) p mxic
$1 = {Priv = 0x20001088 <Spi.1>, AppGrp = {_HardwareInit = 0x0, _Write = 0x8004dbf <MxPP>, _Read = 0x8004455 <MxREAD>, _Erase = 0x8004ae9 <MxBE>, _Lock = 0x0,
    _Unlock = 0x0, _IsLocked = 0x0}, Id = "\302(\025\000\025", ChipSz = 2097152, PageSz = 256, BlockSz = 65536, N_Blocks = 32, ChipSupMode = 497, ChipSpclFlag = 4,
  RdDummy = 0x20000020 <SpiFlashParamsTable+32>, SPICmdList = 0x20000030 <SpiFlashParamsTable+48>, QPICmdList = 0x20000040 <SpiFlashParamsTable+64>,
  OPICmdList = 0x20000050 <SpiFlashParamsTable+80>, Pwd = "\377\377\377\377\377\377\377\377", tW = 30000, tDP = 10, tBP = 100, tPP = 10000, tSE = 240000, tBE32 = 3000000,
  tBE = 3500000, tCE = 240000000, tWREAW = 0, CurFreq = 0, WriteBuffStart = 0 '\000'}

After second

(gdb) p mxic
$2 = {Priv = 0x20001088 <Spi.1>, AppGrp = {_HardwareInit = 0x0, _Write = 0x0, _Read = 0x0, _Erase = 0x0, _Lock = 0x0, _Unlock = 0x0, _IsLocked = 0x0},
  Id = "\302(\025\000\025", ChipSz = 2097152, PageSz = 256, BlockSz = 65536, N_Blocks = 32, ChipSupMode = 497, ChipSpclFlag = 4,
  RdDummy = 0x20000020 <SpiFlashParamsTable+32>, SPICmdList = 0x20000030 <SpiFlashParamsTable+48>, QPICmdList = 0x20000040 <SpiFlashParamsTable+64>,
  OPICmdList = 0x20000050 <SpiFlashParamsTable+80>, Pwd = "\377\377\377\377\377\377\377\377", tW = 30000, tDP = 10, tBP = 100, tPP = 10000, tSE = 240000, tBE32 = 3000000,
  tBE = 3500000, tCE = 240000000, tWREAW = 0, CurFreq = 0, WriteBuffStart = 0 '\000'}

All things in the AppGrp are set to zero in the second one. These are function pointers, and will cause an error when executed as they will execute code at function 0.

This line from MX25R16_Init causes them to be set to 0: https://github.com/TheThingsIndustries/generic-node-se/blob/97442fdf3c0a3506eab73380def829bf5e937c7e/Software/lib/MX25R1635/MX25R16.c#L38

The reason they are not changed is due to a static flag in MxChangeMode (which is set to 1 at the end). As you might have guessed, this flag causes the function to return before the function pointers are set (which set memset to zero): https://github.com/TheThingsIndustries/generic-node-se/blob/97442fdf3c0a3506eab73380def829bf5e937c7e/Software/lib/MX25R1635/nor_ops.c#L450-L454 We could either remove this flag or we could remove the memset. @elsalahy what would you prefer the solution to be? Personally I feel that both the flag and the memset do not add anything of value, but the memset would be the easiest to remove without significant changes to the firmware. We could also just have a static flag in the init to prevent reinitialisation.

elsalahy commented 3 years ago

@marnixcro great finding. I'm leaning towards removing the FirstFlag Can you provide me with the value of Spi->CurMode in the first init and the second init to confirm the root cause?

elsalahy commented 3 years ago

This If condition should fail in the second try even if FirstFlag is set to 1. But it seems (Spi->CurMode == SetMode) is always 1 which is a bigger problem.

mcserved commented 3 years ago

@marnixcro great finding. I'm leaning towards removing the FirstFlag Can you provide me with the value of Spi->CurMode in the first init and the second init to confirm the root cause?

(gdb) p Spi->CurMode
$8 = 1
(gdb) p SetMode
$9 = 1

They stay the same during the second init.

mcserved commented 3 years ago

This If condition should fail in the second try even if FirstFlag is set to 1. But it seems (Spi->CurMode == SetMode) is always 1 which is a bigger problem.

It makes sense as we set as follows (one is set as function param) https://github.com/TheThingsIndustries/generic-node-se/blob/97442fdf3c0a3506eab73380def829bf5e937c7e/Software/lib/MX25R1635/nor_ops.c#L155 https://github.com/TheThingsIndustries/generic-node-se/blob/97442fdf3c0a3506eab73380def829bf5e937c7e/Software/lib/MX25R1635/MX25R16.c#L60

elsalahy commented 3 years ago

As we discussed, the best way is to edit MX25R16_Init to make it not use MxSoftwareInit or MxChangeMode