NETMF / netmf-interpreter

.NET Micro Framework Interpreter
http://netmf.github.io/netmf-interpreter/

CMSIS RTOS RTX, is it required? #297

Closed. techcap closed this issue 8 years ago.

techcap commented 9 years ago

I'm porting the ENC28J60 driver to MF 4.4. There are many changes in the lwIP part. I don't know what CMSIS RTOS RTX is, and I'm curious whether it is really required. lwIP 1.4.1 depends on an arch library that lives in CMSIS_RTOS.

smaillet-ms commented 9 years ago

Yes, a CMSIS-RTOS API compliant runtime is required for the port of LWIP and NETMF. (At present we only support the CMSIS-RTX implementation of that API. Supporting another implementation would require some small code changes, because the API uses macros designed for C and we need to wrap things in C++.) The short story is that LWIP and NETMF were both designed to run either on top of an OS or standalone without one. However, in standalone mode each of them wants to be in control of everything. To resolve this conflict, the previous port of LWIP for NETMF modified the LWIP code base. Unfortunately, that had side effects that caused issues, such as the timers for DHCP renewals running down about 10x faster than they should. That led to frequent renewals, and while the device was renewing its address, applications saw what looked like network-down failures. For v4.4 we chose to leverage the freely available RTOS so that LWIP runs unmodified in its multithreaded mode while the NETMF CLR runs on another thread. This resolved the core networking issues.
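As a rough host-side illustration of that arrangement (the stack running in its own thread, the CLR on another), the sketch below uses std::thread and a small mailbox as stand-ins for CMSIS-RTOS threads and message queues. Every name here (Mailbox, run_stack_thread, demo, the string messages) is hypothetical. In LWIP's real threaded mode the stack is started with tcpip_init(), and other threads hand work to the tcpip thread through its mailbox rather than touching stack state directly; this sketch only models that shape.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mailbox standing in for an RTOS message queue.
class Mailbox {
public:
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }
    // Blocks until a message is available, then removes and returns it.
    std::string fetch() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::string msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
};

// Stand-in for the network stack thread: it alone owns the stack state and
// handles requests one at a time until it receives "quit".
std::vector<std::string> run_stack_thread(Mailbox& mbox) {
    std::vector<std::string> handled;
    for (;;) {
        std::string msg = mbox.fetch();
        if (msg == "quit") break;
        handled.push_back(msg);
    }
    return handled;
}

// The calling thread plays the CLR: it posts requests instead of calling
// into the stack, and the stack thread processes them in order.
std::vector<std::string> demo() {
    Mailbox mbox;
    std::vector<std::string> handled;
    std::thread stack([&] { handled = run_stack_thread(mbox); });
    mbox.post("dhcp_start");
    mbox.post("udp_send");
    mbox.post("quit");
    stack.join();
    return handled;
}
```

Because only one thread ever runs stack code, the stack needs no changes to coexist with the CLR; the cost is that every request becomes a message.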

techcap commented 9 years ago

Thank you for your detailed response. I tried CMSIS-RTOS, but execution keeps ending up in FAULT_SubHandler with this call stack: __svcMutexWait(0, 0xFFFFFFFF), osMutexWait, ScopedLock scopedLock( SysArchLock ), sys_periodic_timeout, sys_timeouts_init, lwip_init, tcpip_init, LWIP_SOCKETS_Driver::Initialize(), HAL_SOCK_Initialize, Network_Initialize, CPU_InitializeCommunication, HAL_Initialize, BootEntry.

For the build, I copied the CMSIS_RTX_Config project from MCBSTM32F400 and used the enc28j60 driver.

Without lwIP, it works very well. Could you give me any advice? Thanks.

smaillet-ms commented 9 years ago

HardFaults on SVC calls are usually due to interrupts being disabled via PRIMASK. To make SVC calls, PRIMASK must at least allow the SVC handler to run. When using the OS, it is important to include the OS GlobalLock library, as that uses an actual OS lock instead of disabling interrupts with PRIMASK. To help illustrate the differences between using the RTOS and not using it, the MCBSTM32F400 solution includes two projects, TinyCLR and TinyCLR_NONET; you can diff them to see the differences.
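A minimal host-side sketch of what "an actual OS lock instead of disabling interrupts" can look like, assuming the lock must tolerate nested use on the same thread (which is why a recursive mutex is used). OsGlobalLock, g_global_mutex, DEMO_GLOBAL_LOCK and other_thread_can_lock are hypothetical names, not the NETMF library's API; std::recursive_mutex stands in for an RTOS mutex.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Stand-in for the OS mutex behind the global lock. Holding it never masks
// interrupts, so SVC-based OS calls still work while it is held.
std::recursive_mutex g_global_mutex;

// Hypothetical RAII global lock backed by the OS mutex.
class OsGlobalLock {
public:
    OsGlobalLock() { Acquire(); }
    ~OsGlobalLock() { if (held_) Release(); }
    OsGlobalLock(const OsGlobalLock&) = delete;
    OsGlobalLock& operator=(const OsGlobalLock&) = delete;

    void Acquire() { g_global_mutex.lock(); held_ = true; }
    void Release() { held_ = false; g_global_mutex.unlock(); }
    bool IsHeld() const { return held_; }

private:
    bool held_ = false;
};

// Hypothetical macro mirroring the GLOBAL_LOCK(name) usage style.
#define DEMO_GLOBAL_LOCK(name) OsGlobalLock name

// Returns true if a different thread can take the global lock right now.
bool other_thread_can_lock() {
    bool acquired = false;
    std::thread t([&] {
        if (g_global_mutex.try_lock()) {
            acquired = true;
            g_global_mutex.unlock();
        }
    });
    t.join();
    return acquired;
}
```

Other threads are held off while the lock is owned, which is the mutual exclusion the driver wants, but nothing stops the SVC handler or other interrupts from running.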

techcap commented 9 years ago

As you suggested, I compared all the items in TinyCLR and TinyCLR_NONET, and I found that I had not included the GlobalLock project. After including it, execution gets past __svcMutexWait. Thank you very much!

But I still have a similar problem. Execution keeps ending up in HardFault_Handler with this call stack: __svcSemaphoreWait, osSemaphoreWait, sys_mutex_lock(&mem_mutex), mem_malloc, pbuf_alloc, enc28j60_lwip_xmit, etharp_send_ip, etharp_output, ip_output_if_opt, ip_output_if, udp_sendto_if, "dhcp_discover", dhcp_start, UpdateAdapterConfiguration, TcpipInitDone, tcpip_thread.

It looks like the problem occurs when mem_malloc is called. But here is the interesting part: before dhcp_start calls dhcp_discover (dhcp.c line 691), where the HardFault happens, another mem_malloc call in dhcp_start (dhcp.c line 656) runs without any problem.

So I suspect the problem is related to the first part of enc28j60_lwip_xmit, which takes GLOBAL_LOCK(encIrq) before calling the pbuf_alloc that faults. Could that be the cause? Could you give me more advice? Thank you very much.

techcap commented 9 years ago

pbuf_alloc and pbuf_free in LWIP are not thread safe. So before calling pbuf_alloc, I released the IRQ lock through irq.Release(). Now Ethernet works in the sample socket program, but sometimes a socket exception occurs. I think there is some problem related to OS synchronization. I don't know CMSIS-RTOS well. In the enc28j60 driver, I think some code is needed to disable interrupts when the socket interrupt occurs. Could anyone help?

smaillet-ms commented 9 years ago

Generally speaking, you should be able to get by without disabling interrupts. At this point it is hard to say what could be going on without further details. If you have forked the GitHub repository, can you push your changes to a branch on your private fork and then post a link to it here so others can review it in more detail?

techcap commented 9 years ago

I uploaded my whole project. I'm not sure whether you can see my repository; I'm a newbie on GitHub.

The original code used GLOBAL_LOCK. But while the global lock is held, memory functions such as pbuf_alloc and pbuf_free don't work, because they depend on interrupt handling.

Now I'm testing the version that uses a mutex, which is what I uploaded. I couldn't find a way to disable interrupts without interfering with the memory functions. A mutex doesn't disable interrupts, but it does prevent overlapping execution. I'm not sure whether this is the right implementation, though. I'm testing this firmware with managed code now. Thank you very, very much for your help.

smaillet-ms commented 9 years ago

The LWIP buffer APIs already have a lock internally, so you shouldn't need any locking around them.

techcap commented 9 years ago

Yes, for the LWIP buffers. But the problem is that the enc28j60 driver's interrupt handler uses GLOBAL_LOCK to prevent overlap, which disables interrupts, and that affects LWIP's memory operations.

When I tested without GLOBAL_LOCK, it hung after a short time in an SPI operation. I think this is because the Ethernet interrupt handler overlapped with other driver code. With the mutex, there have been no hangs yet. Thanks^^
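The mutex approach described above can be sketched as follows: each multi-step SPI command sequence runs entirely under one mutex, so the interrupt-side work and the transmit path cannot interleave their steps. This is a host-side model with hypothetical names (g_spi_mutex, g_bus_log, spi_transaction, sequences_intact). It assumes the driver's interrupt work is deferred to a thread, since as far as we know CMSIS-RTOS mutex waits cannot be called from an interrupt service routine.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of the controller's SPI bus: a command sequence
// (e.g. select bank, address a register, transfer data) must not be
// interleaved with another sequence.
std::mutex g_spi_mutex;
std::vector<int> g_bus_log;  // which caller drove each bus step

// Runs one whole command sequence of `steps` bus operations atomically.
void spi_transaction(int caller_id, int steps) {
    std::lock_guard<std::mutex> lock(g_spi_mutex);
    for (int i = 0; i < steps; ++i) {
        g_bus_log.push_back(caller_id);
        std::this_thread::yield();  // invite interleaving; the mutex prevents it
    }
}

// Returns true if every consecutive group of `steps` log entries
// came from a single caller, i.e. no sequence was split.
bool sequences_intact(int steps) {
    if (g_bus_log.size() % steps != 0) return false;
    for (std::size_t i = 0; i < g_bus_log.size(); i += steps)
        for (int j = 1; j < steps; ++j)
            if (g_bus_log[i + j] != g_bus_log[i]) return false;
    return true;
}
```

Without the lock_guard, the yield makes split sequences likely, which corresponds to the "waiting on SPI ready bits forever" hang described above.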

smaillet-ms commented 9 years ago

If you are using the OS GlobalLock library, it DOES NOT disable interrupts. The general goal is to reduce or eliminate the need to ever disable interrupts. Using nested and prioritized interrupts in the interrupt controller, along with interlocked APIs, should eliminate most cases. Why do you think you need to disable interrupts?
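One common way to use interlocked operations instead of masking interrupts is sketched below, under the assumption that the interrupt handler only records what happened and a driver thread does the actual SPI work. The event names and functions (EVT_*, isr_raise, thread_take_events) are hypothetical. The handler sets bits with an atomic OR, and the thread claims all pending bits with an atomic exchange; on Armv7-M parts such as Cortex-M4 these 32-bit atomics typically compile to LDREX/STREX sequences rather than interrupt masking.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical event bits an Ethernet interrupt might report to a driver thread.
enum : std::uint32_t {
    EVT_RX_PENDING  = 1u << 0,
    EVT_TX_DONE     = 1u << 1,
    EVT_LINK_CHANGE = 1u << 2,
};

std::atomic<std::uint32_t> g_pending_events{0};

// Called from the (simulated) interrupt handler: record events with an
// atomic OR. No lock is taken and interrupts are never masked.
void isr_raise(std::uint32_t events) {
    g_pending_events.fetch_or(events, std::memory_order_release);
}

// Called from the driver thread: atomically take every pending event at once.
// An event raised concurrently lands either in this batch or the next one.
std::uint32_t thread_take_events() {
    return g_pending_events.exchange(0, std::memory_order_acquire);
}
```

The thread would then perform the SPI work for each claimed bit, under a thread-level lock if other threads also use the bus.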

techcap commented 9 years ago

If I understand the code correctly, GLOBAL_LOCK executes __disable_irq() in SmartPtr_IRQ. In the enc28j60 driver, pbuf_alloc faults after GLOBAL_LOCK is called. If I remove GLOBAL_LOCK, the interrupt and xmit functions execute overlapped, which disturbs the SPI command order and finally hangs while waiting for the SPI ready bits. As you said, using prioritized interrupts would be best, but unfortunately I don't know much about CMSIS-RTOS. As a workaround, I used the irq.Release() and irq.Acquire() methods to re-enable interrupts temporarily. I sent a pull request; please check it. After this modification, the program has run well for two days. Thanks.
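The Release()/Acquire() workaround can be modelled on the host as below. IrqLock and g_primask are hypothetical stand-ins: a bool plays the role of the Cortex-M PRIMASK bit, and the lock is only loosely patterned on an interrupt-masking global lock such as the one described above. It saves the mask state it found, masks, and on Release() restores the saved state rather than unconditionally unmasking.

```cpp
#include <cassert>

// Host-side stand-in for the Cortex-M PRIMASK bit (true = interrupts masked).
bool g_primask = false;

// Hypothetical RAII interrupt-masking lock with Release()/Acquire().
class IrqLock {
public:
    IrqLock() : saved_(g_primask) { g_primask = true; held_ = true; }
    ~IrqLock() { if (held_) Release(); }
    IrqLock(const IrqLock&) = delete;
    IrqLock& operator=(const IrqLock&) = delete;

    // Restore the mask state seen on entry (not necessarily "unmasked").
    void Release() { g_primask = saved_; held_ = false; }
    // Save the current state again and re-mask.
    void Acquire() { saved_ = g_primask; g_primask = true; held_ = true; }

private:
    bool saved_;
    bool held_ = false;
};
```

One caveat the model makes visible: Release() only returns to the state seen at construction. If an outer interrupt-masking lock is also held, interrupts stay masked after the inner Release(), and an SVC-based allocation made at that point would still fault.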

smaillet-ms commented 9 years ago

If you are using the proper OS global lock library, GLOBAL_LOCK does NOT disable interrupts; it uses an actual OS mutex. However, your description sounds like you are using the global lock library that does disable interrupts, as that would explain the symptoms you describe. If interrupts are turned off by setting PRIMASK when you call the allocation code, you will get a fault. This is because the heap allocation code calls the OS to lock access to the heap, and it does so with an SVC instruction, which results in a hard fault if PRIMASK prevents the SVC handler from running. If you must disable interrupts, do so for the smallest possible region of code.
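The fault mechanism and the "smallest possible region" advice can be captured in a small host-side model. Everything here is hypothetical and simulated: g_primask stands in for the PRIMASK bit, svc_mutex_wait for the OS call the heap makes through SVC, and demo_mem_malloc for the allocator. The model encodes the Armv7-M rule that an SVC executed while PRIMASK is set cannot be taken and escalates to HardFault; it also matches the observation in this thread that the allocation at dhcp.c line 656 succeeded while the one reached inside GLOBAL_LOCK faulted.

```cpp
#include <cassert>

// Host-side model only: true = interrupts masked, as after __disable_irq().
bool g_primask = false;

enum class SvcResult { Ok, HardFault };

// Stand-in for the OS lock call the heap makes via an SVC instruction.
// With PRIMASK set the SVC handler cannot run, so the core escalates to HardFault.
SvcResult svc_mutex_wait() {
    return g_primask ? SvcResult::HardFault : SvcResult::Ok;
}

// Stand-in for mem_malloc: it locks the heap through the OS before allocating.
SvcResult demo_mem_malloc() {
    return svc_mutex_wait();
}

// Faulting pattern: allocating while interrupts are masked.
SvcResult xmit_allocating_inside_lock() {
    g_primask = true;                  // __disable_irq()
    SvcResult r = demo_mem_malloc();   // SVC with PRIMASK set
    g_primask = false;                 // simplified: real code restores the saved state
    return r;
}

// Preferred pattern: mask only around the register access that needs it,
// then allocate with interrupts enabled.
SvcResult xmit_with_small_critical_section() {
    g_primask = true;                  // __disable_irq()
    /* ... touch only the controller state that needs protection ... */
    g_primask = false;                 // simplified: real code restores the saved state
    return demo_mem_malloc();
}
```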

techcap commented 9 years ago

I used the GLOBAL_LOCK from the MCBSTM32F400 project. Isn't that the proper OS global lock?

josesimoes commented 9 years ago

@techcap: good job on porting this driver to 4.4! Did you build/test it with GCC compiler or MDK?

techcap commented 9 years ago

I used MDK. For now, it still has several bugs.

lt72 commented 8 years ago

Hi techcap, are you still experiencing issues on this front, or did you manage to resolve them?

techcap commented 8 years ago

It sometimes stops, but I don't know whether that is related to this problem. I'll close this issue and repost if it turns out to be related. Thanks.