Proposal: PJON Highlevel implementation for Linux

gioblu / PJON

PJON (Padded Jittering Operative Network) is an experimental, arduino-compatible, multi-master, multi-media network protocol.

Other

2.73k stars 239 forks source link

Proposal: PJON Highlevel implementation for Linux #388

Closed rainerschoe closed 2 years ago

rainerschoe commented 3 years ago

Hi, I am using PJON for a home automation project. As a gateway and frontend I use a Raspi or Linux system translating PJON to gRPC. However the Linux interface provided in this repo is not usable in standard C++ linux programming without considerable frustration, as it uses event-loop, is not thread safe and has some other limitations.

I implemented a wrapper around PJON, called __PJON_Highlevel__. As Linux systems do not have the performance and size constraints as AVR controllers, I can provide a quite convienient API. The following key-features:

Internally uses PJON class for all bus operations, so no duplicate code
Handles Rx and Tx queues nicely
Has the notation of "Connections"
Provides a high-level way to represent a PJON Address with a parser to and from string
Works totally asynchronous (no eventloop visible to the user)
Does all PJON state handling internally in a thread
Supports getting feedback for each packet (TX ok or failed).
Uses modern C++ features like std::future to handle async packet sending

A simple example using this interface looks like this:

#define PJON_INCLUDE_TS true 
#include "pjon_highlevel.hpp"
#include "strategies/ThroughSerial/ThroughSerial.h"
int main()
{

  // configure PJON strategy:
  ThroughSerial serialStrategy;
  int fd = serialOpen("/dev/ttyUSB0", 250000);
  serialStrategy.set_serial(fd);
  serialStrategy.set_baud_rate(250000);

  // Initialize PJON_HL:
  PjonHL<ThroughSerial> hlBus(0x11, serialStrategy);

  // create a connection:
  // It is possible to use convenient String representation of a PJON addr:
  // DeviceId@BusId:Port  only deviceId is mandatory
  auto connection = hlBus.createConnection(Address("42@0.0.0.0"));
  // connection can now be used to send/transmit packets from and to the target
  // address.

  // Send a packet with a timeout of 1000ms (send will not block):
  std::vector<uint8_t> txPayload;
  txPayload.push_back(0x00, 0x45, 0x78);
  std::future<bool> success = connection->send(std::move(txPayload), 1000);

  // Check if send succeeded (this will now wait until packet is sent or failed):
  if(success.get())
  {
    std::cout << "success :)\n";
  }
  else
  {
    std::cout << "failure :(\n";
  }

  // now receive data on this connection (wait up to 6s for data to arrive):
  Expect< std::vector<uint8_t> > received = connection->receive(6000);
  if(received.isValid())
  {
      // packet received
      std::vector<uint8_t> & data = received.unwrap();
      for(uint8_t byte : data)
      {
        std::cout << (uint16_t)byte << std::endl;
      }
  }
  else
  {
     // no packet received within 6s
  }
  return 0;
}

I am posting this here, to ask if you are interested to include this as a part of the PJON repository.

If you think this makes sense I could try to create a PR and we could work together to make this compatible with your overall design philosophy of PJON.

Currently I only tested this with ThroughSerial. however I designed the wrapper as a template using the strategy as a template argument, so in theory it should work for any strategy.

My motivation of contributing this upstream is as follows:

I think PJON is an awesome project
As I use some internas of PJON (most notably access the Packet Queue, to retrieve TX state) I am afraid that in future PJON versions PJON_Highlevel might get incompatible if I develop it as a separate project.
Other people might benefit from it.
Using PJON in larger c++ projects currently has some limitations (header-only, multiple definition issues) maybe this is a first step to go into a direction which allows PJON usage in larger projects with build systems etc.

gioblu commented 3 years ago

Ciao @rainerschoe thank you very much for your compliments and for your feedback.

However the Linux interface provided in this repo is not usable in standard C++ linux programming without considerable frustration, as it uses event-loop, is not thread safe and has some other limitations.

I have been stuck on a lower level for more than a decade. Few years ago, I started to study the inner workings of linux and work on the LINUX interface with the help of the community, although that is still not mature nor optimal. I am open to accept contributions to make it better.

Other people might benefit from it.

I agree with you, what you propose could be useful for linux users. Would be a pleasure to work with you on this, although, I would try to add as few LOC as required to effectively obtain the result. Feel free to open a PR 👍

Seems the code uses all modern C++ features and adheres to recent best practices.

Connection looks nice too. I know this is a stupid detail, although, without being able to read the source this is the only feedback I can give for now. I do see 42@0.0.0.0, why not 0.0.0.0.42 or 0.0.0.0/42 ? Considering how the address space specified, the device id is bound to and logically "contained" in a bus with a certain bus id (or I am just left and as always see things the other way round). The device id in the packet format is the first field just to let devices efficiently and quickly abort reception if that does not match theirs.

Thank you very much for your support :)

rainerschoe commented 3 years ago

Thanks for the feedback.

I will try to create a PR as a "Work in Progress" for further technical discussion, need to do some cleanup first though. Most pressing point will be the issue of PJON being a header only library, as _PJONHighlevel currently is written using small parts in cpp files which need to be compiled and linked.

Regarding address string representation: I don't really care what format to use. My motivation for the DeviceId@Bus:Port was motivated by two things:

Similar to email or first name / last name concept. Most people will use single bus topologies, so most important is DeviceId.
Handles optional parts as part of order: DeviceId is mandatory, rest is optional
Easy to parse optional parts: if string ends, just stop parsing and set rest to default values. (this is not really an argument, with a simple regex I could also parse your proposals in all situations :P )

I can try to write a parser for your 0.0.0.0/42 format, also accepting reduced and extended forms:

42 device id only
42:7657 device id + port
0.0.0.0/42 bus + device id
0.0.0.0/42:7657 full form

Regarding LOC: The project currently has 739 lines of code. with ~20% being comments. However with object oritented design principles LOC is not really a good criteria for code quality and maintianability, as you tend to write many support classes to achieve good code separation.

gioblu commented 3 years ago

Ciao @rainerschoe in the mid 2010s I chose to have a header-only library not to worry about different platforms where the library might be used and furthermore avoid to worry about build systems. Although I agree the state of the project now may require a more advanced setup. I agree with you about the connection's reduced and extended forms, very nice. My only concern about LOC and implementation complexity is maintenance's sustainability, although 739 additional lines looks viable.

gioblu commented 3 years ago

Ciao @rainerschoe forgot to say that PJON supports also the use of the MAC address to identify devices, for example it is possible to create a mesh network on bus id 0.0.01 setting all devices to device id 255 or PJON_NOT_ASSIGNED and use unique MAC addresses, if set as routers devices can route packets between each other and use the hop count to avoid network loops, that would be 0.0.0.1/1.2.3.4.5.6:7657 (when device id is not configured PJON automatically sets it to PJON_NOT_ASSIGNED). It may have sense to support also the mac address in the string representation. Another way to use the address-space could be to set local mode and have many devices using device id PJON_NOT_ASSIGNED and unique mac addresses, that would be something like 1.2.3.4.5.6:7657, when the bus id is not configured PJON automatically sets it to localhost or 0.0.0.0.

rainerschoe commented 3 years ago

Hi, have been working this weekend to clean up the PjonHL implementation to make it ready for a PR.

I also implemented the new Address parser as discussed. However I did not manage to include MAC support for now. I think this will be added as a feature later on, as I first need to read into the specification for this to get the general idea.

I need to do some testing first to make sure refactoring did not break anything, if all works well I might be able to create initial PR this week.

gioblu commented 3 years ago

Ciao @rainerschoe thank you very much for your support to the project, I think you did right to wait extending the address parser. As soon as it is ready I will test it and report back to you results.

Thank you again.

rainerschoe commented 3 years ago

Hi @gioblu,

I spent the last week finalizing, cleaning up, documenting, unit-testing and system-testing the PjonHL implementation. I published the result here: https://github.com/rainerschoe/PjonHL

The code size is now at 773 lines of code excluding comments, unit-tests and example. However I did not create a PR into PJON for now, as I wanted to discuss some things first with you:

PjonHL follows a different C++ coding/build style than PJON code, it needs to be compiled and linked. To do this in a nice way, it uses CMake. Currently PJON is pulled in as a dependency by CMake.
How is maintenance going to work for PjonHL? If we define it as part of PJON, I have a hard time maintaining PjonHL, as I need to go through PRs all the time, or need to get access to PJON repository. Also do you plan to maintain PjonHL too? This would then require very deep understanding of PjonHL.
I can only test PjonHL with ThroughSerial, as I only have a ThroughSerial RS485 physical test setup here. Even though PjonHL should work independently of the Strategy (as it just wraps PJON), there might still be some bugs/problems in Strategy initialization / EventLoop call frequency.
My understanding of the PJON protocol itself is not 100% firm, so I wanted you to review the PjonHL design to see if it matches with the PJON protocol philosophy and to make sure that I did not overlook some Protocol feature which PjonHL now makes impossible to use. (You already gave valuable input regarding Addresses and MACs).
My initial goal when proposing a PR was to keep the Interface/API I am using to implement PjonHL (Access tp PJON packet buffer) as synchronized as possible, to guarantee that PJON will not change breaking the APIs I am using. If PjonHL is an external project, we need to discuss the API/interfaces in more detail and make sure those remain quite stable over time or introduce some more stable ones (Mostly: Finding out if packet transmit succeeded or failed as well as register member function pointer as receiver/error function)

Depending on your opinion it might be a better idea to treat PjonHL as a separate project and keep in sync regarding APIs.

It would also be possible to add PjonHL as a submodule into PJON, if you think this makes sense. This way PjonHL would be more available to users and still somehow related to PJON, while I can maintain the project separately.

If you think it makes sense, I can sill create a PR as originally planned.

I would be very happy about any kind of feedback / review / tests on this.

Maybe some technical questions at this point:

https://github.com/rainerschoe/PjonHL/blob/master/Bus.inl in line 292: I need to check if a packet was transmitted. I do this by accessing packet buffer directly: m_pjon.packets[m_txQueue.front().m_pjonPacketBufferIndex].state == 0 and then check if I received a call to error_function to determine if success/failure. I have the feeling I am picking around in the guts of PJON here. Is there a way to make this more stable?
As PjonHL is object oriented, in theory you could initialize multiple instances (multiple buses) e.g. for routing. However I am having a hard time registering a member function in PJON to call on received/error, as PJON only sopports raw function pointers for those callbacks. My current workaround is using a global function which I register with PJON and then forward into my memberfunction from there. However this now limits PjonHL to be instanciated only once. See https://github.com/rainerschoe/PjonHL/blob/master/Bus.inl line 70 downwards the currently messy setup of the callback functions.

gioblu commented 3 years ago

Ciao @rainerschoe thank you very much for your valuable contribution to the PJON project. I am sure many users will love this. Compliments for the work done, very slick.

I think one of the motivations why PJON is handy and easy to use for many, is its implementation design. It is monolithic, it requires 0 dependencies in most cases and the repository contains both examples, implementation, documentation, specification along with all supported data links in 1.7MB. It was for sure a big effort to verify each PR specially about contributed data links I was not familiar with (such esp ESPNOW, LoRa ecc.) although looking at the code, I would do it again, and I see I have learned a lot in the process.

PJON is still experimental so breaking changes to the API or its inner workings may come in the future.

Considering the above It may be better to include PjonHL in the PJON repository if you are ok with it.

About your questions:

At line 292 I agree it is not nice to check for a 0 in the state of the packet, considering that is the default state of empty packets, update deletes the packet immediately after transmission if that occurred successfully (and so in the state you should find 0). To be able to use directly the packet buffer of PJON you should avoid the packet auto deletion https://github.com/gioblu/PJON/blob/master/src/PJON.h#L778 and then delete manually within your wrapper packets that were successfully transmitted, or the ones with PJON_ACK in the state. This approach should give you complete control over packets in the buffer. I read in the comments you are concerned about the potential side-effects provoked by the async ack, although consider that the feature has been removed in version 13.0
To do so check out this example: https://github.com/gioblu/PJON/blob/master/examples/ARDUINO/Local/SoftwareBitBang/ClassMemberCallback/Receiver/Receiver.ino

Thank you for rising the warning issue, I will fix it as soon as possible.

gioblu commented 3 years ago

Ciao @rainerschoe I have looked into PjonHL and its new release, compliments for the work done. I will run tests this afternoon and let you know the results.

If you wish to keep it in a separate repo I can add it to the compliant tools list where other wrappers and compatible projects are linked. It would likely get more traffic and usage being part of the PJON's repo but I understand it may be more handy for you to develop and maintain it avoiding PRs, so, choose what better fits your needs. Thank you again for your support to the project.

rainerschoe commented 3 years ago

Sure, I spent some time this weekend doing some more tests (found out that default bus config did not correspond to native PJON bus config, which was causing some problems receiving packets). Now PjonHL supports explicit and compatible default configuration.

It seems to work now except some timing issues. I needed to increase TS timing for ACKs to be seen in time on AVR side.

This is most likely caused by my "sleep()" call in the PjonHL event-loop, which is necessary to avoid busy waiting (which would cause high CPU usage / power consumption on Linux systems). See Bus.inl line 314 downwards.

My current ideas to solve this are:

Modifying the linux strategies/adapters to provide "linux poll" style async semi-blocking methods.
Make the sleep time smaller and user-configurable so that users have a better awareness of this and can trade of CPU usage vs latency.

I think (2) is very easily done, whereas (1) would need deep understanding and re-design of existing PJON code.

I think I will first stabilize PjonHL for some more time and await your feedback/suggestions before we can merge it into PJON repo, then we can talk about if we completely place PjonHL inside PJON source code, or if git submodule is a better approach.

gioblu commented 3 years ago

Ciao @rainerschoe I see you sleep for 10 millis at line 331 of Bus.inl, that is added to the packet computation needed by the receiver and the latency, the sum overshoots the maximum acceptable timeout of 45 millis.

You can edit the maximum time out defining TS_RESPONSE_TIME_OUT before including PJON.h
You could also make the sleep time configurable, that for sure will help with fine-tuning

rainerschoe commented 3 years ago

Is there some generic (strategy agnostig) way to determine if calling receive() is needed to be called soon?

I.e. If the strategy received a single byte in this iteration we could assume it is very likely to receive another byte within the next milliseconds and then we could skip the sleep in this case.

For example in ThroughSerial I see the internal statemachine which holds this information. Unfortunately this information is not propagated to the receive() function or the PJON class.

This way we could reduce latency and improve bandwidth considerably, by only sleeping when we are not in the middle of receiving something.

one minimal way of implementing this would be:

Add a function bool isMoreWorkLikely() function to each strategy (which if possible returns true whenever it is likely that an immediate eventloop call would do significant work. Default implementation can be just return false if we do not want to solve the problem for all strategies for now)
expose this via PJON class
PjonHL then uses this after each call to receive() and only sleeps if function returns false

gioblu commented 3 years ago

Ciao @rainerschoe I think has sense to sleep in case receive returns PJON_BUSY or PJON_NAK and I agree it is better to avoid sleeping while something is being received. Maybe could be more simple to let receive return PJON_RECEIVING and let strategies which support non-blocking reception return it? In that case you could also avoid sleeping if receive returns PJON_RECEIVING. It may have sense also because benchmarks and tests based on the result of receive would be more accurately reporting what happens (for now PJON_FAIL is returned until the whole packet is received).

Whatever is the choice for its implementation, this would be a breaking change, I am nearly ready to release 13.1 before going for a new major version. I think we could agree upon which is the best way to implement it, so you can go further, and I can push this in master as soon as 13.1 is released.

rainerschoe commented 3 years ago

Sounds good to me. I am in no rush for this.

Although experience has taught me, that mixing too many things into one integer return value (good / bad + some semantics) might bite back if you over-do it, as you lose overview over which function might return back which kind of values.

In essence this becomes a type-safety issue as it is not strictly defined by the C++ type system what the semantic return values of any given function might be ( int has many possible values :stuck_out_tongue: ). So you have to provide and maintain very good documentation on return values to keep this stable.

gioblu commented 3 years ago

@rainerschoe the return value of receive just contains the result of the attempted reception, (channel busy, fail, crc error, success) adding the result "reception in process but not completed" does not look like mixing things. Although, I do see as a con that existing sources relying on the receive return value may need to be updated, requiring effort to users.

@fredilarsen @jdaandersj what do you think?

jdaandersj commented 3 years ago

Perhaps I am oversimplifying or do not understand some part of the new bits:

Surely lack of return indicates not yet completed? What is the purpose to include it in the return?

gioblu commented 3 years ago

@jdaandersj the point is, @rainerschoe uses ThroughSerial, which is the only strategy able to make non-blocking reception, on linux. ThroughSerial has an internal buffer in which composes the frame piece by piece as it comes in and ultimately returns it to PJON when it is complete. @rainerschoe needs to reduce cpu usage giving some time to the rest of the system in between each receive call. Although doing so he overshoots the acknowledgement response timeout. To mitigate this he proposed to edit PJON to be able to know if ThroughSerial is receiving a frame, and if so fully use the cpu to get the frame and respond as quickly as possible.

We were discussing if to:

Add a bool receiving to the strategy prototype which says if the strategy is receiving but the frame is still not complete
Add to receive return value which indicates that a frame is being received but not completed

rainerschoe commented 3 years ago

Hi @jdaandersj I think I need to give some background in Linux application programming vs. Embedded low-level programming (sorry in advance for being a little bit verbose here):

For AVR plattform or embedded low-cost controllers without threading support in general, the Event-Loop is a very good idea and well implemented in PJON: Call receive() + update() often and PJON will try to do as less work as possible in there to give user more CPU time for other tasks (e.g. blinking LEDs :stuck_out_tongue: ). For ThroughSerial strategy for example: each call to receive() will handle the reception of 1 or 0 bytes from UART. Only once a whole packet is received - after many calls to receive() - this is communicted via a non zero return. Like I said this works really well and is a very good design for embedded low-cost systems.

However for Linux or systems with threading support you generally do not want those event-loops, as they are "busy-waiting" loops. I.e. one thread looping and doing the same thing over and over again (calling receive()) while most of those calls have no effect. Threading works by telling the operating system: "now I do not need to do any processing for a while" (also called yield), then some other thread can take over or the CPU can reduce its frequency, conserving power. This way you can have very powerful CPUs but still not constantly over-heat / consume horrible amount of power.

Telling the operating system that you are willing to yield is done via different highlevel mechanisms in C/C++, to name a few:

Calling sleep()
Calling poll() on a file
Waiting on a condition-variable or mutex

Now in the case of PJON what we actually would want to do in this scenario is using poll() or some other kind of condition-variable allowing us to yield until exactly the linux device driver tells us that data was received over the strategy (e.g. UART or GPIO or Ethernet ...)

However this would require quite a lot of work and most probably be out of scope of PJON which is mostly intended for low cost IoT plattform.

So The idea was to only do minimal changes to PJON while still allowing an acceptable performance / powersaving / scheduling fairness tradeoff:

Call sleep() in PjonHL eventLoop, but try to find out when it makes sense to not call sleep() in order to not sleep when traffic is running. Currently this information is not exposed by PJON (e.g. ThroughSerial receives one byte at a time for each receive() call and receive still returns 0). The change would mean to define some way of communicating to the event-loop, that currently some receiving is done. Either via Return-Code or via separate function.

To clarify what I mean with danger of mixing things in return code: Currently the return value of receive_frame() or receive() is used for multiple semantics already:

communicate result (good or bad)
communicate length of data
communicate reason of error

Now @gioblu proposed to add some semantics to this:

communicate state (currently receiving something or not)

This will work fine, if we all remember it and document is really well :smile:

However I have seen similar designs blow up if projects get bigger, as with time more and more semantics/menaings get mapped to those return values. At some point nobody knows anymore which function can return which values.

Typically in modern programming to solve those problems you do not map semantics to integers, but use well defined data types for each function like enum class or class and try to separate concerns as much as possible. I.e. use struct { bool success; ErrorReason reason; } with error reason beeing another type. (have a look at Expect class in PjonHL for example.) Alternatively if it is not 100% necessary, do not put thing in the function return value, but add separate memberfunction to query the information.

That being said I think adding this to the return value is still OK in PJON, as PJON is still a quite small codebase and adding more datatypes and memberfunctions might already be over-engineering.

jdaandersj commented 3 years ago

Ok thanks for the thorough overview.

I think my conclusion would be that @gioblu should decide which path he prefers.

Add it to the return value to stick with the common approach of PJON currently. Or, decide that the intention is to expand PJON in long term, in which case probably makes sense to start to refactor all code in direction described by @rainerschoe. Could start just with high level but in my view there is no point doing that unless the long term goal is to transition all PJON to that approach.

My gut instinct was to have this separate from the return value but I realise now that the return values in PJON already contain a mixture of information so no point deviating just for one thing unless it is the new preferred method.

I hope what I have said makes sense and isn't complete nonsense :D

gioblu commented 3 years ago

Ciao @jdaandersj @rainerschoe thank you very much for your feedback. I took some time to evaluate your suggestions, I think @rainerschoe is right and a bool in the strategy is cleaner and will likely spare time to distracted users.

Now in the case of PJON what we actually would want to do in this scenario is using poll() or some other kind of condition-variable allowing us to yield until exactly the linux device driver tells us that data was received over the strategy However this would require quite a lot of work and most probably be out of scope of PJON which is mostly intended for low cost IoT plattform.

I agree, and also PJON in its more constrained use cases runs with 50 bytes of RAM, I was forced to keep the implementation simple to fit even in ATtiny45.

ThroughSerial linux support is still early and suboptimal, I am for sure open to work on it to make it more mature if that can be done keeping a single implementation covering all targets.