[Q] Random pauses or hesitation when printing.

V1EngineeringInc commented 1 year ago

While printing once or twice a print, sometimes more, the machine will stop right in the middle of printing for a second, maybe even two. No errors or anything, it leaves an ugly blob, and it just continues on. It does not happen the same place on the file or on the machine. If I had to say anything, it might happen more often during circle moves, or curves. This does not happen while using the same screens for my CNC machines.

I have 6 printers and 2 CNC's running TFT35 V3 E3 screens. I have noticed this issue for probably as long as I have used the screens, more than two years I would guess. Some screens are brand new and some are very old. I update marlin and the TFT several times a year. Some of my gcode is old and ran perfectly fine on older 1286 screens, and some of my files are new.

So I think this is either some sort of configuration issue, slicer setting, or a screen firmware bug. Any ideas?

Config, config.zip

V1EngineeringInc commented 1 year ago

Oh and I almost forgot, your code messes up homing and leveling. The printers will start before temp is reached somehow. normally you start a print and when the bed and extruder are at temp homing and leveling start, yours starts as soon as I hit go. That is not good.

rondlh commented 1 year ago

Oh and I almost forgot, your code messes up homing and leveling. The printers will start before temp is reached somehow. normally you start a print and when the bed and extruder are at temp homing and leveling start, yours starts as soon as I hit go. That is not good.

Right, I also noticed that printing started before the bed was at the right temperature. Luckily my bed warms up fast so it was not a big issue.

kisslorand commented 1 year ago

The printers will start before temp is reached

Right, I also noticed that printing started before the bed was at the right temperature

Working on it...

kisslorand commented 1 year ago

v1 was "proof of concept". In the meantime there was a v2 and now there's v3 version. v3 has the fully implemented "hiccup avoidance" and it comes with a hefty RAM reduction (+1600 bytes) and without FW size change (actually there's a 6 bytes increase :) ). It has many changes under the hood, if you want to take v3 for a ride it is available on Mega.

Please let me know how it goes for you.

Note: There's a bug at the end of a print, but print should be fine. I am working on fixing it.

rondlh commented 1 year ago

I tested v3.01 (R=100mm, cylinder test at 145mm/s without using arcs). All worked as expected, no hiccups, no issues at the beginning or end. Great job!

kisslorand commented 1 year ago

There's a nasty bug at the end of a print and some other during print. I am working on it.

The fix is ready, v3.01 is up on Mega.

kisslorand commented 1 year ago

Made some further work, synced to master, did some bugfixes and even more stability/speed enhancements. As a result v3.1 is up on Mega.

Note: The builds also include PR #2701, PR #2793, PR #2798, PR #2799, PR #2805 and a rework of #2788 (which is a partial fix only and introduces a new bug).

rondlh commented 1 year ago

I've been testing V3.1 (BTT TFT35 V3.0) for 3 weeks now, with various parts, and did not encounter any issues, and even better, no hesitations or pauses. I think this is a great improvement. Well done!

V1EngineeringInc commented 1 year ago

Sorry for the delay, I am back from MRRF and flashing all the printers now!

V1EngineeringInc commented 1 year ago

While I was there, I got to talk with the CEO and head of the partner company. Some notes were taken we will see if they look into some of the hardware and firmware issues.

V1EngineeringInc commented 1 year ago

I have this in three printers and have not seen any pauses yet.

rondlh commented 1 year ago

So what's next? Update: I made a base implementation of support for Marlin's ADVANCED_OK, which keeps Marlin's command buffers full and prevents buffer underruns, and thus should prevent pauses and hesitations. https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2819

carlosjln commented 1 year ago

@slwise I tried with BAUDRATE 57600 and it seems to have worked 🤔 I noticed less zits when I dropped from 250000 to 115200, so seemingly 57600 is working. I'm still running my delta at 75% speed controlled from the TFT screen as the extruder can't keep up. So I'll adjust my speeds on the slicer as well.

kisslorand commented 1 year ago

@carlosjln Did you test the builds mentioned here?

carlosjln commented 1 year ago

@kisslorand no, I tried it with my current build which the latest on the build directory. Tomorrow I'll try this test build, I've downloaded the files already.

kisslorand commented 1 year ago

So what's next?

https://github.com/kisslorand/BTT-TFT-FW

V1EngineeringInc commented 1 year ago

Did you make a pull request here for the fixes that are working for us?

kisslorand commented 1 year ago

No, there's no PR for the fix/enhancement.

rondlh commented 1 year ago

So what's next?

https://github.com/kisslorand/BTT-TFT-FW

OK, how about the source code? Either way, I believe using ADVANCED_OK, or at least using the 4 available Marlin command buffers is the way to go. My implementation is quite simple and basic (https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2819), but it's working very well. No more pauses or hesitation. I hope someone with more insight into the Touchscreen FW can work it out more/better.

rondlh commented 1 year ago

@kisslorand Your input on https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2819 would be very appreciated.

kisslorand commented 1 year ago

OK, how about the source code?

What about it?

@kisslorand Your input on #2819 would be very appreciated.

I am not sure on what aspect you'd want me to give my input. About ADVANCED_OK in general, as already discussed in this very topic, I already implemented ADVANCED_OK but for reasons I already exposed I didn't go that route but instead found a better, more general fix for the stuttering. ADVANCED_OK mitigates the outcome (at a certain degree) whilst my approach fixes the core of the issue.

V1EngineeringInc commented 1 year ago

whilst my approach fixes the core of the issue.

Why not do a PR request here, so it stays mainline? As much as I would like to use your branch, while supporting thousands of users, it is not a good idea to stray from the manufacturer's branch until they stop supporting it.

kisslorand commented 1 year ago

Actually I am ahead of the master branch (40+ commits), not a single commit behind. There are PRs dating back from 2022 August, many PRs (even those with bugfixes) got the "Abandoned" label. As of it's current state of this repository it pretty much looks like abandoned by its maintainers.

@V1EngineeringInc You know where to reach me if you want to have a nice chat about it.

rondlh commented 1 year ago

OK, how about the source code? What about it?

Could you make the source code available? We all love open source here.

@kisslorand Your input on #2819 would be very appreciated.

I am not sure on what aspect you'd want me to give my input. About ADVANCED_OK in general, as already discussed in this very topic, I already implemented ADVANCED_OK but for reasons I already exposed I didn't go that route but instead found a better, more general fix for the stuttering. ADVANCED_OK mitigates the outcome (at a certain degree) whilst my approach fixes the core of the issue.

I see you mention latency for ADVANCED_OK, but the added latency is virtually nothing, and is not an issue if the motherboard has some commands buffered already. Latency becomes a problem in the current approach where the TFT waits for the response of the previous command, before sending the next command. That is of course a very time critical process, especially if you are managing a lot of other things in the background. In my simple implementation I don't need to make the use of ADVANCED_OK configurable. If ADVANCED_OK is enabled, then the reported info is used. If it's not enabled then the TFT keeps track of the buffers (count commands sends vs. replies received). Currently the TFT only counts to 1 command, I allow it to count to 4 (Marlin's default command buffer size, also works for Reprap). I'm not familiar with the issues of some problematic motherboards, what's the issue there?

rondlh commented 1 year ago

I had a look at the source code and noticed the following issue which might explain the pauses and hesitations. The serial interrupt routine that is called when the TFT receives serial data only checks if the last received character is a line feed (\n). If it is a line feed then a signal is given that a complete message is available and can be processed. If it is not, then nothing happens, until the interrupt routine is called again. With some bad luck it could be triggered again in the middle of a message where there is no line feed character, but actually there could already be several complete messages in the receive buffers, in this way receiving a temperature report can cause a delay in the processing of an "ok" message resulting in a printer hesitation. The solution would be to not only check the last character, but the whole received serial data for a line feed character.

digant73 commented 1 year ago

I had a look at the source code and noticed the following issue which might explain the pauses and hesitations. The serial interrupt routine that is called when the TFT receives serial data only checks if the last received character is a line feed (\n). If it is a line feed then a signal is given that a complete message is available and can be processed. If it is not, then nothing happens, until the interrupt routine is called again. With some bad luck it could be triggered again in the middle of a message where there is no line feed character, but actually there could already be several complete messages in the receive buffers, in this way receiving a temperature report can cause a delay in the processing of an "ok" message resulting in a printer hesitation. The solution would be to not only check the last character, but the whole received serial data for a line feed character.

In theory, the serial interrupt handler (in Serial.c) should receive the remaining part of the message (ending with \n) in a very short time (unless some mainboards delay the transmission for unclear reasons). For that reason, I didn't apply any change on that logic in my PR #2788. I also don't have any kind of hesitation so I cannot test any possible fix. However, as an hot&dirty test, you could simply change this line in SerialConnection.c

if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1) || !infoHost.rx_ok[portIndex])

with this

if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1))

and verify if it solves your issue. If so, that change could be the PR

rondlh commented 1 year ago

In theory, the serial interrupt handler (in Serial.c) should receive the remaining part of the message (ending with \n) in a very short time (unless some mainboards delay the transmission for unclear reasons). For that reason

Yes, I agree, probably within a few milliseconds. It's probably not the cause of the hesitations I see, which are more like 1 second long.

I didn't apply any change on that logic in my PR #2788. I also don't have any kind of hesitation so I cannot test any possible fix. However, as an hot&dirty test, you could simply change this line in SerialConnection.c

if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1) || !infoHost.rx_ok[portIndex]) with this if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1))

Right, that's a possible improvement, or even 2 improvements in one. Just check the incoming data when data is available. A possible reason for the hesitations could be that the serial interrupt routine sets infoHost.rx_ok[portIndex] to true, just before Serial_Get sets it to false. The result is that the newly received data would not be processed until more serial data is received (temp report, or response to M220/M221). This situation certainly could cause a delay of a second or so, and is quite likely the cause of the issues we see. A semaphore would be needed to change infoHost.rx_ok[portIndex] in different locations without potentially causing conflicts, but like you show, the variable use can be completely avoided. Just check if serial data is available and check if the message is complete. I think this is an excellent improvement which I would recommend to implement asap. It's probably best to get rid of infoHost.rx_ok completely.

My hesitation issues were already solved by implementing ADVANCED_OK (https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2819), which keeps the motherboard command buffer full. Even if ADVANCED_OK is not enabled, this change uses 4 command buffers (Marlin's default) instead of the TFT default of 1 command buffer. With ADVANCED_OK disabled the TFT keeps track of the command and response counts. It's only a few lines of code and does wonders for me.

digant73 commented 1 year ago

A possible reason for the hesitations could be that the serial interrupt routine sets infoHost.rx_ok[portIndex] to true, just before Serial_Get sets it to false. The result is that the newly received data would not be processed until more serial data is received (temp report, or response to M220/M221). This situation certainly could cause a delay of a second or so, and is quite likely the cause of the issues we see.

It is also possible

My hesitation issues were already solved by implementing ADVANCED_OK (#2819), which keeps the motherboard command buffer full. Even if ADVANCED_OK is not enabled, this change uses 4 command buffers (Marlin's default) instead of the TFT default of 1 command buffer. With ADVANCED_OK disabled the TFT keeps track of the command and response counts. It's only a few lines of code and does wonders for me.

As far as I see in your implementation of ADVANCED_OK you consider as buffer size the number of gcode commands while it seems to me Marlin provides the number of available bytes in the input buffer. E.g. 4 returned by Marlin is considered in your implementation as 4 gcodes to be sent.

In any case, if you (or anyone having that hesitation issue) could test the validity of my proposed test it could help other users to fix their issue until also ADVANCED_OK feature is improved/refined

rondlh commented 1 year ago

As far as I see in your implementation of ADVANCED_OK you consider as buffer size the number of gcode commands while it seems to me Marlin provides the number of available bytes in the input queue. E.g. 4 returned by Marlin is considered in your implementation as 4 gcodes to be sent.

BUFSIZE (in Marlin's Configuration_adv.h) represents the number of command buffers, which are strings of size MAX_CMD_SIZE, read the comments above the section in Configuration_adv.h), not below it, which is about TX_BUFFER_SIZE, or even better check, Marlin's queue.h

   * GCode Command Queue
   * A simple (circular) ring buffer of BUFSIZE command strings.

  struct CommandLine {
    char buffer[MAX_CMD_SIZE];      //!< The command buffer
    bool skip_ok;                   //!< Skip sending ok when command is processed?
    #if HAS_MULTI_SERIAL
      serial_index_t port;          //!< Serial port the command was received on
    #endif
  };

  /**
   * A handy ring buffer type
   */
  struct RingBuffer {
    uint8_t length,                 //!< Number of commands in the queue
            index_r,                //!< Ring buffer's read position
            index_w;                //!< Ring buffer's write position
    CommandLine commands[BUFSIZE];  //!< The ring buffer of commands

In any case, if you (or anyone having that hesitation issue) could test the validity of my proposed test it could help other users to fix their issue until also ADVANCED_OK feature is improved/refined

Sure, previously I saw a hesitation only once or twice an hour (when I pay attention to it), so it will take some time to confirm. I just did 2 x 1 hours print with "|| !infoHost.rx_ok[portIndex]", removed (ADVANCED_OK code still enabled), and I can confirm, as expected, that is doesn't break things. I'm 90% sure that this is the real problem, it perfectly explains the hesitations and why it resumes after about 1 second. I will go back to the original code base with only "|| !infoHost.rx_ok[portIndex]" removed and report back to you in a few days.

I checked the code base to see if "infoHost.rx_ok" can be removed completely. I found only 1 problem case in printing.c TASK_LOOP_WHILE(condCallback(), if (infoHost.rx_ok[SERIAL_PORT] == true) sendEmergencyCmd("M108\n"))

I believe if you replace that line with the line below, then you don't need infoHost.rx_ok anymore. TASK_LOOP_WHILE(condCallback(), if (dmaL1Data[SERIAL_PORT].wIndex != dmaL1Data[SERIAL_PORT].rIndex) sendEmergencyCmd("M108\n"))

slwise commented 1 year ago

I don't know if this will help but just thought I would throw this out there. The hesitation seems to happen more and every time when printing in vase mode. I was using a 75mm tall Coca-Cola bottle in vase mode for testing.

digant73 commented 1 year ago

As far as I see in your implementation of ADVANCED_OK you consider as buffer size the number of gcode commands while it seems to me Marlin provides the number of available bytes in the input queue. E.g. 4 returned by Marlin is considered in your implementation as 4 gcodes to be sent.

BUFSIZE (in Marlin's Configuration_adv.h) represents the number of command buffers, which are strings of size MAX_CMD_SIZE, read the comments above the section in Configuration_adv.h), not below it, which is about TX_BUFFER_SIZE, or even better check, Marlin's queue.h
   * GCode Command Queue
   * A simple (circular) ring buffer of BUFSIZE command strings.
  struct CommandLine {
    char buffer[MAX_CMD_SIZE];      //!< The command buffer
    bool skip_ok;                   //!< Skip sending ok when command is processed?
    #if HAS_MULTI_SERIAL
      serial_index_t port;          //!< Serial port the command was received on
    #endif
  };

  /**
   * A handy ring buffer type
   */
  struct RingBuffer {
    uint8_t length,                 //!< Number of commands in the queue
            index_r,                //!< Ring buffer's read position
            index_w;                //!< Ring buffer's write position
    CommandLine commands[BUFSIZE];  //!< The ring buffer of commands

Ok. I reported what I was remembering (not checking on Marlin).

In any case, if you (or anyone having that hesitation issue) could test the validity of my proposed test it could help other users to fix their issue until also ADVANCED_OK feature is improved/refined

Sure, previously I saw a hesitation only once or twice an hour (when I pay attention to it), so it will take some time to confirm. I just did 2 x 1 hours print with "|| !infoHost.rx_ok[portIndex]", removed (ADVANCED_OK code still enabled), and I can confirm, as expected, that is doesn't break things. I'm 90% sure that this is the real problem, it perfectly explains the hesitations and why it resumes after about 1 second. I will go back to the original code base with only "|| !infoHost.rx_ok[portIndex]" removed and report back to you in a few days.

it is probably 100% now (merged PR #2788 was supposed to fix also that possibility). Simply check with only the proposed change and hopefully it solves the issue.

I checked the code base to see if "infoHost.rx_ok" can be removed completely. I found only 1 problem case in printing.c TASK_LOOP_WHILE(condCallback(), if (infoHost.rx_ok[SERIAL_PORT] == true) sendEmergencyCmd("M108\n"))

I believe if you replace that line with the line below, then you don't need infoHost.rx_ok anymore. TASK_LOOP_WHILE(condCallback(), if (dmaL1Data[SERIAL_PORT].wIndex != dmaL1Data[SERIAL_PORT].rIndex) sendEmergencyCmd("M108\n"))

This is not a problem (checked before my change proposal). That code could be maintained as it is

rondlh commented 1 year ago

@digant73 If you do a PR touching SerialConnection.c, please fix the 2 typos: dinamically --> dynamically

rondlh commented 1 year ago

I don't know if this will help but just thought I would throw this out there. The hesitation seems to happen more and every time when printing in vase mode. I was using a 75mm tall Coca-Cola bottle in vase mode for testing.

@slwise All input is welcome, don't hesitate, leave that to the printer. I think vase/spiral mode more clearly shows the hesitation compared to normal printing because the movements are continues. That's why I use a large cylinder for testing, with Arc Welder (Cura) disabled (I want to generate a lot of communication and commands). It's a good suggestion for everyone wanting to support the testing, thanks!

digant73 commented 1 year ago

dinamically

no problem. Just waiting from you having the issue if the change will fix it or not

rondlh commented 1 year ago

I've been struggling to get consistent data out of my tests, but finally have come to the following conclusions.

Writing rx_ok in both the main program and interrupt routine can cause the reported hesitations. Adding a 1ms delay in the critical section ( if (rIndex == dmaL1Data_ptr->wIndex) infoHost.rx_ok[portIndex] = false;) causes a hesitation every 2-3 minutes. This should definitely be changed (there are various options, rx_ok, could be removed completely, or be replaced by a message counter that is increased in the interrupt and examined in the main program).
For now I recommend to use low baud rates, like 57600 baud.

digant73 commented 1 year ago

I've been struggling to get consistent data out of my tests, but finally have come to the following conclusions.

Writing rx_ok in both the main program and interrupt routine can cause the reported hesitations. Adding a 1ms delay in the critical section ( if (rIndex == dmaL1Data_ptr->wIndex) infoHost.rx_ok[portIndex] = false;) causes a hesitation every 2-3 minutes. This should definitely be changed (there are various options, rx_ok, could be removed completely, or be replaced by a message counter that is increased in the interrupt and examined in the main program).

There is a fundamental problem in the serial communication of the TFT that also plays a role here. The serial data is transmitted to memory via DMA. The incoming data (like "ok" and temp reports) are quite fragmented (incoming byte... pause... another byte...etc) which causes a lot of (small) DMA transfers, this can overload the system and causing data corruption/duplication. This is especially bad at speeds faster than 250K baud and strongly depends on the host system (clock speed, buffers, configuration etc.). Faster baud rates will have more small DMA transfers than lower speeds. In my view, the incoming serial data handling should be completely replaced, probably with a system generating an interrupt per received byte. The main program can stay about the same. I believe @kisslorand has already implemented this, and from my testing with his implementation I found it works very well.

For now I recommend to use low baud rates, like 57600 baud.

but did you apply the change I suggested (at the beginning of function Serial_Get in SerialConnection.c)?

rondlh commented 1 year ago

but did you apply the change I suggested (at the beginning of function Serial_Get in SerialConnection.c)?

Yes, I did, but I cheated a bit. I added a 1ms delay in the critical section to enlarge the problem. This resulted in a lot of stuttering, applying your fix removed the stuttering completely. But I struggled to show the same thing without the delay. It's basically a matter of chance, if the interrupt comes at a bad time then you have a problem. I would suggest to go ahead, the fix is definitely an improvement. Completely removing rx_ok is probably best.

Here a function that might be useful:

uint16_t Serial_Available(SERIAL_PORT_INDEX portIndex)
{
  if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1))
    return 0;
  return (dmaL1Data[portIndex].cacheSize + dmaL1Data[portIndex].wIndex - dmaL1Data[portIndex].rIndex) % dmaL1Data[portIndex].cacheSize;
}

BTW: I'm currently testing an interrupt based serial read approach (no DMA) for the STM32F2/4.

digant73 commented 1 year ago

but did you apply the change I suggested (at the beginning of function Serial_Get in SerialConnection.c)?

Yes, I did, but I cheated a bit. I added a 1ms delay in the critical section to enlarge the problem. This resulted in a lot of stuttering, applying your fix removed the stuttering completely. But I struggled to show the same thing without the delay. It's basically a matter of chance, if the interrupt comes at a bad time then you have a problem. I would suggest to go ahead, the fix is definitely an improvement. Completely removing rx_ok is probably best.

Here a function that might be useful:
uint16_t Serial_Available(SERIAL_PORT_INDEX portIndex)
{
  if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1))
    return 0;
  return (dmaL1Data[portIndex].cacheSize + dmaL1Data[portIndex].wIndex - dmaL1Data[portIndex].rIndex) % dmaL1Data[portIndex].cacheSize;
}
BTW: I'm currently testing an interrupt based serial read approach (no DMA) for the STM32F2/4.

The interrupt handler uses the L1 cache as a circular buffer and nothing is lost or duplicated even if the incoming data are fragmented. The DMA solution should be a lot better than a solution without DMA. I didn't understand the bad time you think can happen on interrupt handler. In theory, with the fix, interrupt handler and the Serial_Get function should work properly (they are independent and Serial_Get will return a full message once it is found in the L1 cache).

Anyway, I'm making some changes on Serial_Get function (wIndex statically handled as rIndex. It is currently dynamically handled (if interrupt handler changes wIndex that value is also read by Serical_Get).

rondlh commented 1 year ago

The interrupt handler uses the L1 cache as a circular buffer and nothing is lost or duplicated even if the incoming data are fragmented. The DMA solution should be a lot better than a solution without DMA. I didn't understand the bad time you think can happen on interrupt handler. In theory, with the fix, interrupt handler and the Serial_Get function should work properly (they are independent and Serial_Get will return a full message once it is found in the L1 cache).

Anyway, I'm making some changes on Serial_Get function (wIndex statically handled as rIndex. It is currently dynamically handled (if interrupt handler changes wIndex that value is also read by Serical_Get).

Theoretically I agree, but practice might be different. I'm not an STM32 programmer and could interpret things incorrectly, I'm still learning. If DMA works then that is probably the best way, but it also has disadvantages and there is not so much data to receive. Anyway, I checked for DMA errors, which are reported at 250Kbaud, but interrupts per byte seems stable at that speed (still testing). Your input is very appreciated. I would be happy to have a look at your code changes...

digant73 commented 1 year ago

The interrupt handler uses the L1 cache as a circular buffer and nothing is lost or duplicated even if the incoming data are fragmented. The DMA solution should be a lot better than a solution without DMA. I didn't understand the bad time you think can happen on interrupt handler. In theory, with the fix, interrupt handler and the Serial_Get function should work properly (they are independent and Serial_Get will return a full message once it is found in the L1 cache). Anyway, I'm making some changes on Serial_Get function (wIndex statically handled as rIndex. It is currently dynamically handled (if interrupt handler changes wIndex that value is also read by Serical_Get).

Theoretically I agree, but practice might be different. I'm not an STM32 programmer and could interpret things incorrectly, I'm still learning. If DMA works then that is probably the best way, but it also has disadvantages and there is not so much data to receive. Anyway, I checked for DMA errors, which are reported at 250Kbaud, but interrupts per byte seems stable at that speed (still testing). In DMA mode I can see a temp reports starts, but is interrupted by the end part of an "ok P31 B6" messages. Then the temp report and a bunch of "ok" messages are copied to memory again (I compare the temp readings to verify it is copied again).. Either way, this is not likely to cause hesitations, because too many "ok" messages are received. Your input is very appreciated. I would be happy to have a look at your code changes...

Yes no problem to share the code. Try this draft implementation for Serial_Get function (I didn't have time to test it at all; just repaired now a board for a beloved plasma TV):

uint16_t Serial_Get(SERIAL_PORT_INDEX portIndex, char * buf, uint16_t bufSize)
{
  // if port index is out of range or no data to read from L1 cache
  if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1) || dmaL1Data[portIndex].rIndex == dmaL1Data[portIndex].wIndex)
    return 0;

  DMA_CIRCULAR_BUFFER * dmaL1Data_ptr = &dmaL1Data[portIndex];

  // make a static access to dynamically changed (by L1 cache's interrupt handler) variables/attributes
  uint16_t wIndex = dmaL1Data_ptr->wIndex;

  // L1 cache's reading index (not dynamically changed (by L1 cache's interrupt handler) variables/attributes)
  uint16_t rIndex = dmaL1Data_ptr->rIndex;

  while (dmaL1Data_ptr->cache[rIndex] == ' ' && rIndex != wIndex)  // remove leading empty space, if any
  {
    rIndex = (rIndex + 1) % dmaL1Data_ptr->cacheSize;
  }

  for (uint16_t i = 0; i < (bufSize - 1) && rIndex != wIndex; )  // retrieve data until buf is full or L1 cache is empty
  {
    buf[i] = dmaL1Data_ptr->cache[rIndex];
    rIndex = (rIndex + 1) % dmaL1Data_ptr->cacheSize;

    if (buf[i++] == '\n')  // if data end marker is found
    {
      buf[i] = '\0';                   // end character
      dmaL1Data_ptr->rIndex = rIndex;  // update queue's index

      if (rIndex == dmaL1Data_ptr->wIndex)  // if L1 cache is empty, mark the port as containing no more data
      {
        // NOTE: if here, there is a small chance the L1 cache's interrupt handler could be invoked (and triggering L1 cache as
        //       available) in the meanwhile just before the following code (infoHost.rx_ok[portIndex] set to "false") is executed.
        //       However, infoHost.rx_ok[portIndex] is not used as condition in this function so there no side effect on message handling
        //
        infoHost.rx_ok[portIndex] = false;
      }

      return i;  // return the number of bytes stored in buf
    }
  }

  // if here, a partial message is present on the L1 cache (message not terminated by "\n").
  // We temporary skip the message until it is fully received

  return 0;  // return the number of bytes stored in buf
}

rondlh commented 1 year ago

I tried that already, removed these funny wIndex pointers and played with the guard conditions (I used Serial_Available)... it didn't help (still corrupted data received). I just had a look at how Marlin handles this, it also does interrupt based serial IO, with some error checking. The TFT seems to not use any TX buffers, I guess that will slow down the system quite significantly. The TFT sends a lot of data, and receives a little only.

rondlh commented 1 year ago

It seems the situation is like this, sometimes the serial communication has issues like frame errors (FE) or parity errors (PE). This happens more frequently at higher bitrates. In my tests I see that these errors become very rare at 250K baud, but they occur occasionally (every setup is different of course). DMA on top of this could have additional errors like bus conflicts (not sure if this happens or not), but I can see the DMA reports errors (not sure why, FE, PE or others?). It seems that if DMA runs into an errors it transmits some data again (at least that is what I see happen sometimes). So the best strategy is to use low bitrates (<250K). If I compare DMA vs interrupt based serial reading: DMA - Setup once and it works using few resources, but it only reports the reception of the data once the serial like goes to idle. (removing the rx_ok condition helps!). This is not only a delayed response if a lot of data is being received, but can also cause buffer overruns (M503/M43 response). And perhaps the worst problem, errors during DMA seem to cause data duplication. Interrupt based reading is more resource intensive (actually not much data is processed, <1kB per second). It has the advantage that you immediately can respond when a command is complete and the data can get processed, which helps in preventing buffer overruns. Using rx_ok is wrong in either case, but you could use a complete message counter, and compare this (not change it!) when processing the incoming data.

Using low baud rates (<250K) seems to be the solution, which is in conflict with the current TFT approach of waiting for the response to the previous command before sending the next, but implementing ADVANCED_OK(https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2819) takes away all time critical issues and solves this whole topic (unless there still is a bug we overlooked).

BTW: Variables that are accessed in interrupt routines should be declared "volatile"

rondlh commented 1 year ago

I found DMA works fine, if you don't guard Serial_Get with rx_ok, and so does the interrupt based approach. Noise and error levels seem similar. The STM32F2_4 DMA interrupt routine can be simplified to:

void USART_IRQHandler(uint8_t port)
{
  if ((Serial[port].uart->SR & (1<<4)) != 0)
  {
//    Serial[port].uart->SR; // Not needed
    Serial[port].uart->DR; // Clear interrupt flag
    dmaL1Data[port].wIndex = dmaL1Data[port].cacheSize - Serial[port].dma_stream->NDTR;
  }
}

The other interrupt routines can follow the same strategy.

@digant73 I think your fix (removing rx_ok from Serial_Get guard) fixes the stuttering completely. I don't think there are any other issues. DMA works fine, my debugger gave me some bad/outdated data, there is no data duplication happening.

digant73 commented 1 year ago

@rondlh thanks for your deep testing and analysis while I was all the day on a beach yesterday :-). If USART_IRQHandler was called for every read byte we could even replace infoHost.rx_ok[port] with a counter (incremented each time "\n" is found). The counter will be used also in Serial_Get as initial condition just to avoid a useless partial message read (if any) and decremented once a full message is retrieved. But, if I understood you analysis, it seems USART_IRQHandler is called when there is no byte to read on serial, so the counter cannot be used. In that case (partial message reading), in Serial_Get I could update dmaL1Data_ptr->rIndex and use a variable to keep track of buffered bytes in buf so on next Serial_Get invokation we could copy only the new bytes, if any, received from serial.

V1EngineeringInc commented 1 year ago

Sounds like we are close to a solution? Sorry I am not that well versed in this stuff, so I am following along fingers crossed.

rondlh commented 1 year ago

I have the following conclusions and recommendations:

1. Apply @digant73 fix to Serial_Get (SerialConnection.c), I'm quite certain it was the cause of the hesitations. In my view rx_ok is not needed at all, especially not in an interrupt routine, so better just look at rIndex and wIndex or use Serial_Available. if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1) || !infoHost.rx_ok[portIndex]) replace with if (!WITHIN(portIndex, PORT_1, SERIAL_PORT_COUNT - 1)) @V1EngineeringInc Here is an easy quickfix that will work. Give it a try.

2. Leave DMA as is, it's fine. Serial_Get is scanning the incoming data very fast anyway, so there is no need to make a direct connection to the interrupt routine.

3. Add ADVANCED_OK, a working implementation can be found here (https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2819)). This will take away all the time critical requirements of the printing process, even if you don't have ADVANCED_OK enabled in Marlin. This makes sure the printer always has some work at hand, it doesn't matter anymore if the TFT is a few milliseconds behind. This can drastically increase the command throughput. The only disadvantage I see is that the response to TFT input during printing will be a bit slower because the printer first will handle the commands in the queue. The delay depends on what the queue commands actually are. If you enable ADVANCED_OK in Marlin make sure to also increase your RX_BUFFER_SIZE in Configuration_Adv.h to at least 256 bytes (512 or 1024 bytes is better is your hardware supports it). This is needed because with ADVANCED_OK the TFT will send more work to the printer at once.

4. Don't use baud rates faster than 250K baud, if you see problems then go lower. With ADVANCED_OK enabled the baud rate becomes almost irrelevant because the printer has already several commands buffered, and doesn't depend on a quick response from the TFT anymore. In the current implementation (wait for response before sending the next command) the printer is very prone to micro stuttering, which you might not even notice, but it will degrade the print quality.

rondlh commented 1 year ago

@rondlh thanks for your deep testing and analysis while I was all the day on a beach yesterday :-). If USART_IRQHandler was called for every read byte we could even replace infoHost.rx_ok[port] with a counter (incremented each time "\n" is found). The counter will be used also in Serial_Get as initial condition just to avoid a useless partial message read (if any) and decremented once a full message is retrieved. But, if I understood you analysis, it seems USART_IRQHandler is called when there is no byte to read on serial, so the counter cannot be used. In that case (partial message reading), in Serial_Get I could update dmaL1Data_ptr->rIndex and use a variable to keep track of buffered bytes in buf so on next Serial_Get invokation we could copy only the new bytes, if any, received from serial.

This issue has been bothering me for a long time, so I thought let's look into it. About the code change, sure that could be done like you said, but it makes things more complex, and there is more work in the interrupt routine, and I don't think it helps much, if at all. Note that in most cases the data processed is just a simple "ok\n". So only in very rare cases Serial_Get will do unneeded work that could have been prevented with a guard. The remaining problem with DMA is for long messages (M43), but there are ways around it also. I'm not sure if the DMA process can also work with an RXNE interrupt, then you have the best of both world, an automated process and the wIndex pointer is updated immediately when data is available (not only after the command/line is complete. I can share the interrupt based approach I used for testing if you are interested (STM32F2_4 only).

I think it would be more useful to spend time on also sending data via a buffer and DMA, because 15 times more data is send than is received. I guess sending data is currently the main bottleneck in the TFT processing loop, especially when using low baudrates. The TFT is just waiting for every byte that is send. If this is automated with DMA then Serial_Get will be executed at a much higher frequency.

BTW: The for loop in Serial_Get is really a piece of ...."art". I can see how it works, but why would anybody ever do this! Must have been a Z80 assembly programmer trying to save a clock cycle :D

rondlh commented 1 year ago

@digant73 I will check the bugfix soon and let you know, 44 files changed. OMG :D Note that ADVANCED_OK in the TFT also works fine if you have it disabled in Marlin/Reprap. The printer command buffers are still there, but you are not informed about how many are available with the "ok" response. In that case you have to do the buffer count administration yourself starting with the default number. So previously the TFT would only counts to 1 command and wait, now it counts to 4 commands and waits. It doesn't depend on the

I just tried DMA with RXNE interrupt, documentation seems to indicate that it should be possible, but didn't get it to work, then I thought of another approach to achieve the same thing. How about updating wIndex in Serial_Get instead of the serial interrupt routine. Then you would have the latest buffer status right when you need it and don't depend on the serial line going to idle. That worked the first time, and basically takes away the key disadvantage of DMA.

digant73 commented 1 year ago

@digant73 I will check the bugfix soon and let you know, 44 files changed. OMG :D I just tried DMA with RXNE interrupt, documentation seems to indicate that it should be possible, but didn't get it to work, then I though of another approach to achieve the same thing. How about updating wIndex in Serial_Get instead of the serial interrupt routine. Then you would have the latest buffer status right when you need it and don't depend on the serial line going to idle. That worked the first time, and basically takes away the key disadvantage of DMA.

sure the code can be optimized. I didn't have time to test the PR (possibly it will freeze for the changes made for Advanced OK). I would suggest people wanting to test the PR to start testing the bug on print hesitation (so leaving Advanced OK disabled at the moment).

rondlh commented 1 year ago

Testing with ADVANCED_OK disabled (default settings). All seems to work fine, great job! I have some feedback on the comments about ADVANCED_OK

Advanced OK
Used/effective only in case "ADVANCED_OK" is also enabled in Configuration_adv.h in Marlin firmware.
If enabled, the TFT will support the ADVANCED_OK feature eventually enabled in Marlin firmware.
It means that the TFT will use the available TX G-code slots indication provided by the mainboard
to schedule the transmission of multiple G-codes, if any, for a maximum of the given indication.
If disabled, the TFT will support, independently from the ADVANCED_OK feature configured in Marlin
firmware, the transmission of a single G-code per time waiting for the related ACK message from
the mainboard before proceeding with the next G-code transmission, if any.
NOTES:
- Enable it in case ADVANCED_OK feature is enabled in Marlin firmware.
- Disable it (preferably) in case ADVANCED_OK feature is disabled in Marlin firmware or
  if ADVANCED_OK feature is not requested/needed by the user.
  Options: [disable: 0, enable: 1]

ADVANCED_OK can work fine, even without enabling it in Marlin/Reprap, then 4 command buffers can be assumed (Marlin's default). One key requirement is that there are enough RX_BUFFER_SIZE defined in Marlin. To be safe you need (MAX_CMD_SIZE BUFSIZE) RX buffers. By default this is 96 4 bytes so you would need to at least set RX_BUFFER_SIZE to 512 bytes, practically half of that will be enough, but more is better/safer. All these parameters are defined in Configuration_adv.h. Even if you just assume 2 buffers when ADVANCED_OK is not enabled in Marlin, it will be a big improvement. I recommend to add this info so people can make informed choices.

bigtreetech / BIGTREETECH-TouchScreenFirmware

[Q] Random pauses or hesitation when printing. #2761