MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] TX data loss #26507

Closed thephantom1492 closed 4 months ago

thephantom1492 commented 10 months ago

Did you test the latest bugfix-2.1.x code?

Yes, and the problem still exists.

Bug Description

When printing at high speed with a fairly high linear advance value (K1.57, direct drive) and a high retraction speed, some of the data sent from the printer to the host is lost.

Some examples (only the data transmitted from the printer to the host is shown):

ok N925 P0 B3
ok N926 P0 B3
ok N927 ok N923 P1 B3
ok N924 P0 B3
ok N925 P0 B3

ok N1027 P1 B3
ok N1028 P0 B3
ok N10 N1025 P0 B3
ok N1026 P0 B3
ok N1027 P1 B3

ok N1098 P0 B3
ok N1099 P0 B3
ok N11001096 P0 B3
ok N1097 P0 B3
ok N1098 P0 B3
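
For anyone who wants to scan a long log for lines like these, here is a minimal sketch of a checker. It assumes every acknowledgement should look exactly like `ok N<digits> P<digits> B<digits>` and that N normally increases by one; both are assumptions drawn from the excerpts above, not from the Marlin or OctoPrint sources. The names `is_clean_ack` and `AckMonitor` are hypothetical. Note that the third example is well formed on its own and is only caught by the jump in N.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Parse "<tag><digits>" at s[pos]; advance pos past the digits on success.
static bool parse_field(const std::string& s, std::size_t& pos, char tag, long& out) {
    if (pos >= s.size() || s[pos] != tag) return false;
    ++pos;
    const std::size_t start = pos;
    long value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        if (pos - start >= 9) return false;  // refuse absurdly long numbers (avoids overflow)
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    if (pos == start) return false;          // tag with no digits
    out = value;
    return true;
}

// True only for a line of the exact form "ok N<digits> P<digits> B<digits>".
// On success, n receives the line number. (Hypothetical helper, assumed format.)
bool is_clean_ack(const std::string& line, long& n) {
    const std::string prefix = "ok ";
    if (line.compare(0, prefix.size(), prefix) != 0) return false;
    std::size_t pos = prefix.size();
    long parsed_n = 0, p = 0, b = 0;
    if (!parse_field(line, pos, 'N', parsed_n)) return false;
    if (pos >= line.size() || line[pos++] != ' ') return false;
    if (!parse_field(line, pos, 'P', p)) return false;
    if (pos >= line.size() || line[pos++] != ' ') return false;
    if (!parse_field(line, pos, 'B', b)) return false;
    if (pos != line.size()) return false;    // trailing garbage
    n = parsed_n;
    return true;
}

// Tracks the last seen N and flags lines that are malformed or out of sequence.
struct AckMonitor {
    long last_n = -1;
    bool accept(const std::string& line) {
        long n = 0;
        if (!is_clean_ack(line, n)) return false;
        const bool in_order = (last_n < 0) || (n == last_n + 1);
        last_n = n;                          // resynchronise on whatever was seen
        return in_order;
    }
};
```

A legitimate resend also makes N go backwards, so a flagged line is only a hint to look closer, not proof of corruption.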

This occurs on ALL ports, USB and serial, at the same time.

It happens whether the host is connected via USB or via serial.

I tried two different USB cables: the blue A-B cable provided by BTT and a USB C-B cable from Amazon, which also means two different ports on the Raspberry Pi 4B.

I also tried a serial cable from the Pi's PL011 port to the BTT SKR2. Same result. So three different connections in total.

I tried Marlin from May and from December 5, with no difference.

OctoPrint was upgraded to an experimental 64-bit nightly version, also with no difference.

Test setup:

  - Raspberry Pi 4B 8G
  - BTT SKR2 V2
  - USB C-B cable from the Pi to the SKR2
  - USB-serial adapter listening to the serial port on the SKR2, logging, connected to my Linux server
  - OctoPrint on OctoPi 64-bit

Prusaslicer 2.6.1

; Filament gcode
M306 H0.0056 ; generic heat capacity of PLA
M207 F3600 S0.0 Z0 ; firmware retract settings, no retraction
M900 K1.57 ; linear advance
M205 E60 ; E jerk
M203 R10000 ; retract acceleration
M207 F3600 ; retract speed

Print speed is basically set to 150mm/s with acceleration at 3000mm/s², flow limited to 7mm³/s.

I have attached the Configuration.h and adv.h, also my sliced benchy.

I have some test equipment. If you need anything else, let me know what you need and the procedure to obtain the data.

Bug Timeline

Same issue since at least the Git version from May.

Expected behavior

Expected a perfect print

Actual behavior

The printer pauses, causing dimples and other print defects, and sometimes the print fails outright (printer kill).

Steps to Reproduce

  1. Print the benchy with the settings for the TPU90.
  2. Notice some print quality defects.
  3. Notice that the printer pauses.
  4. Notice that the OctoPrint terminal output shows some data loss and retransmission from the printer (and no retransmission from OctoPrint to the printer).

Version of Marlin Firmware

GIT as of 2023-12-05

Printer model

Heavy modified Ender 5 Plus

Electronics

BTT SKR2

LCD/Controller

BTT TFT70

Other add-ons

Raspberry Pi 4b with octopi

Bed Leveling

UBL Bilinear mesh

Your Slicer

Prusa Slicer

Host Software

OctoPrint

Additional information & file uploads

config.zip

thephantom1492 commented 10 months ago

I forgot to say that I also tried different values for "#define TX_BUFFER_SIZE 0": 0, 32, 128 and 256. All gave the same result.

cbagwell commented 10 months ago

I saw similar lost TX bytes on my underpowered SKR Mini E3 V3 at similar print speeds and accelerations. I've taken a break from debugging it, but here is where I left off.

TX_BUFFER_SIZE does not seem to apply to the STM32 USBSerial driver, and there doesn't seem to be an alternative setting to buffer more.

If I'm understanding the code correctly, the code that invokes USBSerial::write() never expects it to return an error, so it will not retransmit and the data in the buffer will be dropped.

This seems like it can happen pretty easily if an ISR takes more than 3ms to run, based on the logic in stm32/usb/cdc/usbd_cdc_if.c:

/*
 * The value USB_CDC_TRANSMIT_TIMEOUT is defined in terms of HAL_GetTick() units.
 * Typically it is 1ms value. The timeout determines when we would consider the
 * host "too slow" and treat the USB CDC port as disconnected.
 */
#ifndef USB_CDC_TRANSMIT_TIMEOUT
  #define USB_CDC_TRANSMIT_TIMEOUT 3
#endif

The write() sits in a retry loop for that long, and if the ISR fires at an unfortunate time within USBSerial::write(), it will think the write itself is what took so long when it's actually working fine.
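
To make that failure mode concrete, here is a small simulation of a tick-based retry loop. It is a hypothetical model of the behaviour described above, not the actual STM32 core code: `simulate_write` is a made-up name, the endpoint is assumed to be busy on the first attempt and free on the second, and the ISR is modelled as a jump in the tick counter during the first attempt.

```cpp
#include <cstdint>

// Returns true if the simulated write completes before the timeout expires.
//   timeout_ticks   : stands in for USB_CDC_TRANSMIT_TIMEOUT (tick units, assumed 1 ms)
//   isr_stall_ticks : ticks "stolen" by an interrupt during the first attempt (assumption)
bool simulate_write(std::uint32_t timeout_ticks, std::uint32_t isr_stall_ticks) {
    std::uint32_t now = 0;                 // simulated tick counter
    const std::uint32_t start = now;
    bool first_attempt = true;

    while (now - start < timeout_ticks) {
        if (first_attempt) {
            // Endpoint still busy, and an ISR preempts us while we wait.
            now += isr_stall_ticks;
            first_attempt = false;
        } else {
            // The host was never slow: the very next attempt succeeds.
            return true;
        }
        now += 1;                          // one tick passes between attempts
    }
    // Timed out: the loop blames the host even though only the ISR was slow.
    return false;
}
```

With a timeout of 3 ticks and a 5-tick stall the second attempt never happens, while a timeout of 100 ticks rides out the same stall.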

So defining USB_CDC_TRANSMIT_TIMEOUT to a much higher value should be an easy test to see whether it prevents dropping TX data.
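
On the unchecked return value mentioned earlier: a caller that does look at what write() reports could retry the unsent remainder instead of dropping it. The sketch below only illustrates that idea. `Port`, `FlakyPort` and `write_all` are hypothetical names, and it assumes a write() that returns the number of bytes it accepted (possibly zero); none of this is Marlin's or the STM32 core's actual code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal port interface: write() returns how many bytes were accepted.
struct Port {
    virtual std::size_t write(const std::uint8_t* data, std::size_t len) = 0;
    virtual ~Port() = default;
};

// Keep sending the unsent remainder until everything is out, or until
// max_attempts consecutive calls accept nothing. Returns bytes actually sent.
std::size_t write_all(Port& port, const std::uint8_t* data, std::size_t len,
                      unsigned max_attempts) {
    std::size_t sent = 0;
    unsigned idle_attempts = 0;
    while (sent < len && idle_attempts < max_attempts) {
        const std::size_t n = port.write(data + sent, len - sent);
        sent += n;
        idle_attempts = (n == 0) ? idle_attempts + 1 : 0;  // reset on any progress
    }
    return sent;
}

// Test double: rejects the first `failures_left` calls, then accepts everything.
struct FlakyPort : Port {
    unsigned failures_left;
    std::size_t total = 0;                 // bytes accepted so far
    explicit FlakyPort(unsigned failures) : failures_left(failures) {}
    std::size_t write(const std::uint8_t*, std::size_t len) override {
        if (failures_left > 0) { --failures_left; return 0; }
        total += len;
        return len;
    }
};
```

Whether retrying at this level is acceptable inside firmware (it blocks the caller) is a separate design question; the point is only that the return value carries the information needed to avoid silent loss.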

thephantom1492 commented 10 months ago

So defining that to a much higher value should be an easy test to see it prevents dropping TX data.

Except that it also occurs on standard serial, not just USB.

cbagwell commented 10 months ago

Sorry, I missed the part that it's an issue on standard serial as well. For that case, all I have to offer is to double-check your board's ini/stmXX.ini file and see if it has a SERIAL_TX_BUFFER_SIZE defined. If so, increase that value instead of TX_BUFFER_SIZE in the header file, as it has higher priority.

thephantom1492 commented 10 months ago

and see if it has a SERIAL_TX_BUFFER_SIZE defined.

see

I also tried different values for "#define TX_BUFFER_SIZE 0". 0 32 128 256. All same.

github-actions[bot] commented 4 months ago

Greetings from the Marlin AutoBot! This issue has had no activity for the last 90 days. Do you still see this issue with the latest bugfix-2.1.x code? Please add a reply within 14 days or this issue will be automatically closed. To keep a confirmed issue open we can also add a "Bug: Confirmed" tag.

Disclaimer: This is an open community project with lots of activity and limited resources. The main project contributors will do a bug sweep ahead of the next release, but any skilled member of the community may jump in at any time to fix this issue. That can take a while depending on our busy lives so please be patient, and take advantage of other resources such as the MarlinFirmware Discord to help solve the issue.

github-actions[bot] commented 2 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.