MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.16k stars 19.21k forks source link

[BUG] Fire hazard from SD Card fault #22585

Closed tdcox closed 2 years ago

tdcox commented 3 years ago

Did you test the latest bugfix-2.0.x code?

No, but I will test it now!

Bug Description

JGMaker Artist-D Pro with firmware release v2.3.3

A corrupted SD card caused a print to fail part way through. The failure manifested as a halt to printing at a specific and repeatable layer. The nozzle was driven down into the part and remained heating. More worryingly, the bed continued to heat up, without any thermal control and when I discovered the problem, the metal chassis of the machine was hot. This could easily have led to a fire had the temperature gone much higher.

When I powered off and on the machine, it only then detected a problem and set off an alarm, refusing to boot until the temperature was back within the safe range.

The UI was still responsive during the fault, so this is a failure to monitor safety parameters during an unexpected print buffer under-flow rather than a system hang.

Bug Timeline

New machine. Fault occurred on prints #3 and #4 with faulty SD card.

Expected behavior

I expected safety protocols to be functional at all times.

Actual behavior

Over temperature checks were not being carried out and no alarm was raised.

Steps to Reproduce

The fault occurs when the printer is reading a file that is corrupted on SD card. Upon subsequent investigation, Windows is unable to read the file at all, reporting it as corrupted, so this looks like a combination of two bugs, firstly a failure to validate the print file before starting a print and then secondly a failure of the temperature monitoring routine in the face of an unexpected empty buffer.

The print failure mode is repeatable using the same card and file, but the thermal state is sensitive to initial conditions, so it may fail with the heaters on or off.

Version of Marlin Firmware

V2.3.3

Printer model

JGMaker Artist-D Pro

Electronics

Stock

Add-ons

None

Bed Leveling

No response

Your Slicer

IdeaMaker

Host Software

SD Card (headless)

Additional information & file uploads

No response

ellensp commented 3 years ago

That is not a valid version of Marlin. reall official Marlin is only u to 2.0.9.1 It is probably JGMaker specific version, and as such we cant help

ellensp commented 3 years ago

Here it is https://github.com/JGMaker3dofficial/ArtistD-Pro This is based on really old marlin 2.0.5 You need to submit this report on JGMaker's github. As we cannot help with 3rd party forks of Marlin.

tdcox commented 3 years ago

With respect, you still have an active issue with an undiagnosed thermal runaway problem in the current release and the buffer underrun condition may provide pertinent information to help pinpoint that problem.

In Open Source communities, it is important not to discourage people from reporting safety-related issues, even on older releases, since these are likely to be versions that are prevalent in the community and which represent something that we need to react to and mitigate as responsible software maintainers.

A year old, is not ‘really old’ from the perspective of a piece of firmware in commercial equipment. I regularly deal with safety-related systems that are expected to have a lifespan of a decade or more so washing your hands of something that was only written a year ago reflects badly upon the maturity of the Open Source community.

cp2004 commented 3 years ago

With respect as well, you are asking the open source community to fix an issue with firmware that came with your printer which is not in this repository. You've been told where to send your bug report, this community is not responsible for supporting commercial printers' firmware and only maintains the code that is contained within this repository.

Feel free to use this repository to rebuild your own version of Marlin, and come back with a new report based on the current code and not a 3rd party fork.

tdcox commented 3 years ago

No, what I am saying is that as Open Source maintainers, the onus is on us to act responsibly when it comes to reports of safety-related issues and not dismiss them as ‘not our problem’ for spurious reasons. At least, there should be a desire to perform a diff on the JGMaker branch against the point it forked to see if they have modified anything relating to card reading or temperature safety monitoring. If not, you would obviously want to inspect the commit history of your current version to see if it was still relying upon the same code and therefore at risk of similar failures.

If a safety incident happens, it doesn’t matter if it occurs in the main project, or in a fork, because in the public perception, it is your brand that is impacted negatively, so we should always care about potential issues in this space. More importantly, however, we represent Open Source as a whole and if we are seen to be passing the buck on safety issues, that reflects badly upon all Open Source projects from the perspective of regulators.

Some possible better responses might be to reach out to the fork maintainer to offer to help them identify a safety risk, or to ask the submitter of the issue if they are able to build a rebased version to test to see if the issue is still live in main. Personally, I would also want to obtain a copy of the corrupted file, so that it could be included in automated testing for future releases to mitigate this problem ever occurring in my project subsequently.

It is probably not a good idea to contact fork maintainers and tell them not to report bugs upstream because that is bad for all of us. We need to know about problems before we can fix them, so we should always encourage feedback. Somebody may be exercising something in a fork that you haven’t got test coverage for in your project.

thinkyhead commented 3 years ago

Please test the bugfix-2.0.x branch to see where it stands. If the problem has been resolved then we can close this issue. If the issue isn't resolved yet, then we should investigate further.

WillisJM commented 3 years ago

That folder has several files in it. What am I testing inside the folder and what test are you asking me to do?

Thank you, Jamaal

On Tue, Aug 17, 2021, 2:42 PM Scott Lahteine @.***> wrote:

Please test the bugfix-2.0.x branch https://github.com/MarlinFirmware/Marlin/archive/bugfix-2.0.x.zip to see where it stands. If the problem has been resolved then we can close this issue. If the issue isn't resolved yet, then we should investigate further.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/MarlinFirmware/Marlin/issues/22585#issuecomment-900541963, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUVYZHSNPANTP6T5F2GGPGLT5KUPVANCNFSM5CJPUO5A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

Tannoo commented 3 years ago

What he is asking is to download the latest bugfix branch code, extract it, configure it, burn it, and run it. Test THAT code against the crash scenario that was played out before.

I am running bugfix-2.0.x (07-10-2021).

I cannot reproduce this issue. I also do not have a corrupt SD card. Thermal runaway has been working just fine for me.

This was also stress tested during my SKR 1.4 upgrade and all of the bad over-crimps (tighter is not always better).

Tannoo commented 3 years ago

The Marlin maintainers cannot control and also are not responsible for what is done with any modification of downloaded code, commercial or otherwise.

And being that the commercial entity has given it's own version numbers, they have modified the code outside of this Marlin repo. So, it is that entity's responsibility to handle the issue of that variation of Marlin. The official Marlin repo maintainers cannot possibly know everything that has been done to that code. This was stated before and not be rude, but to be honest.

If a safety incident happens, it doesn’t matter if it occurs in the main project, or in a fork, because in the public perception, it is your brand that is impacted negatively, so we should always care about potential issues in this space.

So, if I modify the latest Chevy Silverado to run a drag race in the low 9's and my engine runs away potentially blowing up, is Chevrolet supposed to correct the runaway? I think not. Same story.

Tannoo commented 3 years ago

When looking over the release from JGMaker, it can easily be upgraded to the latest Marlin. Backup your firmware copy.

Replace all of the Marlin files from the Marlin folder from the bugfix branch.

Then update the config files with the correct values and options.

Upload to the printer and it should be good to go.

The bugfix branch has all the latest fixes to all kinds of things. Sometimes, an unknown issue get's corrected from something seemingly unrelated.

Tannoo commented 3 years ago

Once you do this upgrade, JGMaker may not help with software issues.

thinkyhead commented 3 years ago

I cannot reproduce this issue. I also do not have a corrupt SD card.

Yanking the card in the middle of a print is now handled, so that should be a safe scenario at this time. Maybe I can add something under MARLIN_DEV_MODE that allows simulating a corrupted SD card, just to see what transpires next.

Tannoo commented 3 years ago

Yanking the card in the middle of a print is now handled, so that should be a safe scenario at this time. Maybe I can add something under MARLIN_DEV_MODE that allows simulating a corrupted SD card, just to see what transpires next.

Randomly inject a string of garbage ASCII characters? lol

Tannoo commented 3 years ago

Yanking the SD card wouldn't quite be the same as a corrupt one as the SD detect is "un-triggered". That would activate a routine rather quickly, right?

Injecting garbage in the print buffer or completly stop the data flow to the buffer would be more like it.

Sebazzz commented 3 years ago

You should have the watchdog (USE_WATCHDOG) enabled - I have seen similar symptoms on Creality v4.5.x boards, where reading the SD card through SDIO can cause hangs. With the watchdog this results in a reboot obviously, without watchdog it can indeed result in unlimited heating.

thinkyhead commented 2 years ago

Good point. I've added a build warning about the watchdog being disabled, along with warnings about the thermal runaway protection features being disabled.

github-actions[bot] commented 2 years ago

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.