Closed judokan9 closed 8 years ago
I had similar issues. Using the RCBugFix branch helped.
Okay, thank you. I'll give it a try when I'm at home. Do you know what is different in this version, or what caused this problem? It's pretty odd that it doesn't home normally after being driven more than 550 mm or so.
I hope this gives you the needed information #4454
Nope, I got the same results with the RCBugFix branch. The printer stops 100 mm before it can reach the endstops and hangs.
Here is my current config file:
Printer measurements are: height ~760 mm, bed radius 100 mm, extruders 1x
Another job for @MarlinFirmware/testers-delta-team
I hope they can solve my odd problem
@judokan9
If you disable USE_WATCHDOG, is the problem solved?
Sorry for my late answer, I was very busy today.
But I disabled #define USE_WATCHDOG in Configuration_adv.h and it worked perfectly.
Will try either tomorrow night or Saturday.
@esenapaj Wow, you are my hero.
@AnHardt You're the WATCHDOG man! What the heck is going on with the watchdog timing out in the middle of a print? We must be resetting it often enough, yes? Where is this falling down?
BUT i disabled #define WATCHDOG_RESET_MANUAL in the Configurations_adv.h and it worked perfectly.
WATCHDOG_RESET_MANUAL only defines how to react to a watchdog reset: either showing a kill screen and going into an endless loop, or making a hardware reset, after which most boards do not come out of the bootloader but keep resetting again and again.
If USE_WATCHDOG were involved, I'd say it could be a problem with refreshing the watchdog timer, but with WATCHDOG_RESET_MANUAL I guess the user is seeing a random result, caused by something else.
So what can give us the impression of a hanging machine without causing a Watchdog Timer Overflow Reset (WTOR)? (The user did not tell us about the typical symptoms of a WTOR: a fast blinking LED, or the kill screen.)
An endless loop in an interrupt routine in combination with WATCHDOG_RESET_MANUAL could cause a hang that is not able to execute the WTOR because cli is set.
An extremely slow move? Like in the auto-retract problem?
Something completely different: with the user's config, he has 200 steps/mm and a z-max of ~760 mm. At some point the machine crosses the 128k steps border (200 × 760 mm = 152000 steps, 128k / 200 ≈ 655 mm). That roughly matches the error description. Could some intermediate integer result have flipped its sign?
However, if the problem is gone now, one of our patches may have helped. A relation to WATCHDOG_RESET_MANUAL seems unlikely. The hang would show different symptoms but should not have disappeared.
Today I printed several times. Every time the printer homed normally, but when I re-enable USE_WATCHDOG it gets stuck; it doesn't move slowly, it just stays at this point. The idea of overflowing variables sounds very plausible. I want to build an even bigger printer... Would this problem appear when I scale up the height? How can I avoid this problem? Use a long int?
Any ideas why it worked with USE_WATCHDOG disabled in RC7 and RCBugFix?
Sorry for my wrong information, I didn't disable WATCHDOG_RESET_MANUAL; it was already disabled in the firmware by default. I was in a hurry... I disabled USE_WATCHDOG...
I was in a hurry... I disabled USE_WATCHDOG...
Without WATCHDOG_RESET_MANUAL the watchdog gets reset in the following manner:
updateTemperaturesFromRawValues resets the watchdog timer and clears temp_meas_ready.
manage_heater calls updateTemperaturesFromRawValues when temp_meas_ready is set.
manage_heater is called from idle, safe_delay, and elsewhere.
For the temp_meas_ready flag to get set…
set_current_temp_raw sets the temp_meas_ready flag.
Temperature::isr calls set_current_temp_raw when temp_count >= OVERSAMPLENR (if temp_meas_ready is unset).
So basically, anything that blocks for too long (in our case, 4 seconds) without calling manage_heater could trigger the watchdog timer. Either something is blocking for 4 seconds, or the watchdog timer is expiring too soon.
Just so I understand: you mean that my homing routine takes more than 4 seconds and during this time manage_heater is not called? That sounds possible, but I think my homing routine does not need more than 3 seconds...
Anyway...
What if I have a big printer, maybe a delta with a very large build height, and homing takes something like 10 seconds? Is the only way to avoid getting stuck after 4 seconds to change the value to 11 seconds or so?
I might have a similar issue. My end script looks like this:
M104 S0 ; turn off extruder
M140 S0 ; turn off bed
G28 X0 ; home X axis
M84 ; disable motors
G4 S60 ; sleep 60 seconds to cool down
M81 ; power off
I am waiting 60 seconds to let the nozzle cool before I switch off the power supply. While it is waiting the 60 seconds, it looks like it is crashing. After every print, Octoprint detects an error in communication and disconnects. This could be due to the same reason.
What if I have a big printer, maybe a delta with a very large build height, and homing takes something like 10 seconds? Is the only way to avoid getting stuck after 4 seconds to change the value to 11 seconds or so?
If you have a very large printer, probably other things will have to change. It is possible your extruder will be larger too. So a larger time out on thermal will make sense.
my homing routine takes more than 4 seconds and during this time manage_heater is not called?
The idle() function is called frequently during processes like waiting for the nozzle to cool, during G4, or doing G29, and as long as the main loop is running. For this 4 second timer to expire, something would have to go very wrong, a crash or infinite loop preventing the timer being reset. According to Arduino documentation this timer may slow down if the voltage is low. Perhaps it speeds up if it gets a surge of higher current also.
Keep an eye out for a period of 4 seconds when the machine is unresponsive before it actually does a watchdog reset.
@thinkyhead Made some debug code to find out what Marlin is doing all the time. Please have a look at: "Add debug counters https://github.com/AnHardt/Marlin/pull/64" Do you think this could be helpful with this kind of problems? :-)
When a timeout happens, it would be interesting to see the stack frame. If we saved the top 100 bytes of the stack in EEPROM, it would be possible to know EXACTLY what led to the failure. It would be slightly more tricky to accomplish, but it could be saved in RAM also because the RAM is not cleared when the processor is reset.
@AnHardt This is my case. After freezing, LCD is filled with squares.
#define DELTA_SEGMENTS_PER_SECOND 180
#define XYZ_FULL_STEPS_PER_ROTATION 400
#define XYZ_MICROSTEPS 32
The branch that was used for the test: https://github.com/esenapaj/Marlin/tree/testes2
@esenapaj Yes, my printer does exactly the same thing, but I don't have an LCD display. After I disabled #define USE_WATCHDOG in Configuration_adv.h the printer homes normally.
Definitely not a watchdog problem. The processor is running much longer than 4 seconds until the display begins to fill. Looks more like a memory overflow. idle() is not called any more after:
#if ENABLED(DELTA)
/**
* A delta can only safely home all axes at the same time
*/
// Pretend the current position is 0,0,0
// This is like quick_home_xy() but for 3 towers.
current_position[X_AXIS] = current_position[Y_AXIS] = current_position[Z_AXIS] = 0.0;
sync_plan_position(); /////////////////// this is the last we can see.
// Move all carriages up together until the first endstop is hit.
current_position[X_AXIS] = current_position[Y_AXIS] = current_position[Z_AXIS] = 3.0 * (Z_MAX_LENGTH);
feedrate_mm_s = 1.732 * homing_feedrate_mm_s[X_AXIS];
+ SERIAL_ECHOLNPGM("before move");
line_to_current_position(); // if not already at the top this move should last long enough to
+ SERIAL_ECHOLNPGM("behind move");
stepper.synchronize(); // idle here
+ SERIAL_ECHOLNPGM("behind sync");
endstops.hit_on_purpose(); // clear endstop hit flags
current_position[X_AXIS] = current_position[Y_AXIS] = current_position[Z_AXIS] = 0.0;
// take care of back off and rehome. Now one carriage is at the top.
HOMEAXIS(X);
HOMEAXIS(Y);
HOMEAXIS(Z);
SYNC_PLAN_POSITION_KINEMATIC();
#if ENABLED(DEBUG_LEVELING_FEATURE)
if (DEBUGGING(LEVELING)) DEBUG_POS("(DELTA)", current_position);
#endif
Additionally change to
Maybe we can place a
SERIAL_ECHO_START;
SERIAL_ECHOPGM(MSG_FREE_MEMORY);
SERIAL_ECHOLN(freeMemory());
somewhere in the #if ENABLED(DEBUG_IDLE_COUNTER)
block, to see how much RAM is remaining.
To get a nice kill screen if the watchdog reset is triggered, I suggest activating WATCHDOG_RESET_MANUAL, at least for these tests.
@Blue-Marlin Turning on the M100 Free Memory Watcher will help you know how close you are to running out of memory. But you have to be able to give Marlin an M100 command to get the information from it.
OK, now the delta gets stuck again... with the watchdog disabled. About 2 days ago I configured the printer height to keep the distance between the nozzle and print bed at about 0.15 mm. Yesterday evening I had to change the nozzle and correct the height downwards by ~2 mm again. Now the printer doesn't home when I'm on Z 754.42.
Combined:
The printer homes when #define MANUAL_Z_HOME_POS is 753.67
AND
The printer gets stuck when #define MANUAL_Z_HOME_POS is 754.42
I think a memory overflow is probably the problem, and I have to agree with the previous posters. But how can I prevent this?
I think a memory overflow is probably the problem, and I have to agree with the previous posters. But how can I prevent this?
First, let's get some data. Let's see how much stack and heap space is there. Please turn on:
#define M100_FREE_MEMORY_WATCHER // uncomment to add the M100 Free Memory Watcher for debug purpose
Flash the new firmware and bring up Marlin. Then give Marlin an M100 I command to initialize the memory watcher. Then do an M100 F to see how much free memory is available.
Then... start a print. Let it do a few layers. Pause the print. And do another M100 F. We will know from this how tight memory is.
@Roxy-3D I will try it right away.
Here are the Results
Send: M100 I
Recv: Initializing free memory block.
Recv:
Recv:
Recv: bss_end : 4887
Recv: Stack Pointer : 8592
Recv:
Recv: 3633 bytes of memory initialized.
Recv:
Recv: ok
After the first Layer:
Send: M100 F Recv: Found 3366 bytes free at 0x131F Recv: ok
Second layer nearly finished:
Send: N2076 M100 F*119 Recv: Found 3366 bytes free at 0x131F Recv: ok
After that I paused the print and sent a G28. The printer homes and gets stuck before it reaches the endstops... after ~5-6 seconds Octoprint says "unkown communication error .... Too many consecutive timeouts, printer still connected and alive?"
Sorry, the start of the log has been cut off because of Octoprint's autoscroll.
So in the end, memory is not the problem here, or is it?
3366 bytes free after printing several layers leads me to think you are not out of memory. Something else is corrupting memory or causing the hang.
Just for fun, try:
// Move all carriages up together until the first endstop is hit.
- current_position[X_AXIS] = current_position[Y_AXIS] = current_position[Z_AXIS] = 3.0 * (Z_MAX_LENGTH );
+ current_position[X_AXIS] = current_position[Y_AXIS] = current_position[Z_AXIS] = 1.5 * (Z_MAX_LENGTH );
feedrate_mm_s = 1.732 * homing_feedrate_mm_s[X_AXIS];
If the height makes a difference, this should too.
I changed from 3.0 to 1.5 and the result is identical.
OK, I have disabled #define USE_WATCHDOG again and now I can home normally...
The M100 test cannot catch a buffer overflow. A buffer overflow occurs when we write accidentally into memory either because a buffer is too small, or what we're writing is too long. A buffer overflow can lead to stack corruption, crashing, anomalous behavior… It's an awful thing and often quite hard to find.
Also, since we don't use any dynamic allocation, the amount of free memory that M100 reports should be always the same as it was at boot up.
Anyway, with USE_WATCHDOG being involved, I think possibly there might be something else going on! There's a small number of Arduino boards that don't support the 4 second timeout (only much shorter ones), but I doubt you have one of those.
I have a Rumba board. Would reducing the time fix the problem? From 4 seconds to 2 or so?
@judokan9 The opposite. A shorter timeout will cause the watchdog to trigger more often, and 2 seconds is not one of the available options. A longer timeout would be better, but it's no guarantee. If you'd like to test an 8s timeout to see if it makes any difference, change the line…
- wdt_enable(WDTO_4S);
+ wdt_enable(WDTO_8S);
The M100 test cannot catch a buffer overflow.
Agreed. But they said 'Memory overflow' and not 'Buffer overflow'.
Also, since we don't use any dynamic allocation, the amount of free memory that M100 reports should be always the same as it was at boot up.
This isn't true. At boot up, the various GCode commands have not been invoked. Some of the GCode commands like G29 P5 will wind up the stack and you will see a different amount of 'free' memory after it is invoked. Running G29 a second time should not lower the free memory by any significant amount. (It is possible to lose a small amount of additional 'free' memory because you can't control when the interrupts fire and their stack usage.)
int abl2 = sq(auto_bed_leveling_grid_points);
double eqnAMatrix[abl2 * 3], // "A" matrix of the linear system of equations
eqnBVector[abl2], // "B" vector of Z points
mean = 0.0;
int8_t indexIntoAB[auto_bed_leveling_grid_points][auto_bed_leveling_grid_points];
#endif // !DELTA
@thinkyhead Changing wdt_enable(WDTO_4S); to wdt_enable(WDTO_8S); fixed the problem... But is this a good fix? If I understood it right, the timer checks the status of the printer every 4 s; wouldn't a higher value mean that problems are detected later?
Whether the time is 4 or 8 seconds does not matter. The regular refresh happens 5 times/second. You will just see the reset 4 seconds later, or not at all if the problem does not last that long.
The watchdog reset is a symptom - not the reason.
But they said 'Memory overflow' and not 'Buffer overflow'.
@Roxy-3D None of the code does any dynamic allocation, so I presume he was simply using the imprecise language of a layperson because there's no such thing as a "memory overflow."
the timer checks the status of the printer every 4 s
@judokan9 No. How it works is, if we fail to reset the watchdog timer within 4 seconds, the board reboots. Increasing it to 8 seconds simply gives more leeway.
The watchdog reset is a symptom - not the reason.
@Blue-Marlin And yet changing it has given us new information. Something is delaying the watchdog reset by some amount that is bad for the given board. It may also be that the timer on the board is running too fast, losing bits, or getting hit with static. The Arduino documentation on the watchdog timer indicates it can run slower if the current is low, and I speculate that perhaps it can run too fast if it gets too much current.
I tried to test with WATCHDOG_RESET_MANUAL, but I'm seeing a strange result.
When I enable WATCHDOG_RESET_MANUAL and REPRAP_DISCOUNT_SMART_CONTROLLER and upload a sketch, the MEGA2560 + RAMPS freezes immediately at every startup, the red LED on the RAMPS flashes, and there is no response.
When I enable only WATCHDOG_RESET_MANUAL, the LED doesn't flash; it freezes, but I can get a response.
This freeze happens whether the RAMPS is connected to the MEGA2560 or not. So I guess that my MEGA2560 is almost broken. Thus I've ordered a new MEGA2560 + RAMPS...
But why does Marlin appear to boot normally when I disable WATCHDOG_RESET_MANUAL (with USE_WATCHDOG still enabled)?
Strange...
Hardware with one leg in the coffin can bring you strange and random results... Cheap knockoffs can do the same :-D
Let us know if new hardware changes anything
Something completely different: with the user's config, he has 200 steps/mm and a z-max of ~760 mm. At some point the machine crosses the 128k steps border (200 × 760 mm = 152000 steps, 128k / 200 ≈ 655 mm). That roughly matches the error description. Could some intermediate integer result have flipped its sign?
I think AnHardt was on to something here.
If someone has a delta printer and pulls the belts off all 3 towers to prevent carriage movements, what happens when you issue a G28? Does it try to home for ever? Does it eventually give up? Does it crash after a certain distance is moved, possibly due to integer overflow/sign change?
I'd do this, but I am in the office at the moment.
Also, judokan, what happens if you do a G1 X0 Y0 Z110 and then try to home? If you then do a G1 X0 Y0 Z100 and try to home, does it behave the same way?
Does it try to home for ever?
Look at the code. To home it does a movement towards the endstops, 1.5 times the total movement range.
Does it eventually give up?
Look at the code. It simply assumes after this movement that it has reached the endstops.
Does it crash after a certain distance is moved, possibly due to integer overflow/sign change?
No. To overflow you would have to move the axis by several miles.
I did look at the G29 code, but I don't know the firmware well enough to know if there are timeouts that could affect it, nor did I bother looking to see how big the int/floats were for storing this. It was just a suggestion based upon observed issue.
Anyways, when the value overflows, it doesn't appear to crash Marlin, it just aborts the move and prints some interesting Z values to the display...
And for anyone interested, 40km tall delta bots will probably not work. Also, setting the z-home to 1km and executing G28 did not result in a crash, it just kept spinning away trying to get to the sky. So @thinkyhead your statement is validated and the height probably has nothing to do with the issue @judokan9 is having
nor did I bother looking to see how big the int/floats were for storing this.
@zenmetsu I needed to know this recently because I was trying to pack a data structure efficiently. I just ran the code and it reports:
sizeof(char): 1
sizeof(unsigned char): 1
sizeof(int): 2
sizeof(unsigned int): 2
sizeof(long): 4
sizeof(unsigned long int): 4
sizeof(float): 4
sizeof(double): 4
sizeof(void ): 2
sizeof(void ()): 1
Check out the last line. That makes no sense. Unless maybe GCC puts a jump table at the front of the RAM just for this purpose?
Maybe. The inner workings of GCC are black magic to me... i'm more of an ASM guy.
That last one would be "void pointer to function", yes? It's possible that the 1 result is spurious, and in fact the real result is something like an empty return value. When I attempt this on my OSX machine with gcc, the compiler simply replies error: invalid application of 'sizeof' to a function type. The Arduino compiler should probably choke on this too, but instead it's mapping it to something that has a size.
@judokan9 We've made a lot of changes in the realm of homing and leveling lately, including some possible bug fixes. I suggest testing again with RCBugFix to see if your issue still exists, or if there are any other oddities that need to be addressed before we put out the next release candidate.
i'm more of an ASM guy
@zenmetsu I used to be an Assembly programmer exclusively and published a couple of games for the Amiga. When RISC processors came along it became nearly impossible to write by hand (and still have a life), so I moved on to C/C++. Of course now with these 8-bit embedded processors making a comeback, I can once again utilize all my old 6502 and 680x0 experience.
If you need it, I've made a helpful script to open the most recent Arduino build as Assembly in a text editor (for OSX, but it's adaptable to other *nixen). I find that reading the Assembler really helps to understand the way the compiler "thinks."
#!/usr/bin/env bash
#
# marlindump
#
# Dump and view Marlin's object output in Assembler
#
OBJDUMP="`which avr-objdump`"
TEMPFIND="/var/folders/*/*/T/*.tmp"
HOME=`echo ~`
DEST="$HOME/Desktop/scratch"
MARNAME=Marlin.ino
ELFNAME=$MARNAME.elf
HEXNAME=$MARNAME.hex
MARLIN_ELF=$(find $TEMPFIND -name $ELFNAME)
MARLIN_HEX=$(find $TEMPFIND -name $HEXNAME)
if [[ -z $MARLIN_ELF ]]; then
echo "`basename $0`: No 'Marlin.ino.elf' found." 1>&2 ; exit 1
fi
SIZE=`stat -f%z "$MARLIN_HEX"`
DATE=`ls -la "$MARLIN_ELF" | awk '{ print $6 " " $7 " " $8 }'`
echo "Dumping build from $DATE ($SIZE)"
mkdir -p "$DEST"
"$OBJDUMP" -S "$MARLIN_ELF" >"$DEST/marlin.a" && subl "$DEST/marlin.a"
If you need it, I've made a helpful script to open the most recent Arduino build as Assembly in a text editor (for OSX, but it's adaptable to other *nixen). I find that reading the Assembler really helps to understand the way the compiler "thinks."
I wish I had this for Windows. I'm re-ordering a lot of floating point calculations to speed things up. But I don't have enough knowledge about how expensive the calls to calc_z0() are. And I need to see how much (if anything) I'm saving by indexing into an array to get the coordinate of a Mesh Index instead of doing a multiply and add.
Maybe I'll see if I can do these commands by hand. What I really would like are the --ii files that mix the comments with the assembly.
Windows, by failing to be a Unix or variant, is a bit of a barrier to deeper collaboration. The *nix shell is such a vital thing. We get all the GNU built-in, and the window system is really just a "thin layer" over all that power.
Hello,
Today I installed the new RC7 release. When the delta homes for the first time, it homes normally but then drives an additional 100 mm down. So far so good... But when I drive 600 mm down and try to home again, the delta gets stuck at this 100 mm point (before it can hit the endstops) and cannot be moved any more (my RUMBA board hangs). I have to power-cycle the board to get it running again. The odd thing is that when I only drive about 400 mm downwards, the delta homes normally and everything is OK.
configuration.txt