MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.36k stars 19.26k forks source link

M500 doesn't always save properly when done midprint #416

Closed buzztiaan closed 9 years ago

buzztiaan commented 11 years ago

M92 E700 during print. M500 during print.

M503 shows Esteps now being 851 (wtf)

Doesn't always happen, but i now need to doublecheck if the correct settings are saved, and resave them (often)

ErikZalm commented 11 years ago

Access to eeprom is critical when there are interrupts running. We should block M500 if the printer is moving.

buzztiaan commented 11 years ago

This also occurs when the printer is not moving.

My arduino is a 2560 r3.

ErikZalm commented 11 years ago

The heater interrupts should also be disabled. But we can disable them for a short time without problems. We can not disable the stepper interrupt during movement.

boelle commented 9 years ago

This one is created about a year ago and there have been a lot of changes, please download the latest copy of marlin and see if the problem is still there. Also you the latest arduino IDE to flash the marlin firmware. If you board files etc only work under old ide upgrade those first so they work under latest IDE.

If you create board files for hardware that are not in the https://github.com/ErikZalm/Marlin/tree/Marlin_v1/ArduinoAddons then please fork marlin and add the files and then create a pull request so that we can get the hardware supported. This will also give an idea what hardware people are using.

buzztiaan commented 9 years ago

not sure what you are doing boelle ...

boelle commented 9 years ago

right now having xmas dinner 😃

Den onsdag den 24. december 2014 skrev buzztiaan notifications@github.com:

not sure what you are doing boelle ...

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68065935.

buzztiaan commented 9 years ago

i ment about closing bugs that are not flagged fixed , nor lack any reproduction method

galexander1 commented 9 years ago

boelle was just doing much-needed house cleaning. since you say it exists, i'll just re-open it.

regarding the fix, if the problem is interrupts coming during eeprom writes, it really needs to pause printing while it writes. probably not hard to do, but i have to wonder, why do you want to change the eeprom values when printing?

buzztiaan commented 9 years ago

tuning esteps during print helps visually inclined tuners ;)

saving during print makes sure the paper you wrote the new esteps on is not the only place this gets stored

boelle commented 9 years ago

oh yes, i closed all those more than year old, then all those with very little action.

they can be opened again if people respond 😜

Den onsdag den 24. december 2014 skrev buzztiaan notifications@github.com:

i ment about closing bugs that are not flagged fixed , nor lack any reproduction method

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68068265.

Lukelectro commented 9 years ago

I think the folowing way would work to save eeprom settings during print:

Make sure heaters are not currently heating (to prevent overheating) and nothing is moving disable interupts save settings to eeprom re-enable interupts carry on with what you where doing.

Maybe simply blocking M500 during print (make it return a message like "can't save to eeprom during print") is a better solution, because turning off the heaters and stopping movement during a print, even for a short while, will affect the print. (And keeping the heaters on while disabling temperature control seems a even worse idea)

boelle commented 9 years ago

are you up to doing the fix to get it to work?

alexborro commented 9 years ago

How long does it takes to save the eeprom?? I was thinking about some miliseconds, nothing more.. I don't think miliseconds will mess heaters and/or movement...

By the way, I have already saved the eeprom during printing and no problem at all.

Cheers.

Alex.

2015-01-02 18:50 GMT-02:00 Bo Herrmannsen notifications@github.com:

are you up to doing the fix to get it to work?

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68561025.

"Não é o mais forte da espécie que sobrevive, nem o mais inteligente. É aquele que se adapta melhor as mudanças" ( Charles Darwin )

Alex Borro

boelle commented 9 years ago

so for a dummie this means that means we can save while printing without any issue? given it does not take more than a few milis to save

Lukelectro commented 9 years ago

3.4ms per byte, as per page 22 of the atmega 328 datasheet. That's probably a typical specification, I don't know what's worst case.

quotes from datasheet:

" Table 5-1. 3.4 ms Erase and Write in one operation (Atomic Operation) 1.8 ms Erase Only 1.8 ms Write Only "

And here is the reason for these problems:

" The EEPROM Write Enable Signal EEPE is the write strobe to the EEPROM. When address and data are correctly set up, the EEPE bit must be written to one to write the value into the EEPROM. The EEMPE bit must be written to one before a logical one is written to EEPE, other- wise no EEPROM write takes place. The following procedure should be followed when writing the EEPROM (the order of steps 3 and 4 is not essential):

  1. Wait until EEPE becomes zero.
  2. Wait until SELFPRGEN in SPMCSR becomes zero.
  3. Write new EEPROM address to EEAR (optional).
  4. Write new EEPROM data to EEDR (optional).
  5. Write a logical one to the EEMPE bit while writing a zero to EEPE in EECR.
  6. Within four clock cycles after setting EEMPE, write a logical one to EEPE. The EEPROM can not be programmed during a CPU write to the Flash memory. The software must check that the Flash programming is completed before initiating a new EEPROM write. Step 2 is only relevant if the software contains a Boot Loader allowing the CPU to program the Flash. If the Flash is never being updated by the CPU, step 2 can be omitted. See “Boot Loader Support – Read-While-Write Self-Programming, ATmega88 and ATmega168” on page 268 for details about Boot programming.

Caution: An interrupt between step 5 and step 6 will make the write cycle fail, since the EEPROM Master Write Enable will time-out. If an interrupt routine accessing the EEPROM is interrupting another EEPROM access, the EEAR or EEDR Register will be modified, causing the interrupted EEPROM access to fail. It is recommended to have the Global Interrupt Flag cleared during all the steps to avoid these problems. "

alexborro commented 9 years ago

I have just counted how many write command we have: about 35 commands.. If the most variable are bytes, so it will take less than 150ms.. lets say 200ms at maximum.

2015-01-02 19:16 GMT-02:00 Lukelectro notifications@github.com:

3.4ms per byte, as per page 22 of the atmega 328 datasheet. That's probably a typical specification, I don't know what's worst case.

quotes from datasheet:

" Table 5-1. 3.4 ms Erase and Write in one operation (Atomic Operation) 1.8 ms Erase Only 1.8 ms Write Only "

" The EEPROM Write Enable Signal EEPE is the write strobe to the EEPROM. When address and data are correctly set up, the EEPE bit must be written to one to write the value into the EEPROM. The EEMPE bit must be written to one before a logical one is written to EEPE, other- wise no EEPROM write takes place. The following procedure should be followed when writing the EEPROM (the order of steps 3 and 4 is not essential):

  1. Wait until EEPE becomes zero.
  2. Wait until SELFPRGEN in SPMCSR becomes zero.
  3. Write new EEPROM address to EEAR (optional).
  4. Write new EEPROM data to EEDR (optional).
  5. Write a logical one to the EEMPE bit while writing a zero to EEPE in EECR.
  6. Within four clock cycles after setting EEMPE, write a logical one to EEPE. The EEPROM can not be programmed during a CPU write to the Flash memory. The software must check that the Flash programming is completed before initiating a new EEPROM write. Step 2 is only relevant if the software contains a Boot Loader allowing the CPU to program the Flash. If the Flash is never being updated by the CPU, step 2 can be omitted. See “Boot Loader Support – Read-While-Write Self-Programming, ATmega88 and ATmega168” on page 268 for details about Boot programming.

Caution: An interrupt between step 5 and step 6 will make the write cycle fail, since the EEPROM Master Write Enable will time-out. If an interrupt routine accessing the EEPROM is interrupting another EEPROM access, the EEAR or EEDR Register will be modified, causing the interrupted EEPROM access to fail. It is recommended to have the Global Interrupt Flag cleared during all the steps to avoid these problems. "

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68563070.

"Não é o mais forte da espécie que sobrevive, nem o mais inteligente. É aquele que se adapta melhor as mudanças" ( Charles Darwin )

Alex Borro

gregrebholz commented 9 years ago

I would guess most values are int (2 bytes) or floats (4 bytes). The EEPROM version identifier is a 4 byte character string (3 characters and null). Other arduino sizes are long (4 bytes), double (4 bytes; 8 bytes on the Due), or boolean (1 byte).

Anyway, probably closer to 1/2 second, but the stepper motor code is entirely dependent on receiving regular interrupts trigged by the Arduino's internal timer (handled by ISR(TIMER1_COMPA_vect)) and it will not be able to generate the stepper pulses during this time with interrupts disabled. Presumably the blocks in the move planner just don't get popped, so movement can resume when interrupts are enabled again, but you should at least still see a noticeable halt in movement and it may throw off position... I have no idea how a stepper motor would behave if the pulse train to the coils froze for a while. @ErikZalm did say "We can not disable the stepper interrupt during movement" in the comments above, so I think allowing M500 during movement should get a lot of rigorous testing before any implementation. In the meantime better to block M500 rather than write corrupted data to the EEPROM, which is what is happening today.

thawkins commented 9 years ago

Can you not cheat and insert an action into the planner queue to write the eeprom contents, that way it should be action-ed between atomic movements. the action can just suspend the heater ISR, if and turn it back on when finished.

alhirzel commented 9 years ago

I am not 100% familiar with the code here but it is very important to know all sources of interrupts if we're going to do this... I've listed what I know about, any others maybe buried in libraries?

@thawkins - if I understand, you're proposing that we forecast times when the stepper and extruder drivers will be in motion for 500ms so we have a safe EEPROM window in which we can disable interrupts, kick the EEPROM then sei() and be on our way. We would leave the watchdog running and disable the temp control for this time. Sound about right?

alhirzel commented 9 years ago

If we decide to do that, we have no way of guaranteeing the storage will occur (imagine a particularly crafty set of gcode that never gives us the desired gap). How do we deal with that possibility, slow the print or error to the user?

thawkins commented 9 years ago

I was suggesting that the eprom write code is run off a handler that is driven of an action inserted into the motion queue, in that way the write can only occure between movements and not during a movement, as it will be syncronised with the stream of movement actions. On Jan 4, 2015 1:25 AM, "Alex Hirzel" notifications@github.com wrote:

I am not 100% familiar with the code here but it is very important to know all sources of interrupts if we're going to do this... I've listed what I know about, any others maybe buried in libraries?

  • extruder driving in stepper.cpp:TIMER0_COMPA_vect
  • stepper driving in stepper.cpp:TIMER1_COMPA_vect
  • hotend/hotbed control, reading filament width in temperature.cpp:TIMER0_COMPB_vect
  • 4000ms watchdog timer in watchdog.cpp:WDT_vect

@thawkins https://github.com/thawkins - if I understand, you're proposing that we forecast times when the stepper and extruder drivers will be in motion for 500ms so we have a safe EEPROM window in which we can disable interrupts, kick the EEPROM then sei() and be on our way. We would leave the watchdog running and disable the temp control for this time. Sound about right?

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68602267.

thawkins commented 9 years ago

i was not suggesting to insert it into a suitable length move, I was suggesting that it is inserted between moves, which should be atomic.

On Sun, Jan 4, 2015 at 1:27 AM, Alex Hirzel notifications@github.com wrote:

If we decide to do that, we have no way of guaranteeing the storage will occur (imagine a particularly crafty set of gcode that never gives us the desired gap). How do we deal with that possibility, slow the print or error to the user?

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68602318.

alhirzel commented 9 years ago

But if the print pauses for 200ms (edit: not 500ms) mid-print that may cause undesirable results e.g. with a hot hotend resting in one place, it could disturb surface finish on the piece.

alhirzel commented 9 years ago

Yep I think I understand now =] Just trying to think of the consequences

thawkins commented 9 years ago

I agree that there is potential for a pause on the print and the introduction of a "hicky", but probably no worse than USB buffering pauses introduce,.

And if it impacts the print then people should not do it... Their choice.

On Sun, Jan 4, 2015 at 2:30 AM, Alex Hirzel notifications@github.com wrote:

Yep I think I understand now =] Just trying to think of the consequences

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68604416.

alhirzel commented 9 years ago

We could split the EEPROM write up into multiple operations, e.g. write each multibyte value separately but atomically. This would decrease the delay to <<200ms and reduce the effect I'm worried about. Can also ship it with a disclaimer or leave it disabled by default or something like that like you've said

Last worry is what if EEPROM write fails, any avenue of that compromising the print?

gregrebholz commented 9 years ago

It's an interesting idea. We could add an attribute to the typedef block_t in planner.h to label the block as an eeprom-write instead of the assumed behavior as a motion block. M500 would insert the block into the motion planner, and the interrrupt handler would have to check that attribute around stepper.cpp line 321... completely skipping the pulse generation code and instead disabling all interrupts, writing EEPROM, enabling interrupts, and clearing the block out of the motion queue.

You're still messing around in an area of Marlin where millisecond timing is critical, and lingering in a block of code for >500ms that normally completes in only a few... but it might work.

alhirzel commented 9 years ago

It's like we're writing our own RTOS. ;P

gregrebholz commented 9 years ago

Probably not the best place to ask, but does anyone know why Marlin doesn't have a "settings" typedef so the eeprom read/write can be done algorithmically instead of one named variable at a time? It would make it much more difficult to screw up the read/write process when adding new variables to the settings, and the current approach with ifdef all over the place makes it very difficult to understand what exactly exists in any given user's eeprom. Anyway, if it was in a typedef we could much more easily break down the M500. Every time the stepper ISR is called and finds it's working on the M500 motion block it could write a fixed number of bytes of the settings typedef, and if successful advance a "bytes written" attribute in the motion block. When complete it would clear the block (writing the eeprom version header last to ensure its only valid after the whole thing has been written). As it currently stands the "write eeprom" is like "write variable x at offset A", "write variable y at offset B", etc. and breaking all that up into a series of smaller atomic writes would be a complete mess I think.

alhirzel commented 9 years ago

Related to the above comment, if we are going to do sub-value writes (e.g. byte-at-a-time rather than value-at-a-time) we can totally abandon atomicity if we have a redundant set of EEPROM data and a switch to go between them.

gregrebholz commented 9 years ago

Each write operation will still need to be "atomic" in the sense that an interrupt during the write sequence will cause corrupted values to be written... so the process has to be "disable interrupts, write, enable interrupts" regardless of how much is being written.

There's still a benefit to redundant sets in the event that the system halts before a complete set is written. Still though, this is a pretty major change and we're supposed to be focusing on stability and bug fixing. "M500 writes corrupted eeprom values during movement/heating" is a bug... if I have time this weekend I'll put together a pull request to disable it until a working implementation of this can be proposed.

Lukelectro commented 9 years ago

I know I'm new here, but I would agree with disabling M500 during movement/print. Very much better then writing corrupt values.

(It would be nice though if you could pause a print, edit values, resume print, and if found good, pause, save and resume.)

boelle commented 9 years ago

an idea.... do save at next layer change? but writes on hold until the next layer change comes up

or would that not work ?

boelle commented 9 years ago

if not i will vote for the simple block as @ErikZalm said about 2-3 years ago

"We should block M500 if the printer is moving."

nophead commented 9 years ago

Reading this comment: https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68563070

It looks like you only need to disable interrupts between step 5 and step 6, which is only a couple of instructions. That shouldn't affect anything. This is assuming the interrupts don't use the EEPROM.

If that is all that is needed the change is trivial.

boelle commented 9 years ago

if you know where to change and in general know what you are doing then yes... but me a total noob i dont want to risk screwing up

nophead commented 9 years ago

I wasn't suggesting you personally do it but this discussion has assumed it can't be done without affecting the print but if the interrupts don't read from EEPROM it looks like it can easily be done. Does anybody know otherwise?

boelle commented 9 years ago

and this is the last one before we can start testing the heck out of it in the dev branch.. oh well i guess i'm just impatient :-/

nophead commented 9 years ago

Looks like the code that does the writes is in the arduino library and the section marked as critical here: https://github.com/arduino/Arduino/blob/3a8ad75bcef5932cfc81c4746a87ddbdbd7e6402/build/macosx/dist/eeprom.h#L325 is probably where interrupts need to be disabled.

alexborro commented 9 years ago

I agree with Chris, I think we can first try the easy path. Otherwise I think queueing M500 among movement command is the better option.

Cheers.

Alex. Em 04/01/2015 18:08, "Chris" notifications@github.com escreveu:

Looks like the code that does the writes is in the arduino library and the section marked as critical here: https://github.com/arduino/Arduino/blob/3a8ad75bcef5932cfc81c4746a87ddbdbd7e6402/build/macosx/dist/eeprom.h#L325 is probably where interrupts need to be disabled.

— Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68646814.

gregrebholz commented 9 years ago

Just came across this and wanted to put it here for posterity: http://www.nongnu.org/avr-libc/user-manual/optimization.html Apparently in avr-gcc the sei() and cli() functions for disabling and enabling interrupts are macros to assembly statements, and compiler optimizations are free to reorder them with the surrounding code. This means the exact statement order in the C code is not preserved in the conversion to assembly/machine code, unless compiler optimizations are completely disabled (which would be bad in our case). Anyone know a workaround offhand?

gregrebholz commented 9 years ago

Also @nophead, the recommendation at the bottom of that comment is "It is recommended to have the Global Interrupt Flag cleared during all the steps to avoid these problems." So that would be disable before step 1 and enable after step 6, which I believe is common practice when writing to eeprom. The only example they cite is an interrupt handler that accesses EEPROM, but there may be other ways to break an eeprom write. In general the eeprom write operation is very timing critical, and needs to run uninterrupted.

The code from the arduino library is particularly interesting however, as it already includes the "cli" that results in global interrupts being disabled... and I don't see the sei that would reenable them... not sure what's going on there... but we may also not have control over when interrupts are functioning and not. If we write to eeprom using the arduino libraries they may mess with our interrupts on their own terms, affecting stepper/heater control without our direction. This doesn't explain how corrupted values are being written when M500 is run during operation.

nophead commented 9 years ago

The way to avoid the compiler reordering it is to use assembler, which is presumable why it is already written in assembler.

The only time critical section is 5 and 6. The only reason for disabling interrupts for the rest of it is if the interrupts mess with the EEPROM registers. Do any of the interrupts read from EEPROM?

Since it already disables interrupts there are two mysteries: why it goes wrong in the first place and how interrupts get enabled again.

The status register is listed in the clobbers, so perhaps the compiler restores it. I am not familiar with AVR assembler so I will have to read up on it.

gregrebholz commented 9 years ago

My comment about the cli() and sei() macros was referring to the proposal that we disable interrupts while processing the "faux motion block" in the planner. We wouldn't want to write that whole thing in assembly. But I've changed my opinion on the importance of that level of timing after some more reading --

I think now that there are three sources for eeprom-vs-interrupt problems. The first is as you mentioned, other code (executed by an interrupt handler) trying to use eeprom at the same time as your eeprom write. I think we're ok there as none of the interrupt handlers I've seen in Marlin access eeprom for reading or writing. A second source is that the AVR has a single charge pump for both eeprom and flash memory, so the datasheet mentions having to ensure your code isn't writing to flash when you ask it to write to eeprom. I don't think Marlin flashes itself, but its worth a little searching. And the third source is the "timing critical" portion that eeprom_write_byte is handling in your referenced code. The datasheet says that EEWE must be written within 4 cycles after EEMWE is written, and that is only half a microsecond on an 8Mhz clock. The cli/sbi/sbi assembly will accomplish that.

So, I don't have an answer to either of your two mysteries. They are mysterious. :) Marlin only calls sei() to enable interrupts during initial init, during a kill() to allow LCD updates to occur, and in the spiSend/spiRec routines of Sd2Card.cpp for printing from SD cards. It seems other developers call sei() explicitly -- https://github.com/pololu/libpololu-avr/blob/master/test/3pi-test/pushbuttons.c but it really seems like that should be happening in the Arduino library... though I suppose it would have to first test if interrupts were enabled, to know if they should be enabled by the end of the call.

nophead commented 9 years ago

OK one mystery solved: the first line of the asm section saves the status register in r0 before the cli.

in r0, %[__sreg]

The last line restores it.

out %[__sreg], r0

The interrupt bit is part of the status register so it gets saved and restored.

nophead commented 9 years ago

When I disassemble ConfigurationStore.o compiled for AtMega1284P I find it actually calls eewr_byte_m1284p, not eeprom_write_byte which is written in the source code. I can't find source for eewr_byte_m1284p and I can't work out how it gets called instead of eeprom_write_byte.

galexander1 commented 9 years ago

This is from /usr/lib/avr/include/avr/eeprom.h on my PC:

define eeprom_write_byte _EEPROM_CONCAT2 (__eewr_byte, _EEPROM_SUFFIX)

Hope this helps.

nophead commented 9 years ago

Hmm, yes it seems to come from there under linux and not from the arduino distribution. Not sure where the source code is.

So it seems the code varies according to the device and the OS it was compiled on as I am sure on Windows it uses the arduino version. So maybe it works for some people and not others?

nophead commented 9 years ago

Yes that is what is on my RPI, which is the version I dissembled but on Windows there is no /usr so it must come from arduino-1.0.5\hardware\tools\avr\avr\include\avr\eeprom.h which has the inline assembler version we have been discussing. That one disables interrupts, so it is a mystery why it doesn't work.

Looks like the avr library version is different. Some source code here: https://cells-source.cs.columbia.edu/plugins/gitiles/toolchain/avr-libc/+/master/avr-libc-1.7.1/libc/misc/eewr_byte.S, not necessarily the same version but it doesn't mask interrupts if AVR_XMEGA is set. However that seems unlikely on a Reprap board.

On 5 January 2015 at 21:48, galexander1 notifications@github.com wrote:

This is from /usr/lib/avr/include/avr/eeprom.h on my PC:

define eeprom_write_byte _EEPROM_CONCAT2 (__eewr_byte, _EEPROM_SUFFIX)

Hope this helps.

Reply to this email directly or view it on GitHub https://github.com/ErikZalm/Marlin/issues/416#issuecomment-68787863.

nophead commented 9 years ago

@buzztiaan, Which version of Arduino are you using and which OS?