Klipper3d / klipper

Klipper is a 3d-printer firmware
GNU General Public License v3.0
9.21k stars 5.26k forks source link

SKR Mini E3 V1.2 wrong temperature measurement #2194

Closed Magoo0876 closed 4 years ago

Magoo0876 commented 4 years ago

Hi,

I've played a bit with the new SKR mini E3 V1.2 board and liked it so far, but something was strange. After some time I've measured the head and bed temperatures, and they were too low. My sensors are 104GT-2 on the bed and B3950 on the hotted .

bed set to 60 measured 50 hottend set to 200 measured 160

I've played with the pull-up resistor setting and found that 6k almost compensates the error. Also different sensor type settings did not do the expected.

So I think the internal pulldown resistors of the STM32F102 are activated and this would cause a higher temperature reading for the software. I also tried to activate the pull-up resistors for the thermistor input pin, but this is not a possible setting in the config.

Thank's for your help.

klippy.log

KevinOConnor commented 4 years ago

I tried updating Klipper and then re-flashing the new updated version, which broke Klipper temps.

What's the Klipper version that is working? What is the Klipper version that is not working? If you can produce logs of both then that would help.

Note that a single temperature check doesn't mean much because when a loss of precision occurs the "step temperature" might just randomly match the actual temperature. We're looking for temperature changes to show a smooth transition without steps (or, at least, not such drastic steps). So, a good test would be to try commanding the temperature up 25 degrees or so to see if the transition appears smooth.

If you, or someone else, has time, it would be extremely helpful to run a git bisect session to determine the Klipper revision that results in the regression.

-Kevin

tbe commented 4 years ago

I have done the bisect.

Good news: I found the commit that introduced the change in the temperature readings. It is bd6c25c9f8c28831d146f9077a0a2aa636c6e037

Bad news: The temperatures before this commit are wrong also ( same for the firmware provided by @l-Kage-l . These readings are 16°C for me, at a constant room temperature of 21°C. And Marlin is reading 21°C also.

EDIT: My fault. I was testing with power supply off .. with power supply on and the above commit reverted, everything works fine.

KevinOConnor commented 4 years ago

Alas, looking for correct/incorrect temperatures will not yield anything useful. One needs to look for "temperature steps" during heating/cooling.

If you can confirm that bd6c25c shows "temperature steps" and the previous commit (0b0e5a91) does not, then please attach the two log files from those revisions (that include a session showing a heat and/or cooling cycle).

Thanks, -Kevin

brianrjones69 commented 4 years ago

I found it and it is where tbe said.. The problem is in stm32f1.c in clock_setup() we are losing the prescalers in the rcc->cfgr registers. I am surprised the thing works at all. Here is my mod for the code.

Kevin, If you really need me to do a PR then I will, but its only 2 lines of code.

// Main clock setup called at chip startup
static void
clock_setup(void)
{
    // Configure and enable PLL
    uint32_t cfgr;
    if (!CONFIG_STM32_CLOCK_REF_INTERNAL) {
        // Configure 72Mhz PLL from external crystal (HSE)
        uint32_t div = CONFIG_CLOCK_FREQ / CONFIG_CLOCK_REF_FREQ;
        RCC->CR |= RCC_CR_HSEON;
        cfgr = (1 << RCC_CFGR_PLLSRC_Pos) | ((div - 2) << RCC_CFGR_PLLMULL_Pos);
    } else {
        // Configure 72Mhz PLL from internal 8Mhz oscillator (HSI)
        uint32_t div2 = (CONFIG_CLOCK_FREQ / 8000000) * 2;
        cfgr = ((0 << RCC_CFGR_PLLSRC_Pos)
                | ((div2 - 2) << RCC_CFGR_PLLMULL_Pos));
    }
// ------- here
    cfgr |= (RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV2    
                 | RCC_CFGR_ADCPRE_DIV4);
    RCC->CFGR = cfgr;
// ------- end here

    RCC->CR |= RCC_CR_PLLON;

    // Set flash latency
    FLASH->ACR = (2 << FLASH_ACR_LATENCY_Pos) | FLASH_ACR_PRFTBE;

    // Wait for PLL lock
    while (!(RCC->CR & `RCC_CR_PLLRDY))`
        ;

    // Switch system clock to PLL
    RCC->CFGR = cfgr | RCC_CFGR_SW_PLL;   // --------- here
    while ((RCC->CFGR & RCC_CFGR_SWS_Msk) != RCC_CFGR_SWS_PLL)
        ;
}
KevinOConnor commented 4 years ago

Ouch! That's a really bad error. Thank you for tracking it down. Commit 31b2c3ea.

-Kevin

EDIT: Fix incorrect commit reference.

brianrjones69 commented 4 years ago

You are welcome. Thank you for providing all of us klipper!!

raymondh2 commented 4 years ago

This commit seems to have made the issue a fair amount less noticeable. Still a little bit 'steppy' but nothing like it was before this commit.

markwinger commented 4 years ago

I just included this change in my klipper install. I was seeing the huge stepping issue. The change did seem to improve the stepping issue but the temp of the hotend reads low. Hot end read 17C but room temp was about 25C. Bed temp read 23C, closer but I would expect them to be the same. (Ender 3 with skr mini e3 v1.2). I had and mks gen-l board before and the temps were smooth and at room temp the were the same and withing 1 or 2 degrees C.

mental405 commented 4 years ago

I would think that points to a variance with the thermistor or configuration or it since both the bed and the hotend should be using the same ADC.

tbe commented 4 years ago

I just included this change in my klipper install. I was seeing the huge stepping issue. The change did seem to improve the stepping issue but the temp of the hotend reads low. Hot end read 17C but room temp was about 25C. Bed temp read 23C, closer but I would expect them to be the same. (Ender 3 with skr mini e3 v1.2).

Same here. Interestingly this is only an issue at room temperature. As soon as the bed or hotend are heating up, the temperature is very accurate

markwinger commented 4 years ago

After leaving the system off over night the thermisters both read the same,17C and room temp is about 23C. I think the bed must have actually been warmer. So I read 6 degress low at room temp. My bed typically reads about 5 or 6 degrees low when heated (using a non-contact infrared thermometer). But I cant read the temp of the hotend with that so I don't know if the hot end is low at high temp but I expect the temps are both off by the same amount.

markwinger commented 4 years ago

I just tried printing pet temperature tower. The filament I have says 230-250 so I started at 225 and went to 260. But at 255 I got an error that the ADC was out of range. Is 250 the max temp the ADC is suppose to handle?

jakep82 commented 4 years ago

250 is the default max_temp for most of the provided configs. You can change that in your extruder section.

markwinger commented 4 years ago

Ok, then just slightly misleading error message. I assumed that the "ADC out of range" meant it hit 1023, when it actually meant "Temperature exceeded configured limit" or something to that effect. Don't need it that high any, 245C seems to be best for this filament.

markwinger commented 4 years ago

I've noticed something interesting. With the skr mini E3 installed I see occasional delays in updates in the terminal. Example, when the bltouch is probing, it will probe, move to the next location, the output both results at once. I never saw this before with the previous board (mks gen-l). It looks like output is being buffered some how. If temp updates did this you might see steps in the output. Just an observation, may be expected, I don't know.

vladbabii commented 4 years ago

I think when a long command is running no output from another is done. I don't think I've seen temp info while homing.

markwinger commented 4 years ago

I wast talking about communication delays in general. I see no temp updates during homing, your right. What I meant is that during auto leveling, I have 3x3 with 2 probes each. With the mks gen-l, the terminal showed the x-y-z info immediately after every probe. But with the skr e3 mini sometimes one of these does not occur immediately and is displayed with the next probe giving 2 at once. If temperature updates did this then the times of the updates would off, and even if the temps were exact, the graph may look stepped.

KevinOConnor commented 4 years ago

I'm inclined to close this ticket, because as near as I can tell the root of the issue is a hardware problem.

If anyone would like to troubleshoot further, please attach a Klipper log with a session that performs a heating cycle (go from ~25 degrees to ~60 degrees or vice-versa). Be sure to use the latest version of Klipper. Then also attach a screen-shot of the temperature graph of a similar heating cycle on Marlin on the exact same printer.

-Kevin

markwinger commented 4 years ago

I'm willing to trouble shoot but I have not worked with Marlin at all. Switched directly to Klipper without doing any builds of Marlin. So I could do it if it is useful without the marlin part.

markwinger commented 4 years ago

Question, may/may not be related to temperature graphs. I mentioned delays previously. I notice that the g-code viewer no longer syncs to the current layer. I printed the flow rate calibration cube and the viewer was 2 layers behind. Nothing displayed until the first 2 layers finished, and the print finished while the viewer was 2 layers before completion. This was not the behavior a while ago. When I installed the skr mini, I moved to the latest octoprint, and klipper (Klipper was 6 months old or so at the time). It appears I have some buffering that was not present before. I see strange issues sometimes that appear related. If I get a error, I will see the same error (Must home axis first) in popups and in the terminals, hundreds of times before it settles down so I can correct the problem. Is this buffering in klipper or octoprint? Could this explain odd, stepped graphs?

tbe commented 4 years ago

If anyone would like to troubleshoot further, please attach a Klipper log with a session that performs a heating cycle (go from ~25 degrees to ~60 degrees or vice-versa). Be sure to use the latest version of Klipper. Then also attach a screen-shot of the temperature graph of a similar heating cycle on Marlin on the exact same printer.

I'll do so later

KevinOConnor commented 4 years ago

@markwinger - it does not sound related. If you think there is something wrong with the software, you should open a new github issue and be sure to attach the Klipper log file to that issue as described at: https://www.klipper3d.org/Contact.html

-Kevin

Magoo0876 commented 4 years ago

Hi, I can tell that the same cpu on an other board like Fysetc Cheetah 1.1b works perfekt without any steps in the temperature curve and perfect regulation. So I'm quite sure the board is not very well designed like Kevin supposed. If you look on the power supply circuit the skr has a coil connected to the output of the 3,3V regulator which acts as low pass. Sounds good if you think of noise but in case of regulation it is a bad idea, because you loose the direct feedback of the 3,3 V for the cpu and sensors. So maybe a bridge over the L3 is a good idea. This coil exists also on the SKR 1.3 and was removed on the 1.4. The SKR Pro is also without. See the difference at the schematics:

Feysetc Cheetha 1.1b

Bildschirmfoto 2020-03-22 um 20 06 46

SKR mini E3 1.2

Bildschirmfoto 2020-03-22 um 20 08 40

I would test it but I do nat have a lot time and the actual connected cheetah board does not senseless home on y.

hth Magoo

markwinger commented 4 years ago

At first it thought I you were right. But looking at the specs for the MD2029-300y on Digikey that is actually a ferrite bead. It is only 30 ohms at 100MHZ. I think that would be to small to cause issues unless it creates oscillation on the power supply. Yes, it looks like hardware problem, but it also does not appear like noise. If it were noise I would expect to see random peaks and valleys in the graph. Instead we see periods where the temp increases very little if at all, then quick rises.

It behaves as if the ad converter resolution is a 1 bit for a couple of degrees in the low range. (that would be a hardware problem too)

You save the Fysetc Cheetah 1.1b works perfectly with the same cpu. Is the source available for the Fysetc Cheetah 1.1b to see if there is some difference in the code for setting up the ad converter?

markwinger commented 4 years ago

Kevin-I'm not saying there is a software issue, I don't know. Looking at the schematics, I don't see a reason for this kind of behavior, but there could be subtle errors in the design just like problems with code can be very subtle. I was just pointing out the delays thing because the graph is temp vs time and if the time like has delays it could explain the stepped graph even the temperatures read are exact.

Right now it seems your comment earlier about r23, r24 not belonging in the thermister circuits is a good point, but it does not seem like it should make that big of a difference. If I get some time I will try bypassing them if I can and see that changes anything.

tbe commented 4 years ago

are we still talking about the obvious wrong temperature readings? Because if Marlin read the idle temperatures correctly at room temperature, kipper is off by 4° I would assume that is a software issue.

Or are we talking about steps in de measurements?

I'm confused, because this issue is full of off topic and a mix of several issues. Maybe it would be better to close it and track the things in different issues -- Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.

markwinger commented 4 years ago

I apologize if I changed the subject. The stepped temperature graph that shows up on my system is clearly not the correct temperature so I was thinking it is the same root cause.

KevinOConnor commented 4 years ago

I'm confused, because this issue is full of off topic and a mix of several issues.

The main issue, as I understand it, is that the SKR mini board (in various incarnations) have low precision when measuring voltages above ~3V. This results in "step transitions" at low temperatures because of this loss of precision in the analog to digital conversion.

Just looking at the current temperature does not always show the issue, because the "step position" may just happen to be at an accurate temperature.

If anyone would like to troubleshoot further, please attach a Klipper log with a session that performs a heating cycle (go from ~25 degrees to ~60 degrees or vice-versa). Be sure to use the latest version of Klipper. Then also attach a screen-shot of the temperature graph of a similar heating cycle on Marlin on the exact same printer.

-Kevin

vladbabii commented 4 years ago

The main issue, as I understand it, is that the SKR mini board (in various incarnations) have low precision when measuring voltages above ~3V. This results in "step transitions" at low temperatures because of this loss of precision in the analog to digital conversion. -Kevin

Using a logic level converter to keep voltage always under 3v would fix it, then ?

vladbabii commented 4 years ago

Something like this with the low voltage references set to 3v, the L1 pin tied to analog input, the HV pin tied to 5v and H1 to thermistor ?

markwinger commented 4 years ago

The ST32F103RCT6 chip is a 64 pin. The docs indicate the Vref is not an external pin on the 64 pin packages and is internal only. Can Vref be changed internally? (have not studied the docs on this cpu yet.

vladbabii commented 4 years ago

Forgot to post link to logic level converter - https://www.pololu.com/product/2595

tbe commented 4 years ago

The main issue, as I understand it, is that the SKR mini board (in various incarnations) have low precision when measuring voltages above ~3V. This results in "step transitions" at low temperatures because of this loss of precision in the analog to digital conversion.

I have compared the heatup curve for marlin and klipoer with an up2date build.

Klipper: heatup_klipper

Marlin: heatup_marlin

There are steps in both graphs, and i don't see any major issue with that. The only thing that is still off, is the idle temperature. It reads 17°C on Klipper, while Marlin reads 21°C. And the bed has 21°C, verified via infrared probe.

Magoo0876 commented 4 years ago

I would like to show you the curve from the Cheetah board 1.1b with 2209 drivers same cpu as SKR mini E3 and very similar schematics. The board is flashed with the latest software and firmware for the pi. As you see no steps at all. That is the reason why I don't think that it is a software issue.

Bildschirmfoto 2020-03-28 um 22 59 17

tbe commented 4 years ago

As you see no steps at all. That is the reason why I don't think that it is a software issue.

Fine, no steps. But still, Marlin gets the Temperature right (21°C) and Klipper is of (17°C).

KevinOConnor commented 4 years ago

Fine, no steps. But still, Marlin gets the Temperature right (21°C) and Klipper is of (17°C).

My understanding is that the temperature difference is a symptom of the "steps". That is, if the board happens to "step" from 10.3 degrees to 17.2 degrees to 22.4 degrees then an actual static temperature of 21 degrees can result in a static report of 17.2 degrees. If the board steps between 13.1 degrees, 20.9 degrees, 25.2 degrees, then a static temperature of 21 degrees can show up as 20.9 degrees. Completely power off the board (fully remove it from USB and mains power for 30 seconds) and I suspect you'll get different "step" positions on each restart. So, my understanding is that it is really just random whether or not the "correct" temperature is reported.

That said, I don't doubt Marlin is somehow obtaining smaller "step" sizes. I don't know why - I expect it is some obscure timing difference during calibration or measurement. At least for me, it's not a priority to spend a ton of development effort to make a broken board "slightly less broken".

-Kevin

brianrjones69 commented 4 years ago

I also believe it is a hardware issue, but it would not be the first time that software solved a hardware issue. The previous commit for the clocks fixed the ADC 'steps'. I suspect the ADC was doing partial conversions as the least significant bits were always zero. The issue now is that the ADC is not very accurate and i doesnt take too many counts to change the temperature.
A document from st https://www.st.com/resource/en/application_note/cd00211314-how-to-get-the-best-adc-accuracy-in-stm32-microcontrollers-stmicroelectronics.pdf describes how the adc is sensitive to input capacitance and resistance. I suspect that the 100 resistor is slowing the adc sampling capacitors charge/discharge time. I decided to try their suggestion and increase the sampling time by changing the clock divider from 4 to 6. That worked. I then put the divider back to 4 and tried to change aticks value to something higher, both resulted in a temp of 20 degrees rather than 17.
Maybe someone smarter than me could do the calculations to determine what the actual sampling time should be.

piet206 commented 4 years ago

Seem to be having the same issue. Room temp is 20C but thermocouple reading is 16C. Also double checked with a heat gun. Didn't have this issue with Marlin 2.0.3.

markwinger commented 4 years ago

I suspect Brian is on to something. I dont think the 100 ohm resistor has anything to do with it. 100 ohm is very small compared to the input impedance of the adc. I see no need for the 100 ohm resister but it should not cause any problems either. But I do think there timing issue with the internal charge caps. Can you show me to the changes you tried for atick and clock divider?

brianrjones69 commented 4 years ago

Sure!
In klipper/src/stm32/adc.c

change line 86 to: uint32_t aticks = 7;

optionally change line 122 to: return timer_from_us(50);

In Klipper/src/stm32/stm32f1.c line 145 to:

RCC->CFGR |= (cfgr | RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV2 | RCC_CFGR_ADCPRE_DIV6);

I have to agree with you about the resistor, but here we are. Please let us know what you find.

markwinger commented 4 years ago

I'm confused a little. You said: "optionally change line 122 to: return timer_from_us(50);" if change this works, then this code: if (!(sr & ADC_SR_EOC) || adc->SQR3 != g.chan) // Conversion still in progress or busy on another channel goto need_delay; // Conversion ready return 0; must be returning 0, when the conversion is not really complete. The docs say the EOC is set 14 cycles after start conversion. That would mean the conversion takes longer than 14 cycles, and eoc gets set before completion. Does this sound right or am I missing something?

brianrjones69 commented 4 years ago

EOC is not set until End Of Conversion. There are two steps to complete an ADC. Sampling and then conversion. Sample time is based upon the ticks and ADC clock. Conversion time is (going from memory) 12.5 clocks. I say optional because, the system will look to see if the sample is complete every 10 us, without the change. Since we know that its not going to be ready for 40us, why ask before that. It wont hurt anything though if we do.

EDIT: my term conversion above is probably not correct. It really does the conversion during sampling, bit by bit. I have no idea why there is an additional 12 clock delay. EOC is reset by: gpio_adc_read --> adc->SR = ~ADC_SR_STRT;

markwinger commented 4 years ago

The doc said 14 clocks. Since it is successive approximation, so 1 clock per bit (binary search), probably 1 clock to start, 1 to clock the result, total 14 clocks ticks. I'll need to study the docs to understand all the different clocking involved. A complete block diagram of the adc would really help, they always simplify the describe how it works. More difficult.

brianrjones69 commented 4 years ago

Mark - I had to look at the reference manual again. I think I am correct in saying that the ticks programs the sampling time. This is the time that the input is connected to the sampling capacitors. After the sampling time the input is disconnected and the remaining 12.5 clocks are the conversion.

See section 2 of https://www.st.com/resource/en/application_note/cd00211314-how-to-get-the-best-adc-accuracy-in-stm32-microcontrollers-stmicroelectronics.pdf

The total time to convert = sampling time (ticks) + 12.5 clocks (section 11.6 of reference manual) sampling time is configured by the ticks. (Section 11.12.5 of reference manual) If the ticks is set at 0 - then yes sampling time is 1.5 clocks and the entire conversion would be 14 clocks. But with ticks = 7, it take 239.5 clocks for sampling the input + 12.5 for conversion.

STM32F103XX Reference Doc https://www.st.com/resource/en/reference_manual/cd00171190-stm32f101xx-stm32f102xx-stm32f103xx-stm32f105xx-and-stm32f107xx-advanced-arm-based-32-bit-mcus-stmicroelectronics.pdf

So with increasing the sample time, we have the input connected to the sampling caps longer and it gives it time to settle out and gives a more accurate value.

What I dont understand is why Marlin works. The only difference is that they use continuous sampling with DMA transferring data to memory. I didnt see anything in the code that averages or anything. It looks like they grab the last converted value.

-Brian

markwinger commented 4 years ago

Section 11.3.6-After the start of ADC conversion and after 14 clock cycles, the EOC flag is set and the 16-bit ADC Data register contains the result of the conversion.

That is why I said 14.

markwinger commented 4 years ago

I'm almost finished with a long print. When done I will try changing ticks to 7 and see if I get the same result as you.

markwinger commented 4 years ago

Sorry, was gone a couple days. At room temp my bed is 23C with ir scanner, but the temps reported show 16.6 in the terminal on both the nozzle and bed. (ticks at 7) I will change back and check again.

brianrjones69 commented 4 years ago

Did you also change:

In Klipper/src/stm32/stm32f1.c line 145 to:

RCC->CFGR |= (cfgr | RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV2 | RCC_CFGR_ADCPRE_DIV6);

markwinger commented 4 years ago

I must be on the wrong place. I cloned the source on Thursday and stm32f1.c looks like this:

cfgr |= RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV2 | RCC_CFGR_ADCPRE_DIV4;
RCC->CFGR = cfgr;

line 145---> RCC->CR |= RCC_CR_PLLON;

The change is div4 to div6 right or is there more?

brianrjones69 commented 4 years ago

Just the change from DIV4 to DIV6.