LinuxCNC / linuxcnc

LinuxCNC controls CNC machines. It can drive milling machines, lathes, 3d printers, laser cutters, plasma cutters, robot arms, hexapods, and more.
http://linuxcnc.org/
GNU General Public License v2.0
1.76k stars 1.14k forks source link

missing tooth encoder issue (slow speed not detecting gap correctly?) #2635

Closed samcoinc closed 10 months ago

samcoinc commented 11 months ago

Here are the steps I follow to reproduce the issue:

100 line single channel encoder with 1 missing tooth.

  1. I don't know how to create a simple config to show this
  2. I have a 100 line encoder with 1 missing tooth. 3.The spindle is running at about 325 rpm
  3. Running a spindle synced motion command (in this instance G33)
  4. Motion exerts index enable true
  5. index enable is set to false instantly not waiting for the gap
  6. this happens at every pass. 8.If I increase the rpm to 450 rpm the index enable goes false as expected after the missing tooth gap 9.Reducing the rpm back down to 320 or even lower - the index enable works as expected finding the missing toolthn

The first screenshot shows the index enable not going false in the correct location.

the rest of the screen shots show the index enable going false correctly after increasing the rpm to 320 - then lowering it - show it still works correctly - even at 120 rpm (the scope is in rps)

It worked properly before this:

I have tried master and 2.9 - both exhibit this issue..

Screenshot from 2023-09-09 19-13-37 Screenshot from 2023-09-09 19-16-47 Screenshot from 2023-09-09 20-02-34 Screenshot from 2023-09-09 20-04-21

samcoinc commented 11 months ago

ok - if you look at the 1st and 3rd images - the actual spindle rpm is the same. But the encoder is reporting it incorrectly..

5.2rps (ish) vs 3rps(ish)

something is odd..

samcoinc commented 11 months ago

Ok - yes - you can see the rps is different when it isn't seeing the index correctly (like it isn't taking into account the missing tooth..) The encoder velocity is much higher. (the pwm to the spindle drive is the same - after increasing the spindle rps to make the encoder to see the index correctly)

here are the encoder settings 37 bit I/O TRUE encoder.2.counter-mode 37 s32 OUT 1840 encoder.2.counts 37 s32 OUT -95731 encoder.2.counts-latched 37 bit I/O FALSE encoder.2.index-enable <=> indexenable 37 bit IN TRUE encoder.2.latch-falling 37 bit IN FALSE encoder.2.latch-input 37 bit IN TRUE encoder.2.latch-rising 37 float IN 1 encoder.2.min-speed-estimate 37 s32 IN 1 encoder.2.missing-teeth 37 bit IN FALSE encoder.2.phase-A <== phase-a 37 bit IN FALSE encoder.2.phase-B 37 bit IN FALSE encoder.2.phase-Z 37 float OUT 18.4 encoder.2.position ==> spindpos 37 float OUT 18.40904 encoder.2.position-interpolated 37 float OUT -957.31 encoder.2.position-latched 37 float I/O 100 encoder.2.position-scale 37 s32 OUT 97571 encoder.2.rawcounts 37 bit IN FALSE encoder.2.reset 37 float OUT 2.738226 encoder.2.velocity 37 float OUT 164.2935 encoder.2.velocity-rpm 37 bit I/O TRUE encoder.2.x4-mode

Screenshot from 2023-09-09 20-43-22 Screenshot from 2023-09-09 20-44-24

samcoinc commented 11 months ago

I don't know if I should start a different issue - or add this here.. I think it also loses 'position..' over time..

I am running a program that is a bunch of chained G33 moves.. (kinda making a oil grove..)

It seems that the position slowly moves in relation to the actual index. Now - I know this works OK on mesa hardware as I have done some non-circular turning which can have many many rotations after an index is seen. (thousands) this starts losing after a few hundred. Could it be in the calculation? Or something I am doing - I am running the spindle slow <1rpm.. I don't think it is a counting problem - although I don't know how I would compare the 'counts' to the position - as there is a missing tooth :)

If you look at the screen shot - the mouse pointer is showing the position (look at the lower encoder 'position' value. You can see the mouse compared to the index is moving to the left. (slowly losing position..)

samcoinc commented 11 months ago

well shit - you can't see the mouse pointer. Let me try this again.

samcoinc commented 11 months ago

ok - I have the 'dot' for the encoder position trace showing where the position is compared to the gap. You can see it moving to the left. I really don't think I am losing counts.. But I don't know of a way of testing for that. Let me think..

Screenshot from 2023-09-16 20-42-29 Screenshot from 2023-09-16 20-43-50 Screenshot from 2023-09-16 20-46-29 Screenshot from 2023-09-16 20-48-24

samcoinc commented 11 months ago

Ok - I setup 2 encoders (one with scale of 99 - no missing tooth) - both encoder counters lose position the same.. I am pretty sure it is a hardware issue - losing a count every so often. Have not caught it yet... Disregard the above ramblings about losing position...

samcoinc commented 11 months ago

Actually - could the position calculation not be high enough precision? I am really having a hard time proving that this is a missing count.. If I run a threading operation - .1 pitch over 1 inch. I have never seen a loss of spindle position.. Not one tooth. But if I run a bunch of chained together G33's - the position slowly moves (as you see above in the screen shots) the higher the rotations goes..

andypugh commented 11 months ago

Look at the rawcounts at the index over several rotations?

samcoinc commented 11 months ago

I think I found the problem. Finally broke the scope out and set it to pulse triggering..

This could also be the issue with the index at slow speeds.. I will test. I may have been the one that caused this problem.. PXL_20230918_230840479

samcoinc commented 11 months ago

doesn't help low rpm gap detection though.. So the original issue is still an issue.

so far sam

andypugh commented 11 months ago

So, you have a spurious pulse on the line? I don't fully understand the logic for the missing tooth gap limit, though I imagine that I once did.

        /* decide how many ns to detect missing-pulse index */
        cntr->limit_dt *= 0.9;
        cntr->limit_dt += 0.1 * ((*(cntr->missing_teeth) + 0.5) * (delta_time / delta_counts));

Are you able to convert cntr->limit_dt to a hal parameter and plot what it is doing?

samcoinc commented 11 months ago

I filtered out the pulses - the encoder position now doesn't seem to drift. but at low rpm - it doesn't find the gap still..

I should be able to handle that... Lol.. Let me try.

samcoinc commented 11 months ago

well - that is an s32 - do we need to do something to it? The images (first is it not seeing the gap correctly - second it is)

Screenshot from 2023-09-19 08-08-13 Screenshot from 2023-09-19 08-27-00

samcoinc commented 11 months ago

It also doesn't seem to calculate the rpm/rps correctly too - you can see from the first batch of screen shots..

andypugh commented 11 months ago

limit_dt is the length of gap required to trigger the missing tooth. It's in nanoseconds. I was worrying that S32 might not go high enough, but it should be OK up to a pulse time of 2 seconds. It looks like max_dt is 18ms in the bottom image, what is the actual pulse gap?

samcoinc commented 11 months ago

Here is another screen shot - the index gap is about 21.5ms or there about... Screenshot from 2023-09-19 15-45-03

samcoinc commented 11 months ago

when you first launch linuxcnc - and run the spindle slow - (where it doesn't see the index properly.. You get a maxdt of 10.3ms.. but you can see the index gap is the same 21.5ms..

Screenshot from 2023-09-19 15-52-34

samcoinc commented 11 months ago

I honestly can't believe that I got the limit_dt as a pin.. It took 2 trires - looking at how '*' is used by other variables. (I don't even know if variables is the correct term..)

Also - when it isn't working - Remember that the velocity (rpm and rps) isn't correct...

sam

samcoinc commented 11 months ago

here is the rpm.. It is actually turning 43ish rpm.. You can see when it isn't seeing the index properly - the rpm is almost 2X Screenshot from 2023-09-19 18-01-53 Screenshot from 2023-09-19 18-02-56

andypugh commented 11 months ago

It's a bit hard to pick numbers off of the screen, but it rather looks like limit_dt is being set to less than the normal pulse gap, so every gap looks like a missing pulse. Is that what the numbers show?

samcoinc commented 11 months ago

That is how I would interpret it. And some how related to rpm/rps..

samcoinc commented 11 months ago

Andy! I am not in front of the lathe - but could it be as simple as my logic on the 'missing tooth' is wrong? You can see my 'missing tooth' is a logic 1 gap.. Is the code expecting a logic 0 gap? 'MISSING' tooth?

sam

samcoinc commented 11 months ago

alas - no. Wouldn't be that easy..
Screenshot from 2023-09-20 14-44-14

GTCLive commented 11 months ago

I have an omron 100ppr encoder hooked on the A, B and Z, shielded pure copper wiring, terminating resistors and schmitt triggers for the 3x signals. I not getting a single missed pulse (2.9-latest). I'd suggest to rule out the hardware you have on the benchtop first, then try to reproduce the fault in the system/hardware you are running.

What happen when you inject a pulse from a MCU or Function Generator to LinuxCNC?

samcoinc commented 11 months ago

I am 99.9% sure this isn't a noise issue.. I have dbounced the crap out of the signal - still acts the same.

here is something that may or may not be helpful.. If I do a step change from 43rpm to 430 rpm - the missing tooth index starts working...

If I slowly raise the rpm to 430 rpm - it keeps missing the gap (and reports the wrong rpm)

sam

GTCLive commented 11 months ago

If you are confident you are noise/glitch/spike/joint/pad/vibration free on the hardware side, time to get the Function Gen out and try injecting a pulse to the input to rule out your software-side and installation. If you don't have a Function Generator you can rig something quick using a STM32, Arduino or a ESP32 for the same purpose.

Really wish I'd be of any better help. Dislike seeing others having issues while it should be running fine for them too.

Good luck and all the best. ps. Can we see the hall sensor (datasheet/timing/specs) and the slot width?

samcoinc commented 11 months ago

Weeeee... I created a hal component that takes the sim encoder and creates a missing tooth phase.. (it isn't pretty...)

So - if this behaves the same as my physical encoder - I should be able to create a sim config.. If...

Screenshot from 2023-09-21 19-59-27

samcoinc commented 11 months ago

ok - I can get it to do it with a strictly hal implementation...

Screenshot from 2023-09-22 20-53-28

andypugh commented 11 months ago

I will try to get to this on Sunday.

samcoinc commented 11 months ago

Ok - here is the Hal file, Hal component and some directions..

Install encodergap.comp

put missingtooth.hal and autosave.halscope in the same directory..

Then run.. halrun -I missingtooth.hal setp sim-encoder.0.speed .6 setp encoder.0.index-enable true

You will see like the image below that the index isn't found at the gap..

linuxcncmissing.tar.gz Screenshot from 2023-09-23 09-52-38

samcoinc commented 11 months ago

If you then

setp sim-encoder.0.speed 2 setp encoder.0.index-enable true

It will start seeing the index as expected.. Screenshot from 2023-09-23 09-54-19

samcoinc commented 11 months ago

here is the encoder.c with limit_dt as a pin (if I did it right - seemed to work)

encoder.c.tar.gz

andypugh commented 11 months ago

Sorry, I should have updated you on this. I am pretty close to a fix. (Well, I have a fix but want to understand why it works)

samcoinc commented 11 months ago

Sorry, I should have updated you on this. I am pretty close to a fix. (Well, I have a fix but want to understand why it works)

Cool! I can certainly test it when you have a handle on it.

thanks!

andypugh commented 11 months ago

I am starting to doubt my sanity on this one.

The issue is the quick-and-dirty lowpass filter: val = 0.9 val += 0.1 new_val

https://github.com/LinuxCNC/linuxcnc/blob/master/src/hal/components/encoder.c#L485

If I delete the lowpass and just use the previous gap then it works, but I worry that is possibly unstable with glitchy real-world data.

At 100 teeth and 100 rpm I get a limit_dt of 9,000,000 without the filter With the filter it is about 4,500,000

So I tried only dividing by 10 (and by 10.0, and multiplying by 0.1) and get a value of 450,000

It's like /10 is being interpreted as /20...

And this is with (as far as I know) all the other interactions with the value if limit_dt disabled.

I could just remove the lowpass, but I want to understand.

andypugh commented 11 months ago

The penny has dropped..

If limit_dt is too small, then the counter sees a gap every time. And adds the missing teeth in. Which doubles the tooth count, halving the calculated value of limit_dt, keeping limit_dt too low... This also explains the hysteresis. I need to "prime" the limit_dt lowpass.

petterreinholdtsen commented 11 months ago

[andypugh]

I am starting to doubt my sanity on this one.

The issue is the quick-and-dirty lowpass filter: val = 0.9 val += 0.1 new_val

I suspect you want to work with floating point numbers while doing this calculation. The code in question will in effect do this:

hal_s32_t missing_teeth; hal_s32_t limit_dt; rtapi_s32 delta_counts; rtapi_u32 delta_time; limit_dt = 0.9; limit_dt += 0.1 (((missing_teeth) + 0.5) * (delta_time / delta_counts));

Several of the calculations are done using whole number types, and I suspect the result is not what you expect.

Perhaps something along these lines solve the problem:

hal_s32_t missing_teeth; hal_s32_t limit_dt; rtapi_s32 delta_counts; rtapi_u32 delta_time; / Use floating point calculations when filtering / double newval = limit_dt; newval = 0.9; newval += 0.1 (((missing_teeth) + 0.5) (1.0delta_time / delta_counts)); limit_dt = newval;

I believe this ensure floating point calculations are used when multiplying, adding and dividing.

-- Happy hacking Petter Reinholdtsen

petterreinholdtsen commented 10 months ago

[andypugh]

The penny has dropped..

If limit_dt is too small, then the counter sees a gap every time. And adds the missing teeth in. Which doubles the tooth count, halving the calculated value of limit_dt, keeping limit_dt too low... This also explains the hysteresis. I need to "prime" the limit_dt lowpass.

I suspect this is the integer math messing things up. I am just pulling numbers out of thin air here, but lets say that *missing_teeth = 1, limit_dt = 2, delta_counts = 3 and delta_time = 4. This calculation:

limit_dt = 0.9; limit_dt += 0.1 (((missing_teeth) + 0.5) (delta_time / delta_counts));

will then look like this:

limit_dt = 2 0 / (int)0.9 = 0 / limit dt = limit_dt + 0.1 (1 + 0.5) 1 / (int) 4/3 = 1 */

The value assigned is (int)0.15 which is zero, so limit_dt=0. While if double is used during calculation, the result will be limit_dt=2.

-- Happy hacking Petter Reinholdtsen

samcoinc commented 10 months ago

wgat

[andypugh] I am starting to doubt my sanity on this one. The issue is the quick-and-dirty lowpass filter: val = 0.9 val += 0.1 new_val I suspect you want to work with floating point numbers while doing this calculation. The code in question will in effect do this: hal_s32_t missing_teeth; hal_s32_t limit_dt; rtapi_s32 delta_counts; rtapi_u32 delta_time; limit_dt = 0.9; limit_dt += 0.1 (((missing_teeth) + 0.5) (delta_time / delta_counts)); Several of the calculations are done using whole number types, and I suspect the result is not what you expect. Perhaps something along these lines solve the problem: hal_s32_t missing_teeth; hal_s32_t limit_dt; rtapi_s32 delta_counts; rtapi_u32 delta_time; / Use floating point calculations when filtering / double newval = limit_dt; newval = 0.9; newval += 0.1 (((missing_teeth) + 0.5) (1.0*delta_time / delta_counts)); limit_dt = newval; I believe this ensure floating point calculations are used when multiplying, adding and dividing. -- Happy hacking Petter Reinholdtsen

I did this..

    if (delta_counts) {
          double newval = *cntr->limit_dt;
                  newval *= 0.9;
                  newval += 0.1 * ((*(cntr->missing_teeth) + 0.5) * (1.0*delta_time / delta_counts));
                  *(cntr->limit_dt) = newval;
    }

but it didn't seem to help...

andypugh commented 10 months ago

The numbers are so big that integer / float isn't super-important. limit_dt and delta_time are in nanoseconds.

andypugh commented 10 months ago

I think I have fixed it, can you test on hardware? I tested in sim between 1 rpm and 10,000 rpm.

samcoinc commented 10 months ago

I think I have fixed it, can you test on hardware? I tested in sim between 1 rpm and 10,000 rpm.

I will tonight!

samcoinc commented 10 months ago

Testing - so far so good!

Great work! sam