Simulated SRX Mag Encoder Reset Problems

CrossTheRoadElec / Phoenix5-Examples

HERO C#, FRC C++/Java, future platforms.

116 stars 169 forks source link

Simulated SRX Mag Encoder Reset Problems #56

Open Crossle86 opened 2 years ago

Crossle86 commented 2 years ago

I am working on sim of SRX Mag Encoder using code based on your example: https://github.com/CrossTheRoadElec/Phoenix-Examples-Languages/blob/master/Java%20General/DifferentialDrive_Simulation/src/main/java/frc/robot/Robot.java

This seems to work until I reset the encoder. After a reset, if I do a get of the encoder count value every 20 ms and don't move the encoder position via simCollection.setQuadratureRawPosition(), the returned encoder count is zero (as expected) until about 100ms passes and then the get of encoder count returns the value that was the encoder count before the reset, but the sign is reversed. This suggests the integration done by simCollection.setQuadratureRawPosition is at work.

So something is not working with resetting the encoder simulation. To do a reset I am calling setSelectedSensorPosition(0).

JCaporuscio commented 2 years ago

Just a heads up that we've seen this issue, we're going to try and reproduce it and will follow up soon.

Crossle86 commented 2 years ago

Thanks for the update. I got anxious and posted it to Wpilib as you know.

CoryNessCTR commented 2 years ago

I'm unable to reproduce this, I made the following changes starting at line 117 of the Java general differential drive simulation example:


  @Override
  public void robotPeriodic() {
    /*
     * This will get the simulated sensor readings that we set
     * in the previous article while in simulation, but will use
     * real values on the robot itself.
     */
    m_odometry.update(m_pigeon.getRotation2d(),
                      nativeUnitsToDistanceMeters(m_leftDrive.getSelectedSensorPosition()),
                      nativeUnitsToDistanceMeters(m_rightDrive.getSelectedSensorPosition()));
    m_field.setRobotPose(m_odometry.getPoseMeters());

    System.out.println(m_rightDrive.getSelectedSensorPosition()); // Added this
  }

  @Override
  public void teleopInit() {
    m_rightDrive.setSelectedSensorPosition(4322); // Added this
  }

  @Override
  public void autonomousInit() { // Added this
    m_rightDrive.setSelectedSensorPosition(0); // Added this
  }

Doing so allows me to set the selected sensor position to nonzero on teleop init, and to 0 on autonomous init. When I teleop init, my print log shows it getting set to 4322, and when I autonomous init my print log shows it getting set to 0. I never see the selected sensor position change from what I set to.

@Crossle86 Could you share the project that's causing the behavior you're describing? There's probably something unique about it that's preventing us from easily reproducing what you're seeing.

Crossle86 commented 2 years ago

Before we get into our code, which is as you might expect, complicated, note that this issue is under simulation and I am using this code to drive the srx encoder during sim:

` public void initializeSim() { simCollection = talon.getSimCollection(); }

public void setSimValues(double position, double velocity)
{
    Util.consoleLog("%.3f, %.3f", position, velocity);

    simCollection.setQuadratureRawPosition((int) metersToTicks(position));
    simCollection.setQuadratureVelocity((int) velocityToTicks(velocity));
}

public void reset()
{
    talon.setSelectedSensorPosition(0);
}`

The position and velocity come from the wpilib DifferntialDrivetrainSim class. Again, works, resets but after some time the talon.getSelectedSensorPosition() values from before the reset come back with sign reversed.

I can provide our code, I would just need to write up a bit of an explanation of what you would see. We use a library with a wrapper class around the talon/srx encoder. Your example did not use the simcollection approach so I want to make sure we are on the same page.

CoryNessCTR commented 2 years ago

The code you provided looks reasonable and is similar to what we do in the example.

Your example did not use the simcollection approach so I want to make sure we are on the same page.

Understandable, I think we are on the same page, though. Since I was using the diff drive example, the WPIlib diff drive sim object provided values to the sim collection. I just used the setSelectedSensorPosition API since the sim collection was constantly set in sim periodic. The test I performed was done in simulation where I don't see the behavior you described.

You probably don't need to write up a large explanation of your codebase, since it's sim it should be easy to provide a series of steps I can do and see the behavior. For example, the set of steps you'd take to reproduce my experiment is as follows:

Clean checkout of Phoenix-Examples-Languages
Open Java General -> DifferentialDrive_Simulation Project
Replace code as appropriate in my previous comment
Simulate Project
Teleop-enable, drive around. Values printed are non-zero
Disable, values printed are stable
Auto-enable. Values printed are zero and stay zero

If I can follow your steps and reproduce the issue, I should be able to dive down and figure out what's going on, even with any wrapper classes you have.

Crossle86 commented 2 years ago

Ok, here goes: The library is here https://github.com/ORF-4450/RobotLib. This is an Eclipse project and posts the library to local cache with version name "local". This allows me to test with local robot program before turning the library into a release. We have a scheme for this local testing and then release with version number to the team at large. The class of interest is SRXMagneticEncoderRelative. Note that sometime ago we implemented our own scheme using a regular encoder drive by wpilib EncoderSim to substitute for the SRX during sim because we started with this before native SRX sim support was available. You will see this in the wrapper class. We can use either that older solution (which works fine) but wanted to switch to native SRX sim support because our older solution is convoluted.

The robot program is here: https://github.com/ORF-4450/Robot22B2. This is a VSCode project and is a full robot control program. We use encoders in our auto classes and typically run a distance, reset encoders, run a new distance, etc. What we see is we run the first distance, rotate, reset encoder to run second distance. However, since the encoder returns incorrect date shortly after reset, the auto routine drives quite incorrectly. The driving of the encoders is in the class DriveBase which has the simulationperiodic function. The auto routine I am using to test is called Bounce (selectable in the sim) and it uses a helper class called AutoDrive to drive a specified distance in a straight line. When you run the sim, our log file will be written to the logging directory in the project and this log file is where we see a lot of data about what is going on in the code. The simulationperiodic function records a lot of data about the encoders and navx and is probably overkill...a lot of it is left over from the conversion from SampleRobot style to Command style and implementation of simulation.

I am always hesitant to have people look at this since it seems overly complex but it is what it is :-)

CoryNessCTR commented 2 years ago

I think I see what's going on. There are two values that reset when you change from one part of an auto to another. The first is resetting selected sensor position to 0, done with setSelectedSensorPosition (intentional). The second is resetting the simulation value, done whenever you call driveSim.setPose(driveSim.getPose()); (I believe this is unintentional, though you commented its effect with // This has the effect of resetting encoder tracking in the driveSim.)

This has the effect of first setting the selected sensor position to 0 at the simulated nonzero sensor value. Then, when the simulated sensor value is reset to 0, it reflects in the selectedSensorPosition, jumping the same distance again, making it look like the position is set to its negative.

If the symptom I'm seeing is the same as yours, I fixed it by commenting out both instances of driveSim.setPose(driveSim.getPose()); in DriveBase.java.

Let me know if this makes sense or not. I don't believe our library is at fault in this case.

Crossle86 commented 2 years ago

I modified my project as you indicated. This does seem to mostly work and it took me a while to understand what is going on with the drivesim object and the encoder sim. It seems ok but in running the bounce program I observed several jumps in position of the robot on the field display on direction change. The path is correct but the jumps in position are not. They seem to occur at the points where I reset the encoders to begin driving a new segment. In studying the trace I can see there still seems to be a problem with resetting the encoders. I am attaching my trace file and will call your attention to an example of the reset problem.

At 07:52:55:355 there is a reset of the encoders. You can see that after the reset the left encoder is still returning its previous value while 13498 the right encoder returns zero. as the trace proceeds you can see that the first encoder.get() calls after reset at 07:52:55:411 return the spurious left value 13498 and right value 0. This condition continues until 07:52:55:448 when the left encoder returns -19 and the right 19. This small value is more what I expect, though both should be +. This difference in encoder values (I expect they should both be + and small as we have hardly moved) continues though the values begin to converge and eventually reach more or less synchronized values, which is what I expect as we are driving with the same voltage applied to both motors. The jumps in robot position seem to line up with the encoder resets as the auto program drives in a new direction and the unexpected values would seem to be the cause.

A related question: I am testing absolute mode with call to talon.getSensorCollection().getPulseWidthPosition() but it always returns zero.

Logging.txt

CoryNessCTR commented 2 years ago

after reset at 07:52:55:411 return the spurious left value 13498 and right value 0.

This can happen due to the latency we simulate in getting CAN frames. Since you're using the selectedSensor* API, you're subject to the up to latency that's present in Status 2, by default 20ms. When you reset, you call setSelectedSensorPosition with timeout, waiting up to 15ms for the device to respond to the setter, and then another 15ms sleep before moving on. If the device responds within 5ms, which is likely to happen especially in simulation, you still have a chance that the selectedSensorPosition may not be up to date with the last value that was present on the bus.

This condition continues until 07:52:55:448 when the left encoder returns -19 and the right 19

It looks like the reset occurs while the robot is slowing down from its AutoRotate, which would explain why one side drifts positive while the other drifts negative before being overcome by the next movement.

The jumps in robot position seem to line up with the encoder resets as the auto program drives in a new direction and the unexpected values would seem to be the cause.

I also don't see the jumping in my simulation at all after I commented out those two lines. I've attached a gif of running your bounce auto on my computer, maybe I'm missing something? BounceSim

A related question: I am testing absolute mode with call to talon.getSensorCollection().getPulseWidthPosition() but it always returns zero.

Talon's simulated Pulse Width Position also needs to be set in the sim collection, using simCollection.setPulseWidthPosition(). This is because you could be using any pulse width sensor in conjunction with a regular quadrature encoder, so we don't make any assumptions that the pulse width position changes with the quadrature position

Crossle86 commented 2 years ago

First off, thanks so much for spending the time to help me sort all this out. Your gif of the sim is what I would expect. Mine is different and I will send a visual of that later, but it would appear it is something local at this point.

I am going to be travelling soon and will have to drop all this after Friday for at least a week so will leave the absolute questions for the future.

Now I have some questions on the reset delay issue. So when I call rightEncoder.reset(15), this means wait up to 15ms for response from the reset API, but it could be quite shorter? I now expect this is the case. Is this possible delay separate from the 20ms thread that updates the encoder count? I think I assumed the 15ms delay was tied to the update delay and the two 15ms delays would take me past 20ms, but if the delays are two different things, then if the reset delay is short as you pointed out, then I could move on well before the 20ms passes. Is this correct? This would explain the results.

Looking at the timing in the log, 37ms or so passes before we see the encoders return something in an expected range and that would seem to confirm the two delays being separate.

I did these delays because we saw our physical robot twitch at resets and determined it was because of the bogus returns from the encoders and we were not waiting. So added the 15ms delay which seemed to fix that problem but it is seeming that this is not really working as I expected.

I have also seen the AutoDrive end a drive before moving because the target counts were smaller than the un-reset counts returned before the 20ms update. What is a good practice to avoid this issue?

Crossle86 commented 2 years ago

Ok, I think I see the jump issue. I feed my odometer class with a cumulative left/right distance as incrementally updated from encoder counts in the periodic function. The updated odometer pose drives the fieldsim display. When I reset encoders, I set the tracking value used to compute the incremental distance to zero assuming the encoder will also be zero. But if the encoder count is still high, this generates a large incorrect incremental amount.

You can see this at 07:52:55:346. The cumulative left distance (clc) is 5.087m at end of previous drive. On first periodic pass after reset at 07:52:55:411 with 13527 still being returned by left encoder, the clc is now 6.717m. This makes my robot jump up. When the encoder returns a valid value, the clc returns to 5.084, jumping the robot back to where it ended the previous drive.

BTW, how did you capture that gif of the robot?

CoryNessCTR commented 2 years ago

So when I call rightEncoder.reset(15), this means wait up to 15ms for response from the reset API, but it could be quite shorter?

Looking at your code, you call setSelectedSensorPosition(0, timeout) where timeout is set to 15 in the calling function. This means Phoenix will set the position to 0 and wait up to 15 ms for an acknowledgement from the device (sim or real), though it could be much shorter. Shortly after that, you Timer.delay for an additional 15 ms, resulting in a total delay of between 15ms and 30ms from when you set the sensor position.

However, since selectedSensorPosition comes from Status 2, it gets sent out every 20ms, so there is a 5ms period where you could get the old data. I made a diagram that should explain this better. If the diagram helps, we can even add it to our docs to help get the point across.

I did these delays because we saw our physical robot twitch at resets and determined it was because of the bogus returns from the encoders and we were not waiting.

This is exactly why we implemented the can latency in simulation. If you are able to manage the latency in simulation, you should be fine with the real-world latency that's present on a robot.

I have also seen the AutoDrive end a drive before moving because the target counts were smaller than the un-reset counts returned before the 20ms update. What is a good practice to avoid this issue?

In general, there's two solutions I recommend for this:

Wait 2x the frame period after setting sensor position (you can speed up the frame periods you care about during auto if the waiting is that critical)
Manage the offset in software, so you set the selected sensor position only on autonomous start and re-set the software offset between autos. This circumvents the CAN latency issue entirely

I set the tracking value used to compute the incremental distance to zero assuming the encoder will also be zero. But if the encoder count is still high, this generates a large incorrect incremental amount.

Yep, I think that's what was happening in the first place, I might have missed another instance that resets the simulated encoders and gotten lucky on all my simulations. In any case I think you have what you need to fix it yourself.

BTW, how did you capture that gif of the robot?

I used ScreenToGif

Diagram of CAN Latency:

Crossle86 commented 2 years ago

Still a bit confused. The wait on setselected is not waiting for the new value to be returned, correct? If I wait 15ms and it returns in 5ms, am I still waiting on a 20ms cycle to complete before the getselected is updated? I think it is the connection between the setselected timeout and the frame update period that I'm trying get clear. I understand the window of error, just want to correctly understand what the setselected timeout does.

CoryNessCTR commented 2 years ago

The wait on setselected is not waiting for the new value to be returned, correct?

Correct

If I wait 15ms and it returns in 5ms, am I still waiting on a 20ms cycle to complete before the getselected is updated?

I don't quite understand this. If you wait for 15ms, and it takes at least 5ms for the device to acknowledge, then I would expect getSelected to return an up-to-date value. However, most of the time the device will respond in less than 5ms, so it's a toss-up on if getSelected is up-to-date or not.

The setSelectedSensorPosition timeout specifies how long to wait for the device to acknowledge the set. This is because setSelectedSensorPosition sends a one-shot CAN frame. In order to know for sure that the device received and is applying the set, Phoenix may wait until the device acknowledges with another one-shot CAN frame. The timeout determines how long to wait (or if it should wait at all) for that acknowledge before error'ing out, and will typically be very fast but is dependent on bus utilization.

This is separate from the 20ms period that Status 2 is sent out at. Regardless of how often you set the selectedSensorPosition, Status 2 will be sent out at a steady 20ms rate, so the position will be updated every 20ms. getSelectedSensorPosition just returns the last value from Status 2.

This behavior may change in the future, if you'd prefer it to be something else let us know and we'll see what we can do in a future version of Phoenix.

Crossle86 commented 2 years ago

Ok. I think I have it. Your explanation of the disconnect between setselected and when getselected might reflect the new value would be a good addition to the documentation. I suspected this but wanted to be sure I was not making another assumption.

However, I am still wondering...the two resets, left and right, wait a minimum of 15ms apiece (the delay call), for a total wait of 30ms before the getselected to confirm the reset. It would sure seem that at least the left encoder would have updated on the 20ms cycle and should be zero. It seems strange that the left (earliest reset) is not yet zero and the right is zero. Further, given the min delay of 30ms in the reset code, why do we see the getselected value not reset for some 30+ ms after we move on from reset code. Hopefully I'm not chasing my tail here.

I am assuming the delay call does not stop the 20ms update thread.

CoryNessCTR commented 2 years ago

Correct me if I'm wrong, but the code that's performing the reset is this, isn't it?

    public void resetEncodersWithDelay()
    {
        Util.consoleLog();

        Util.consoleLog("at encoder reset lget=%d  rget=%d", leftEncoder.get(), rightEncoder.get());

        // Set encoders to update every 20ms just to make sure.
        rightEncoder.setStatusFramePeriod(20);
        leftEncoder.setStatusFramePeriod(20);

        // Reset encoders with 30ms delay before proceeding.
        int rightError = rightEncoder.reset(15);    // 15ms
        int leftError = leftEncoder.reset(15);      // 15ms

        lastLeftDist = lastRightDist = 0;

        // if (RobotBase.isSimulation()) driveSim.setPose(driveSim.getPose());   //odometer.getPoseMeters());

        Util.consoleLog("after reset lget=%d  rget=%d  lerr=%d  rerr=%d", leftEncoder.get(), rightEncoder.get(),
                        leftError, rightError);

        if (RobotBase.isSimulation())
            Util.consoleLog("after reset ldget=%d  rdget=%d", leftDummyEncoder.get(), rightDummyEncoder.get());
    }

In this code, we reset the right encoder first, and then the left. In the logs, we see that the right side reports 0, while the left side reports nonzero, which is what I'd expect since the left side only has 15ms to update while the right side has the full 30ms.

Further, given the min delay of 30ms in the reset code, why do we see the getselected value not reset for some 30+ ms after we move on from reset code. Hopefully I'm not chasing my tail here.

It looks like the last log statement that reported the stale data was at 07:52:55:427, 17ms after the reset call, along with the 15ms wait for a total of 32ms of stale data. I can look further into it, but it'll be lower priority unless I can replicate it under simpler circumstances, because there's a lot of variables at play. It may end up being an after championship thing.

I am assuming the delay call does not stop the 20ms update thread.

Correct, the simulated devices are handled entirely by Phoenix, so waiting on a robot thread won't hang them up.

Crossle86 commented 2 years ago

You are so right about right reset first. My bad. I always try to handle left side first (except for here) and wasn't really reading the code. I think I have it now and increased the timeout a bit and it is working correctly.

Thanks again for spending the time to help me sort this out.