Closed: OJthe123 closed this issue 1 year ago.
This is the interesting bit, on line #80:
Waiting for component 'inihal' to become ready......................................
A configuration error is preventing LinuxCNC from starting.
It's really hard to debug without all the ini & hal files.
Thank you for your response @ozzyrob. It is indeed a configuration error, because the card does connect, and the buffers are allocated and, in the end, cleaned up again.
Please share your ini and Hal files.
Sure. I should have attached the files...
The thing is, I didn't change anything. You said you only fixed the overflow bug in the stepgen and something for the RIP install? So that should have nothing to do with my hal and ini, right? Attachments: Semse.hal.txt, Semse.ini.txt
Let me check one thing. Did you come from the main branch or did you update from branch 11?
If you updated from branch 11: yes, only the RIP and overflow fixes have changed.
If you come from the main branch: the config file should be updated and the firmware rebuilt.
Can you also share your config JSON?
At the moment I'm on holiday; as soon as I'm back I will build your config and investigate.
Also, if there are any logs available from LinuxCNC, please share them. I have a feeling a pin may have been renamed somewhere in the process. (Yes, that's on me.)
Yes. I came from 11-add-externals and used the new JSON format, which worked fine. Only since I pulled the latest update can I no longer start LinuxCNC.
If you can describe your system (LinuxCNC version, Debian version and kernel version) and how you installed LinuxCNC, I can try to replicate it. I would also need your JSON file and the following files mentioned in your ini file:
HALFILE = custom.hal
POSTGUI_HALFILE = postgui_call_list.hal
SHUTDOWN = shutdown.hal
If that is OK with you, Pete. I'm unemployed at the moment and looking to keep my mind occupied.
All the information should be in the report I attached in the first post above. I cloned the repo and installed with:
git clone --single-branch 11-external....
poetry install
poetry shell
pip3 install click
pip3 install yapps
litexcnc install_driver
The custom and other hal files are empty. It is a very basic LCNC setup, just for testing.
@ozzyrob: I'm completely OK with you trying to help others. That said, I hope you're in a job again soon.
Cheers Pete
Yeah, sorry mate, you're right; it was first thing in the morning Down Under. I still need the JSON file so I can build the firmware and flash the FPGA.
Sorry, I am not at home until the end of the month. I used the example JSON and just added the index pins to the encoders; 3 PWMs, 4 stepgens.
I'm going to take a stab at this and suggest it could be a latency issue. Have you tried booting with isolcpus=1?
The reason I think this: I tried a simple sample config that uses the hal_speaker component, and because my kernel was not real-time I was getting these errors:
waiting for s.joints<0>, s.kinematics_type<0>
waiting for s.joints<0>, s.kinematics_type<0>
waiting for s.joints<0>, s.kinematics_type<0>
waiting for s.joints<0>, s.kinematics_type<0>
waiting for s.joints<0>, s.kinematics_type<0>
waiting for s.joints<0>, s.kinematics_type<0>
USRMOT: ERROR: command timeout
It could be. But as I said, I did not change anything else. I can start and use LinuxCNC with LiteX with no problems when I switch back to the "old" drivers.
Can confirm the main branch is OK, although I get "Apply time exceeded limits" with 2.9; at least LinuxCNC loads. Haven't tested 2.8 yet.
With 11-add-external, on both Buster with LinuxCNC 2.8 and Bookworm with LinuxCNC 2.9, I sometimes see the errors mentioned by the OP; other times the system just "freezes".
When I am back home I will try to go step by step through the components to find the one causing my error.
No probs mate, thanks for all your hard work.
Any thoughts on "Apply Time exceeds limits" on the main branch?
The apply time exceeding limits can be due to:
I did not experience any lock-ups on my machines, though. I'm running on an RPi 4 with isolcpus=1,2,3 for the best latency results, although isolcpus=2,3 also gives good results. Generally it is recommended (no source, this is off the top of my head) to isolate a pair.
My response time might be longer due to the holiday. My apologies for any inconvenience.
No rush mate, it's all sweet & cruisy Down Under, no need to apologise. Enjoy your holiday. To be fair, I'm testing on a Lenovo T530 with a dual-core i5 (it's my favourite, but shhh, don't tell my other computers ;) ). The real machine will be a PC with a quad-core i5. I just use the laptop because I sit in the living room rather than somewhere isolated.
So I'll do a bit more testing tomorrow.
I also have plenty of time for this. My lathe doesn't need the speeds at which the overflow can occur, and I don't use a RIP. I am fine with the version that works for me.
Back from my holiday and, sadly, I can reproduce this bug. Starting my machine freezes LinuxCNC. It seems that some processes are no longer running.
Edit: I have to investigate it further. When disabling litexcnc completely, it still fails to start.
On the LinuxCNC forum someone mentioned that this happens when he adds stepgens to the JSON config.
At this moment I'm thinking my installation is completely broken:
Reinstalled LinuxCNC to no avail. Today I'm going to format the image to see whether I can create a working config again....
Do you think it's corrupting something in the actual LinuxCNC installation?
https://www.dropbox.com/s/h1v0j1btdzi96ia/VID_20230804_093236.mp4?dl=0
Hey. I think it is time to show the world that this project is not only about bugs and feature requests.
This is with 11-external before the last update
So 11-external was working? Looking good. Have you got any more detail of the X axis? I'm trying to come up with something simple for my Myford ML/S7 Frankenstein.
https://www.dropbox.com/s/e3lnlcgugkhil5h/IMG_20230804_094837.jpg?dl=0
https://www.dropbox.com/s/4x0orttlawtoq6c/IMG_20230804_094831.jpg?dl=0
https://www.dropbox.com/s/m911beann6je7yi/IMG_20230804_094825.jpg?dl=0
It is a JMC iHSV57 180 W servo with a 1204 ballscrew and a self-made mounting plate.
Cheers, nice solution.
OK rolled back 11-add-external-extensions-to-litexcnc to:
commit ba57141686940a113f1d2394c17f069025eb3770
Author: Peter van Tol <petervantol@gmail.com>
Date: Wed Jul 12 10:33:57 2023 +0200
pip vs. pip3
I was able to get the config running, but on a quad-core Intel Core i5-3470 with 3 cores isolated I was still getting:
litexcnc: Apply time exceeded limits.
litexcnc: Apply time exceeded limits.
Apply time exceeding limits (too long): 69026366277, 69026365867, 69026405879
That was with only LinuxCNC being run from a terminal. What should the watchdog be set at?
@OJthe123: how would you like the idea of creating a "show your machine" page in the documentation?
@ozzyrob: this is one of the problems that was resolved between your rollback version and the current version. Because I'm experiencing the same problem (reinstalling at the moment), I hope it is fixed; otherwise I'll make a patch for you.
@Peter-van-Tol: Sure, no problem. What do you need?
I also get the "Apply time..." message, but I can't say it has any impact on my machine. What I did notice is that the calculated(?) encoder.velocity is 25% higher than the actual servo speed. For now I compensate by scaling it down with the position-scale. It could be a calculation error, or the servo speed really is off; I have no other way to measure it.
Here are my machine files: semse.7z.zip
Spent today reinstalling LinuxCNC on my Raspberry Pi, but to no avail. Apparently something has changed that prevents the real-time components from starting (i.e. emcTrajInit failed)...
Edit: installed the following versions:
Both give the same error on my RPi; how is that possible?
Any luck yet? Sounds like a real PITA. :(
Just for more success stories:
The G76 threading cycle also works.
@OJthe123: Nice
In the meantime, my RPi is showing signs of life again: no errors when starting a simple configuration. Now going to install LitexCNC and rebuild... In the end, the error was PEBKAC (a configuration error).
Recompiled everything and can reproduce the error with the following hal file:
loadrt [KINS]KINEMATICS
loadrt [EMCMOT]EMCMOT servo_period_nsec=[EMCMOT]SERVO_PERIOD num_joints=[KINS]JOINTS
# Connection to the board
loadrt litexcnc extra_modules="toolerator" connections="eth:10.0.0.10"
# Assign to threads
# - LitexCNC
addf EMCO5.read servo-thread
# - MOTMOD
addf motion-command-handler servo-thread
addf motion-controller servo-thread
# - LitexCNC
# addf EMCO5.write servo-thread
The above hal file will run without errors. However, as soon as I enable the write function I get the error:
USRMOT: ERROR: command timeout
emcMotionInit: emcTrajInit failed
Waiting for component 'inihal' to become ready.
It boils down to something that must have changed in the write function, which prevents the module from starting. I will investigate further where the write function fails by disabling all components and then re-enabling them one by one.
Edit 1:
Found the culprit in litexcnc.c, in the line marked below:
static void litexcnc_write(void *void_litexcnc, long period) {
    litexcnc_t *litexcnc = void_litexcnc;

    // Check whether the write has been initialized AND the read and write functions
    // are in the recommended order (first read, then write). In the first loop we
    // don't write any data to the FPGA; instead, the FPGA is configured. This is
    // required because the configuration needs the period to be known, which
    // prevents the configuration from being performed before the HAL loop starts.
    if (!litexcnc->write_loop_has_run) {
        // Check whether the read cycle has been run; if not, the order is not correct
        if (!litexcnc->read_loop_has_run) {
            LITEXCNC_WARN("Read and write functions in incorrect order. Recommended order is read first, then write.\n", litexcnc->fpga->name);
        }
        // Configure the FPGA and set the flag that the write function has run once
        litexcnc_config(void_litexcnc, period); // <== This call blocks start-up and causes the time-out
        litexcnc->write_loop_has_run = true;
        return;
    }
    // ... remainder of litexcnc_write (regular write cycle) ...
}
Edit 2:
Found the misbehaving module: stepgen. While determining the best pick-off (and thus the best accuracy), it gets into an infinite loop in some cases...
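An infinite loop in a search like this usually means its exit condition can fail for some inputs. As an illustration only, the sketch below bounds the search by the register width, so it terminates for every input. The fixed-point model (step rate divided by the FPGA clock, scaled by 2^pick-off and stored in a signed 32-bit word) and the function name choose_pick_off are assumptions, not the LitexCNC code.

```c
#include <stdint.h>

/* Hypothetical sketch, not the LitexCNC implementation: return the largest
 * pick-off shift k for which the fixed-point speed word of the maximum step
 * rate still fits in a signed 32-bit register, or -1 if none does.
 * The loop is bounded by max_shift, so it always terminates. */
int choose_pick_off(double max_steps_per_sec, double clock_hz, int max_shift)
{
    /* Reject inputs for which no meaningful shift exists. */
    if (max_steps_per_sec <= 0.0 || clock_hz <= 0.0 || max_shift < 0) {
        return -1;
    }
    if (max_shift > 62) {
        max_shift = 62; /* keep 1 << k well inside uint64_t */
    }

    const double steps_per_tick = max_steps_per_sec / clock_hz;
    int best = -1;
    for (int k = 0; k <= max_shift; k++) {
        const double word = steps_per_tick * (double)((uint64_t)1 << k);
        if (word > (double)INT32_MAX) {
            break; /* larger shifts only make the word bigger */
        }
        best = k; /* highest shift so far that still fits */
    }
    return best;
}
```

The design choice is simply that the search space is finite and known up front; an early `break` is an optimisation, not the only way out of the loop.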
I was just going to mention that the code in litexcnc.c is the same as in commit ba57141686940a113f1d2394c17f069025eb3770, and that works apart from the apply-time messages.
It took some effort, but I have found the error. If you pull the latest version of branch #11, your LinuxCNC should start up again.
Fantastic, will try in the morning.
OK, gave it a go with the OP's configs... damned if I could get rid of the following error. Latency is good: I can run a config using steppers with a 25 µs base thread on this machine. Ping times are good. I tried isolating various cores (4-core i5).
But after that whinge it does start up, just can't jog.
The following error should be gone with the latest commit. There was a difference between Python (firmware) and C (driver) in determining the pick-off. For slow movements this could be compensated by the PID or pos2vel; however, at higher speeds the difference became too big.
The current commit has been tested on my EMCO5, which shows no following error at 1500 mm/min whilst using pos2vel as the translation between position and velocity.
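Why a firmware/driver disagreement mostly hurts at speed can be illustrated with a simple hypothetical fixed-point model (encode_speed, decode_speed and the model itself are assumptions, not LitexCNC code): the driver encodes a velocity as steps per FPGA clock tick scaled by 2^pick-off, and the firmware decodes it with its own pick-off. If the two sides disagree, the realised speed is off by a constant factor, so the absolute error grows with the commanded speed. A PID or pos2vel loop can absorb a small error at low speed, but not the large one at high speed.

```c
#include <stdint.h>

/* 2^k as a double; valid for 0 <= k <= 62. */
static double pow2(int k)
{
    return (double)((uint64_t)1 << k);
}

/* Driver side (hypothetical): steps/s -> fixed-point word, rounded to nearest. */
int64_t encode_speed(double steps_per_sec, double clock_hz, int pick_off)
{
    double word = steps_per_sec / clock_hz * pow2(pick_off);
    return (int64_t)(word >= 0.0 ? word + 0.5 : word - 0.5);
}

/* Firmware side (hypothetical): fixed-point word -> steps/s actually produced. */
double decode_speed(int64_t word, double clock_hz, int pick_off)
{
    return (double)word / pow2(pick_off) * clock_hz;
}
```

With matching pick-offs the round trip is accurate to the rounding of the word; in this model a one-bit mismatch doubles every speed, which is why both sides must derive the pick-off identically.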
Continuing from #29 ...
With the config and hal-files from @OJthe123 I can now reproduce the problem. The difference between my setup and his is mainly the scale. Now that this has been sorted out, I can start debugging. I just want to close this issue in a proper manner...
I have suggested that the problem might be caused by using the pin position-feedback instead of position-prediction. At the moment this seems to resolve the problem in my set-up, at least as far as the following error goes. However, with both PID and pos2vel the machine starts to oscillate when jogging stops.
My observations are:
- PID: these might be lessened by applying proper tuning;

EDIT
Not committed yet, but I have a rock-solid version of pos2vel working at the moment. It is based more on the way the LinuxCNC stepgen behaves; I have the feeling that LinuxCNC is tuned to its own stepgen. Upcoming changes will be:
- no more period-s and period-s-recip (replaced by internal parameters). This makes the stepgen module lighter on the processor, as some floating-point arithmetic will be removed;
- pos2vel will no longer be a separate module, as it will be included in stepgen;
- stepgen will have an additional parameter velocity-mode. The default is position control (velocity-mode=false), which drives the motor to a commanded position, subject to acceleration and velocity limits. Velocity control (velocity-mode=true) drives the motor at a commanded speed, again subject to acceleration and velocity limits. NOTE: users who want to continue to use their tuned PID setup must add setp [NAME].stepgen.##.velocity-mode 1 to their hal-files.

EDIT 2
Finished the rewrite of litexcnc_stepgen.c. Tonight I will test this modification (it is a really big clean-up, with loads of enhancements). It does compile, but during the day I have no way to test it on my equipment.
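To make the planned stepgen changes a bit more concrete, here is a simplified C sketch of position mode. The struct stepgen_sketch_t and the function position_mode_step are hypothetical and not taken from litexcnc_stepgen.c. The sketch keeps the period and its reciprocal as internal values (recomputed only when the thread period changes, in the spirit of dropping period-s and period-s-recip), and moves towards the commanded position within max-velocity and max-acceleration, braking once the remaining distance falls inside the stopping distance v^2 / (2 * a). Velocity mode, step timing and the FPGA's fixed-point format are left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stepgen state (illustration only). */
typedef struct {
    long   period_ns;      /* thread period seen on the previous call */
    double period_s;       /* cached period in seconds */
    double period_s_recip; /* cached 1 / period_s */
    double velocity;       /* velocity commanded in the previous period */
} stepgen_sketch_t;

/* Recompute the cached period values only when the thread period changes,
 * so the division is not repeated every servo cycle. */
static void update_period(stepgen_sketch_t *sg, long period_ns)
{
    if (period_ns != sg->period_ns) {
        sg->period_ns = period_ns;
        sg->period_s = (double)period_ns * 1e-9;
        sg->period_s_recip = 1.0 / sg->period_s;
    }
}

static double clamp(double x, double lo, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Position mode: return the velocity for the next period that moves from
 * position_fb towards position_cmd within max_vel and max_acc.
 * Assumes period_ns > 0 and max_acc > 0. */
double position_mode_step(stepgen_sketch_t *sg, long period_ns,
                          double position_fb, double position_cmd,
                          double max_vel, double max_acc)
{
    update_period(sg, period_ns);

    const double dv_max = max_acc * sg->period_s; /* max velocity change per period */
    const double error = position_cmd - position_fb;
    const double v = sg->velocity;

    /* Velocity that would close the error in one period, capped at max_vel. */
    double v_desired = clamp(error * sg->period_s_recip, -max_vel, max_vel);

    /* Moving towards the target and already within the stopping distance
     * v^2 / (2 * max_acc): brake instead of accelerating. */
    const double stop_dist = v * v / (2.0 * max_acc);
    const double abs_error = error < 0.0 ? -error : error;
    const bool towards = (v > 0.0 && error > 0.0) || (v < 0.0 && error < 0.0);
    if (towards && abs_error <= stop_dist) {
        v_desired = 0.0;
    }

    /* Limit the change in velocity to what max_acc allows in one period. */
    sg->velocity = clamp(v_desired, v - dv_max, v + dv_max);
    return sg->velocity;
}
```

Called once per servo period with position-cmd and the current position, this ramps up at max-acceleration, cruises at max-velocity and starts slowing down near the target, which is roughly the behaviour the text describes for velocity-mode=false.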
Awesome work! Do you think it is better to use pos2vel / position control in general? I do not really have a tuned PID setup; it is more just a P-only loop back to the FPGA.
I finished 400 little parts today, which I turned on my lathe. Not a single problem with LiteX and the Colorlight board.
Just committed the changes:
- pos2vel has been removed;
- stepgen now has a position mode (selectable via a pin in HAL).

For advice on position vs. PID: if your setup works, there is no need to change. However, the readability and maintainability of the HAL file improve when using position control. The code below is the minimal example for a single axis, which is roughly 50% smaller than a solution with pid or pos2vel.
# STEPGEN - X-AXIS
########################################################################
# - Setup of timings
setp [LITEXCNC](NAME).stepgen.00.position-scale [JOINT_0]SCALE
setp [LITEXCNC](NAME).stepgen.00.steplen 5000
setp [LITEXCNC](NAME).stepgen.00.stepspace 5000
setp [LITEXCNC](NAME).stepgen.00.dir-hold-time 10000
setp [LITEXCNC](NAME).stepgen.00.dir-setup-time 10000
setp [LITEXCNC](NAME).stepgen.00.max-velocity [JOINT_0]MAX_VELOCITY
setp [LITEXCNC](NAME).stepgen.00.max-acceleration [JOINT_0]STEPGEN_MAXACCEL
# setp [LITEXCNC](NAME).stepgen.00.debug 1
# - Connect position command and feedback
net xpos_cmd joint.0.motor-pos-cmd => [LITEXCNC](NAME).stepgen.00.position-cmd
net xpos_cmd joint.0.motor-pos-fb <= [LITEXCNC](NAME).stepgen.00.position-prediction
# - enable the drive
net xenable joint.0.amp-enable-out => [LITEXCNC](NAME).stepgen.00.enable
I would really appreciate it if you would test this latest version, so this issue can be closed as resolved.
You rock, dude! Changed the drivers and built new firmware. Tested my setup at 3000 mm/min with no errors. Maybe it could go faster, but I did not want to kill my machine in case of a bug in the firmware or driver.
EDIT: for those who will copy & paste, there are two typos, corrected below.
# STEPGEN - X-AXIS
########################################################################
# - Setup of timings
setp [LITEXCNC](NAME).stepgen.00.position-scale [JOINT_0]STEP_SCALE # typo
setp [LITEXCNC](NAME).stepgen.00.steplen 5000
setp [LITEXCNC](NAME).stepgen.00.stepspace 5000
setp [LITEXCNC](NAME).stepgen.00.dir-hold-time 10000
setp [LITEXCNC](NAME).stepgen.00.dir-setup-time 10000
setp [LITEXCNC](NAME).stepgen.00.max-velocity [JOINT_0]MAX_VELOCITY
setp [LITEXCNC](NAME).stepgen.00.max-acceleration [JOINT_0]STEPGEN_MAXACCEL
# setp [LITEXCNC](NAME).stepgen.00.debug 1
# - Connect position command and feedback
net xpos_cmd joint.0.motor-pos-cmd => [LITEXCNC](NAME).stepgen.00.position-cmd
net xpos_fb joint.0.motor-pos-fb <= [LITEXCNC](NAME).stepgen.00.position-prediction # typo
# - enable the drive
net xenable joint.0.amp-enable-out => [LITEXCNC](NAME).stepgen.00.enable
Sorry guys for being a bit quiet, I've been playing with some 7c81 firmware on a Spartan 6 dev board. When I get the chance I'll set up the machine I use for testing.
Really happy, Pete. I owe you at least a beer. Had a play around: no following errors; running the sample code passed, jogging via keyboard passed, MDI passed. The only issue was on shutdown. Starting up again runs fine, but it still quits with the same message:
Shutting down and cleaning up LinuxCNC...
task: 48698 cycles, min=0.000007, max=0.097561, avg=0.009917, 0 latency excursions (> 10x expected cycle time of 0.010000s)
litexcnc/Semse: Watchdog timeout not set. Using default value 0 ns (3 times period).
litexcnc: LitexCNC etherbone driver unloaded
rtapi_app: caught signal 11 - dumping core
Can confirm... When I run from a terminal I see the same output, but I still see no effects on the machine:
Shutting down and cleaning up LinuxCNC...
Running HAL shutdown script
task: 603 cycles, min=0.000041, max=0.012258, avg=0.009716, 0 latency excursions (> 10x expected cycle time of 0.010000s)
mb2hal quit_signal DEBUG: signal [15] received
mb2hal quit_cleanup DEBUG: started
mb2hal quit_cleanup DEBUG: unloading HAL module [16] ret[0]
mb2hal quit_cleanup DEBUG: done OK
mb2hal main OK: going to exit!
litexcnc: LitexCNC etherbone driver unloaded
rtapi_app: caught signal 11 - dumping core
free(): invalid pointer
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
Waited 3 seconds for master. giving up.
Note: Using POSIX realtime
motmod: not loaded
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
Note: Using POSIX realtime
trivkins: not loaded
<commandline>:0: exit value: 255
<commandline>:0: rmmod failed, returned -1
<commandline>:0: unloadrt failed
Note: Using POSIX realtime
This error is due to an old loadrt statement in your hal-files. You currently have:
loadrt litexcnc
loadrt litexcnc_eth connection_string="192.168.178.150"
This should be combined to the following single statement:
loadrt litexcnc connection_string="eth:192.168.178.150"
Why does this error emerge now? Because the FPGA is reset to its safe state when LinuxCNC is unloaded, which means that litexcnc sends one last message to the FPGA. When the modules are loaded using two separate statements, the etherbone driver has already been unloaded by then (and its memory freed). Writing to a closed device without allocated memory therefore leads to a core dump.
I will close this issue, as the original problem has been solved. In another issue I will unpublish the litexcnc_eth component, so it cannot be inadvertently used as a stand-alone component.
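The shutdown ordering can be modelled in a few lines of C. This is a hypothetical sketch (connection_t, conn_write, conn_close and driver_exit are made-up names, not the LitexCNC or etherbone API): when a single module owns the connection, its exit path can send the safe-state message first and close the link afterwards. In this model closing only clears a flag, so a write after close fails cleanly; in the real two-statement setup the connection's memory is already freed, so there is nothing safe left to check, which is why combining the two loadrt statements is the fix.

```c
#include <stdbool.h>

/* Hypothetical connection handle; in this model closing only clears a flag. */
typedef struct {
    bool open;
    unsigned int writes; /* counts successful writes, for illustration */
} connection_t;

/* Write one message; refuses (returns -1) if the link is already closed. */
int conn_write(connection_t *conn)
{
    if (!conn->open) {
        return -1;
    }
    conn->writes++;
    return 0;
}

void conn_close(connection_t *conn)
{
    conn->open = false;
}

/* Single-owner shutdown: reset the FPGA to its safe state first, then close. */
int driver_exit(connection_t *conn)
{
    int rc = conn_write(conn); /* final safe-state message */
    conn_close(conn);
    return rc;
}
```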
@ozzyrob: for beer, that would be a VB then, please...
But to be honest, the beer should be on me. Thank you for your support, your testing and the time you spent making this possible and closing this issue.
Hi. I did "git pull" to get the updated files from 11-add-external....
Then I installed the new drivers ("litexcnc install_driver") and rebuilt the firmware.
After that, LinuxCNC won't start anymore. Attachment: linuxcnc-report.txt
When I run halrun and load the driver, there are no errors:
loadrt litexcnc
loadrt litexcnc_eth connection_string="192.168.178.150"
Do I have to change something in the INI for the new driver version?