Properly terminating crappy's processes before starting crappy again.

LaboratoireMecaniqueLille / crappy

Command and Real-time Acquisition Parallelized in Python

https://crappy.readthedocs.io/en/stable/

GNU General Public License v2.0


Properly terminating crappy's processes before starting crappy again. #14

Closed Elbub closed 2 years ago

Elbub commented 2 years ago

Hi there everybody !

I discovered CRAPPy 6 weeks ago, as I began an internship to integrate it into an older Python program managing a workbench. It's easy to use and has a much faster acquisition rate than we had before, so I can tell you did amazing work.

However, I'm currently running into two problems I don't know how to handle. In the application I'm working on, the user first sees a window where they enter various parameters (material, title, rupture limit, etc.), then another window to set the generator paths*, and finally the main window. Then, when the test is done and the user exits, they can either start another test with the same parameters or just exit. To be more precise, if the user chose to start another test instead of exiting the program, the main function returns this :

return main_function(*parameters)

parameters being a list of the parameters entered previously.

The first problem is that crappy can't stop properly through its blocks, but I saw there's already an issue on that subject. The second is that, when starting another test instead of exiting, even if crappy was stopped previously (with ctrl-C), starting it again raises this error :

File "C:\...\Python310\lib\multiprocessing\process.py", line 115, in start
assert self._popen is None, 'cannot start a process twice'

How can I kill those processes to be sure there's no conflict when crappy starts again ?

* : I could send you the code if you want. It may help people to have a graphical interface to select setpoint paths.

Edit : I'm using VSC to code. I kill crappy (ctrl-C) from VSC's terminal when crappy and the main window are active, and it only stops crappy.

WeisLeDocto commented 2 years ago

Hello Elbub,

Good to hear you're testing Crappy out !

First, I assume you're running Crappy on Windows, and I have to tell you that we don't have much experience with debugging on Windows. We mainly use Linux, both for development and for driving our test machines.

Then, you did not mention which version of Crappy you're currently using. Is it a release from PyPI (pip install crappy) or did you clone the repo ? If you cloned the repo, I would recommend that you use an older version of the repo (this one or older) as the current version is a bit buggy. For now I'm only ensuring that the releases are stable, not the GitHub repo. I'm aware it is definitely not a good practice, adding a dev branch to the repo is on my ToDo list.

Now regarding your issue, the problem is that what you're trying to do is not allowed in Python... Crappy blocks are actually processes, that are being started when crappy.start() is called. It is not possible to start them a second time, that's just how Python works. Read and run this code for a minimal example showing that it doesn't work.

Now, there are workarounds. I don't know what your main_function contains, but if the blocks are redefined at each call then it should run smoothly no matter the number of calls. Check and run this other code for a minimal example. Tell me if this approach would integrate nicely in your software.
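The two linked examples are not reproduced in this thread. As a rough stand-in, here is a minimal sketch using plain multiprocessing rather than Crappy itself; the names work, start_twice and start_fresh are made up for the illustration. It shows that a Process object refuses a second start(), while creating a new object for each run works.

```python
import multiprocessing


def work() -> None:
    """Placeholder for what a block would do in its own process."""
    pass


def start_twice() -> str:
    """Start the same Process object twice and return the error message."""
    proc = multiprocessing.Process(target=work)
    proc.start()
    proc.join()  # the process has finished, but the object cannot be reused
    try:
        proc.start()  # second start on the same object
    except AssertionError as exc:
        return str(exc)
    return "no error"


def start_fresh(n_runs: int) -> list:
    """Create a new Process object for every run and collect the exit codes."""
    exit_codes = []
    for _ in range(n_runs):
        proc = multiprocessing.Process(target=work)  # redefined at each call
        proc.start()
        proc.join()
        exit_codes.append(proc.exitcode)
    return exit_codes


if __name__ == "__main__":  # guard needed on Windows, where processes are spawned
    print(start_twice())
    print(start_fresh(3))
```

The first function mirrors the failure from the traceback above, the second mirrors the "redefine the blocks at each call" workaround.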

Weis

Elbub commented 2 years ago

I'm indeed working on Windows. We're thinking about switching to Linux (probably Xubuntu), at least on the computer that's used to drive the bench, but for now I still have to deal with it. Plus, I fully realize that Crappy can't be used at its best, since I saw that you can't renice processes properly on Windows, but I didn't see many more restrictions, so I guess it shouldn't change much, since Python is an interpreted language. Other specs :

Python 3.10.5
Crappy 1.5.9, installed with pip
coding on VSCode 1.69 with only Python (v2022.12.0) and Pylance (v2022.8.10) extensions

Reading what I wrote earlier, I see why it wasn't clear at all. What I meant is that, when the user chooses to do another test after finishing the previous one, the program doesn't exit. Instead, the main function returns a call to itself with the parameters to keep. Unless Python does weird unoptimized stuff (which wouldn't surprise me that much), that call is tail-recursive, so the previous main function shouldn't stay on the stack. Anyway, Crappy blocks are created and started in an independent function (starting_crappy()), so that shouldn't be a problem at all. Trying to use that starting_crappy() function, then using ctrl-C to stop it, then using it again from within the same main_function yields the same result.

I even tried with a minimalist script : one window and one button that starts crappy when pressed. I pressed it, used ctrl-C (not sure if it's really a SIGINT on Windows, but I presume it's pretty much the same) to stop it, then pressed it again, and still got the same result. I don't understand why different calls to a function seem to try to start the same blocks/processes.
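A side note on the tail call: CPython does not eliminate tail calls, so every `return main_function(*parameters)` keeps the previous frame (and whatever its local variables reference) alive until the whole chain returns. The sketch below is only an illustration with made-up names (recursive_session, looping_session); it contrasts the recursive pattern with a plain loop that returns to the same frame after each test.

```python
import sys


def recursive_session(tests_left, done=0):
    """Mimics `return main_function(*parameters)`: one new frame per test."""
    if tests_left == 0:
        return done
    # A tail call in form, but CPython still stacks a frame for it
    return recursive_session(tests_left - 1, done + 1)


def looping_session(tests_left):
    """Same sequence of tests driven by a loop: a single frame is reused."""
    done = 0
    while tests_left > 0:
        # The test itself would run here
        tests_left -= 1
        done += 1
    return done


if __name__ == "__main__":
    too_many = sys.getrecursionlimit() + 100
    try:
        recursive_session(too_many)
    except RecursionError:
        print("the recursive version ran out of stack")
    print(looping_session(too_many))
```

Nobody chains a thousand tests in one session, so the recursion limit itself is not the concern here; the point is that objects referenced by the earlier frames are not released between tests.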

Edit : I just got an idea, but I really hope it isn't that : Python considers functions as objects. If it considers the variables referencing the blocks as attributes of the function-object, it must keep them bound and not really kill the processes. Just a thought, and I don't think it's really related, but I'll dig further monday (or maybe this weekend if it obsesses me too much).

WeisLeDocto commented 2 years ago

Well unfortunately for you (and for me as well...), there is a major difference between Linux and Windows when using Python's multiprocessing ! The main limitation in Windows being that processes cannot be "forked". You'll find info easily on the internet if you're interested. It has a strong impact on the way the processes have to be instantiated and started in Windows vs Linux. The former maintainer of Crappy (hi @Dad0u) had by the way quite a hard time a few years ago simply getting Crappy to work in Windows, and we're avoiding as much as possible using Windows on our setups.

Based on your more precise description, I took some time to investigate and I was able to reproduce your problem and to solve it, on my machine at least. The issue here was that the Block.instances WeakSet keeps track of all the blocks instantiated so far, and calling crappy.start() will basically iterate over the elements of Block.instances and try to start them all. Now, after the first call to crappy.start(), blocks might still be referenced in Block.instances (well, they should theoretically be garbage collected at some point if everything goes well, but sometimes they aren't).

So to make sure the second call to crappy.start() doesn't try to start a remaining block from the first call, the safest option is to empty Block.instances before instantiating the new blocks for the second call. Luckily, this situation had been foreseen by former developers, who added the crappy.reset() method for emptying Block.instances. All you need to do is to call it right after crappy.start(), so it empties the WeakSet as soon as crappy.start() returns. Here's a simple example that gives a different behavior with and without crappy.reset(), at least on my machine.
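The linked example is not reproduced here. To make the mechanism concrete, here is a toy model in plain Python: ToyBlock, start_all and reset are hypothetical stand-ins for a Crappy block, crappy.start() and crappy.reset(), not Crappy's actual code. It only illustrates why a block that is still referenced somewhere makes the second start fail, and why emptying the WeakSet avoids it.

```python
from weakref import WeakSet


class ToyBlock:
    """Hypothetical stand-in for a block; this is not Crappy's actual class."""

    instances = WeakSet()  # weak registry of every block created so far

    def __init__(self, name):
        self.name = name
        self.started = False
        ToyBlock.instances.add(self)

    def start(self):
        # Mimics multiprocessing.Process, which refuses a second start()
        if self.started:
            raise AssertionError("cannot start a process twice")
        self.started = True


def start_all():
    """Toy equivalent of crappy.start(): start every registered block."""
    for block in list(ToyBlock.instances):
        block.start()


def reset():
    """Toy equivalent of crappy.reset(): forget every registered block."""
    ToyBlock.instances.clear()


def two_runs_succeed(call_reset):
    """Run two consecutive 'tests'; return True if the second one starts."""
    reset()  # begin from an empty registry
    lingering = []  # strong references that keep old blocks alive
    try:
        for run in range(2):
            lingering.append(ToyBlock(f"generator-{run}"))
            start_all()
            if call_reset:
                reset()  # called right after start_all() returns
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    print("without reset:", two_runs_succeed(call_reset=False))
    print("with reset:   ", two_runs_succeed(call_reset=True))
```

In the real program, the corresponding pattern described above is simply crappy.start() followed by crappy.reset().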

For your information, I'm planning on checking the architecture of the Block (and improving it if needed) somewhere in the coming months, both with Linux and Windows. I still advise you to test your code out in Linux, you might experience improvements. I'm an aficionado of Xubuntu as well ;)

Please tell me if my answer solved your problem or if you're still facing issues. Since you told me that Block.instances seems to be empty after crappy.start() returns (which would be normal), there might be something else going on...

Weis

Elbub commented 2 years ago

Back again, but from Xubuntu this time. We finally switched to it to see if it could solve the problem, and I've got good and bad news. But first things first, my new specs are otherwise pretty much the same : Python 3.10.4, Crappy 1.5.9, same VSC stuff.

Good news :

crappy.reset() worked perfectly on Windows ! I still had to stop Crappy with a Ctrl-C, but I could launch it again afterwards from the same window or from a new window of the same program.
The Generator's natural end does properly stop all the other blocks.

Bad news :

Calling Crappy from another window now raises an error that'll be hard to circumvent. From what I've found, it's a conflict between X11 (the windowing system Xubuntu uses) and Tk/Tcl (the GUI toolkit used to display the graph). It happens when Crappy stops, so it's not that important for people using Crappy just in a Python script, but people who want a non-coder-friendly interface will run into the same trouble I'm in : stopping Crappy makes the whole program crash. Here's what I get every time :
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
python3: ../../src/xcb_io.c:278: poll_for_event: Assertion '!xcb_xlib_threads_sequence_lost' failed.
The best lead I found states that Tcl/Tk processes can't be forked, but I only use one thread for everything but Crappy. The problem comes from the Grapher block, since removing that block prevents the error from appearing.
Creating a Tk window, closing it, then starting Crappy, stopping it and then opening another Tk window raises another error : XIO: fatal IO error 22 (Invalid argument) on X server ":0" after 216 requests (216 known processed) with 17 events remaining. What is peculiar here is that if I only open + close a Tk window and then start + stop Crappy, or start + stop Crappy and then open + close a Tk window, no such error happens. Once again, it's linked to a conflict between X11 and multithreading. Similar errors (XIO: fatal IO error <other number>) seem to be related to matplotlib and Ubuntu, but not to specific backends : here with GTK, here with Qt.

"Huh ?" news :

I've got a start button and a stop button on my main window. The first starts crappy, the second stops it. On W11, while the grapher was on screen, I couldn't interact with the main window, hence why I had to use the Ctrl-C to stop Crappy. On Xubuntu, if there's no grapher, I still can't, but if there is one, I can interact with the other window and click on any button, and they work fine. I have no idea why I can do that. I thought that Python should wait for crappy.start() to return before letting the other window continue looping. It might mean that Ubuntu's Python interpreter tries to parallelize the main window's process, causing the first error. Not sure, so I think it's still worth mentioning.

I'm gonna continue investigating and I'll tell you what I find. I'm gonna try to start Crappy from a subprocess to see if it changes anything, too.

WeisLeDocto commented 2 years ago

Elbub,

Good to hear that you were able to solve part of your problem ! I think you now get it why we chose not to implement a graphical interface in Crappy...

I have already encountered the "bad news" as well, but they were never critical for my applications so I never tried to properly solve them. They seem to be more related to Ubuntu or Xubuntu itself than to Python anyway. You could try running your application on the Gnome desktop environment, it's super easy to install now that you have Xubuntu (see this link). It may solve the xcb-related errors. You could also try adding the following lines :

import matplotlib
matplotlib.use("TkAgg")

at the beginning of your main script to set the backend from the start. It sometimes got me out of messy situations with conflicting backends (that especially happens when starting scripts from an IDE instead of a console).

The "huh news" might be related to the interactive mode of Matplotlib being turned on or not, or maybe it's again a matter of backend. I'm really not sure. I would have expected Tkinter's mainloop to be insensitive to the other running processes though.

Using subprocesses is a good idea, I use this solution on a server that allows me to start and stop Crappy scripts remotely. I find it a nice way to ensure that Crappy doesn't interfere with my server. Now I never tried to start and stop the processes from a GUI, it might get messier.
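The server code is not shown in this thread, so the following is only a sketch of the general idea using the standard subprocess module. The helper names start_script and stop_script are made up, and the script being launched is whatever Crappy script you would normally run from a console. Note that terminate() does not send the same signal as a Ctrl-C, so whether Crappy cleans up properly in that case is an assumption to check on your setup.

```python
import subprocess
import sys


def start_script(args):
    """Launch a Python script in its own interpreter, isolated from the caller."""
    return subprocess.Popen([sys.executable, *args])


def stop_script(proc, timeout=5.0):
    """Ask the child to stop, and kill it if it does not exit in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort if the child ignores the request
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running test script
    child = start_script(["-c", "import time; time.sleep(60)"])
    print("running:", child.poll() is None)
    print("exit code:", stop_script(child))
```

Since each run gets a brand new interpreter, nothing from the previous test (blocks, Tk state, Matplotlib backend) can leak into the next one.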

If you happen to find solutions to the GUI mess I would be very interested to hear about them.

Weis

Dad0u commented 2 years ago

Hello Elbub, I am pleased to learn that Crappy is useful to your research.

Sorry for tuning in this late, I am now working on several other projects.

Sadly, Crappy was not really intended to be used with GUIs but rather to provide a framework to "program" test benches. That being said, writing GUI applications is absolutely possible with a few workarounds. A great example is the braking tribometer bench. I too faced issues with the spawning of processes in applications using Tk (see this lonely question for example) and learned the hard way that bad things happen when messing with processes from GUIs. I never got my head around the exact reasons, but the takeaway is that the more isolated GUIs and Crappy are, the better.

Now for your problem : from what I understand you can now stop and restart Crappy as expected using crappy.reset(). This is great, crappy.reset was exactly intended for this purpose, even though it was not tested thoroughly.

Now, to start the experiment through a GUI, I am guessing that crappy.start is called from a Tkinter callback. If so, I was able to replicate the issue in this short example. This program crashes with the following message :

[xcb] Unknown sequence number while processing queue
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
python: xcb_io.c:278: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
[1]    27652 IOT instruction (core dumped)  python test-callback.py

However, I managed to circumvent the issue by removing all calls to Crappy methods from the callback. This short code demonstrates a way to run a Crappy program several times from a Tkinter window. I achieved it by calling crappy.start from a thread; the Tkinter callback simply sets an event to signal the thread to start the program. Once the program is over (after 5s in the example), Crappy can be restarted by pressing start again. I could only test this on Linux, let me know if this architecture works on your machine.
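The linked code is not reproduced in this thread. The sketch below shows the same architecture under a few assumptions: BenchRunner, fake_crappy_run and main are made-up names, and fake_crappy_run is a placeholder for the function that would build the blocks and then call crappy.start() and crappy.reset(). The Tkinter callback only sets a threading.Event; the worker thread does the actual run.

```python
import threading
import time


class BenchRunner:
    """Runs `run_test` in a worker thread each time a start is requested."""

    def __init__(self, run_test):
        self._run_test = run_test
        self._start_event = threading.Event()
        self._stop_event = threading.Event()
        self.runs_done = 0
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def request_start(self):
        """Safe to use as a GUI callback: it only sets a flag."""
        self._start_event.set()

    def shutdown(self):
        """Wake the worker up and let it exit."""
        self._stop_event.set()
        self._start_event.set()
        self._thread.join()

    def _loop(self):
        while True:
            self._start_event.wait()
            self._start_event.clear()
            if self._stop_event.is_set():
                return
            self._run_test()  # blocks this thread, not the GUI
            self.runs_done += 1


def fake_crappy_run():
    """Placeholder: the real one would create blocks, then start and reset."""
    time.sleep(0.1)


def main():
    import tkinter as tk  # imported here so the runner works without a display

    runner = BenchRunner(fake_crappy_run)
    root = tk.Tk()
    tk.Button(root, text="Start", command=runner.request_start).pack()
    root.mainloop()
    runner.shutdown()


# Call main() to open the window; it is not called here so that the module
# can be imported on a machine without a display.
```

Pressing the button while a run is in progress simply queues one more run, which may or may not be what you want in a real bench application.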

I hope this approach may help you find a solution for your program. If not, you may provide us with a short demonstrator, to help us replicate the issue.

Victor Couty

Elbub commented 2 years ago

> I think you now get it why we chose not to implement a graphical interface in Crappy...

Totally ! But still, I'm gonna try, and your comments are very helpful. Thanks a lot for that.

I had already tried matplotlib.use("Agg") to no avail, and I have now tried other backends ("TkAgg", "qtAgg" and "gtkAgg") with no better results. However, the start-from-thread solution worked a lot better. I didn't know about Event, since I didn't spend that much time digging through the multiprocessing lib. Now, I don't get any xcb error and I can properly interact with my window when Crappy stops. I can also use the "start another test instead of quitting" feature without a problem (I had a small one that I got rid of when I realized I didn't need to start crappy more than once per test).

I'm gonna try to build another block using tkinter to serve as indicators during the test (current setpoint, max load achieved, max and min positions, etc.). I think it's going to be easier than trying to send those data back to the main window. Another way to do that would be to build a copy of my main window around the figure of a CustomGrapher block. More precisely, to both build the window and incorporate the figure in the prepare() method of such a block.

Edit : I forgot to mention that I'm going to tinker with a custom generator too, to put stop conditions (to avoid buckling or other problems), but I don't know if it may help you.

I'll keep working on it and see if I can send you the code. For now, it's a bit long (around 3000 lines) and some of it is still messy, since I've been working on an existing code that I'm upgrading and polishing, and I don't always follow all Python decoration rules.

Aside from that, there's still something that I'd like to do, but I haven't tinkered with it much : having Crappy record from the start of the program, then adding blocks and links (like a generator) later, when the user presses a button. This would mean being able to record manual driving (with the remote of the bench, for example to pre-stretch the ropes that will be tested) and automatic driving (the main test) in the same document. Is that even doable ? I should maybe put that in a separate issue, but it's more of a simple question, so tell me if you'd rather have me create another one.

WeisLeDocto commented 2 years ago

Elbub,

It's good news that things are moving forward with your project.

> I'm gonna try to build another block using tkinter to serve as indicators during the test

Maybe you can base this block on the existing Dashboard block, that's also using Tkinter and is meant to display text in a GUI.

> I forgot to mention that I'm going to tinker with a custom generator too

I'm not sure I understand what you have in mind with this custom Generator. Could you maybe be more explicit ?

Regarding the possibility to add more blocks after crappy.start() was called, it wouldn't work without in-depth modifications to the base Block. Instead, you can probably achieve the desired behavior in a reasonably simple way using the existing framework.

I would suggest instantiating and starting all the blocks, and putting the Generator driving the bench on hold as long as you're still in the manual driving phase. Once you want to switch to the automatic driving phase, you could use something like the -ill named- GUI block to make the Generator continue normally. As for how you can put a Generator on hold, the easiest way is to make it wait for a condition that's only satisfied once you click on the button of the GUI block. You may also want to catch any command sent by the Generator block as long as the button wasn't clicked, by sending them to a middleman block rather than directly to the block driving the bench. This middleman block would receive inputs from the Generator and GUI blocks, and decide whether or not to send the commands to the block driving the bench. Would this kind of solution work in your case ?
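To make the middleman idea a bit more concrete, here is a sketch of its decision logic only, in plain Python and deliberately outside of any Crappy block API. CommandGate and its methods are made-up names; in a real setup this logic would live in the loop of a custom block that reads from the Generator and GUI links.

```python
class CommandGate:
    """Forwards Generator commands only once the start button was clicked."""

    def __init__(self):
        self.enabled = False  # manual driving phase by default
        self.dropped = 0  # commands caught while the gate was closed

    def on_button(self):
        """To be called when the GUI block reports a click."""
        self.enabled = True

    def filter(self, cmd):
        """Return the command to forward, or None if nothing should be sent."""
        if self.enabled:
            return cmd
        self.dropped += 1
        return None


if __name__ == "__main__":
    gate = CommandGate()
    print(gate.filter(1.5))  # manual phase: the command is held back
    gate.on_button()
    print(gate.filter(1.5))  # automatic phase: the command goes through
```

The recording blocks would be linked to the sensors directly, so they keep logging during both phases regardless of the state of the gate.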

Weis

Elbub commented 2 years ago

> Maybe you can base this block on the existing Dashboard block, that's also using Tkinter and is meant to display text in a GUI.

To be honest, I had completely forgotten about that one. This should do the trick, with a modifier to filter. Might be a bit too big, so I may take that and modify it a bit.

About the generator, there's three things that I'd like to modify :

When a condition is reached, due to elasticity, inertia or whatever else, the setpoint is generally quite different from the actual value. The problem is that it may cause those conditions to be exceeded. Thus, I'd like to make it so that, if I have a ramp with a condition, then another ramp, the second one starts at that condition (or equivalent), not at the value at which the previous ramp stopped.
We need limits, but the safest way to implement them would be to do so in the generator, so it would stop if one of those limits is reached.
We have different PID (controller) values depending on whether the path is a ramp, a constant, whether it's used for fatigue, etc. When switching from one path to the next, I'd like the generator to send a signal to the PIDs so that they configure themselves according to the type of the next path. Not really sure about that one, since it's not that high on my priority list. We'll try to work around it some other way if it can't be done like that.

The solution you suggest about starting Crappy from start is an excellent idea ! I'm gonna look further in that direction, but I think we're gonna have a little problem with one thing : we have the main generator, driving the test, but we'd like to keep the functions that pre-stretch the rope sample and one that brings the trolley back to its starting position. I guess I'm gonna have to tinker something with three generators, a kind of multiplex, modifiers and starting buttons that are disabled while one of the generators is active. Apart from that, I'm pretty sure it would solve our recording issues while not auto-driving the bench.

WeisLeDocto commented 2 years ago

Well given what you want to achieve, I don't think you'll have to go as far as coding a new Generator block. Here's why :

Normally if nothing is specified, the starting point of the next ramp is the last sent command. So it has nothing to do with the last recorded value, that may indeed overshoot. If you want to adjust the starting point of the ramp, in Crappy up to v1.5.9 you can set it by providing the 'cmd': <start_value> keyword in the dict of the path. Note that the key will be changed to 'init_value' in the future versions. Investigating the reason why there's so much overshooting in Crappy is in my ToDo list by the way.
You can check the -ill named again- Protection Generator path that does exactly what you need for managing limits from a Generator.
The Generator sends the command under the 'cmd' label, but also the index of the current path under the 'cycle' label (the label can be changed by setting the argument cycle_label of the Generator). This index is incremented each time the Generator switches to the next path. What you could do is pass the list of Generator paths to the block managing the PID, so that from this list and the path id received from the Generator it can set the correct PID values (passing a copy.deepcopy of the list would actually be safer depending on what you do with it).
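For the third point, here is a sketch of the lookup the PID-managing block could perform. Several things are assumptions here: the class name GainScheduler, the gain values, the use of a 'type' key in each path dict to tell ramps from constants, and the modulo that wraps the index in case the Generator repeats its list of paths. Only the idea of receiving the path index under the 'cycle' label and keeping a deepcopy of the paths comes from the explanation above.

```python
import copy

# Hypothetical (kp, ki, kd) values, to be replaced with the real tunings
GAINS_BY_TYPE = {
    "ramp": (2.0, 0.1, 0.0),
    "constant": (1.0, 0.5, 0.0),
}
DEFAULT_GAINS = (1.0, 0.0, 0.0)


class GainScheduler:
    """Picks PID gains from the index of the current Generator path."""

    def __init__(self, paths):
        # Private copy, so later edits of the caller's list have no effect
        self._paths = copy.deepcopy(paths)

    def gains_for(self, cycle):
        """Return the gains for the path index received under 'cycle'."""
        # Assumption: wrap around if the index exceeds the number of paths
        path = self._paths[cycle % len(self._paths)]
        return GAINS_BY_TYPE.get(path.get("type"), DEFAULT_GAINS)


if __name__ == "__main__":
    paths = [{"type": "ramp", "cmd": 0.0}, {"type": "constant"}]
    scheduler = GainScheduler(paths)
    for cycle in range(3):
        print(cycle, scheduler.gains_for(cycle))
```

Falling back to default gains for an unknown path type keeps the controller running instead of raising in the middle of a test, which seemed the safer choice for a sketch.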

I'm not sure I completely get it : why couldn't you just append (or not) the pre-stretch and release phases to the list of Generator paths, depending for example on the values of checkboxes ?

Weis

Elbub commented 2 years ago

Your first two points are good ideas, but I think they wouldn't fit in with what I've already done. As of now, the user chooses whether they want to drive the bench in load or in position, then they are switched to a window where they can see all the paths, delete or modify each one, and add another one anywhere between the already registered ones, at the start or at the end. Each path's add button shows a pop-up where the user can choose its parameters, then inserts that path into the list at the previously chosen position. Thus, setting a "cmd" path argument would require checking whether there's a previous path, checking its condition if there is one, and, if it's a delay, calculating the value at which it should end, and then repeating that process to update the next path in the list, if there is one. It's doable, but probably longer than tinkering with a new generator. About your second point, I had already checked that path type, but it doesn't seem to be able to deliver a ramp. Plus, our conditions are in load and in position. I thought about it over the weekend, and using modifiers is probably the easiest way.

Your idea about the PID, however, seems excellent. I'll dig further, but I think it's exactly what we need.

WeisLeDocto commented 2 years ago

Aaah ok, I hadn't got it that your goal was to code an interface displaying the shape of all the successive paths. Could be a nice contribution to Crappy by the way. Just saying.

To sum this issue up, you first had a bug raising the error 'cannot start a process twice', that was solved by calling crappy.reset() before restarting a test. Then, a new error popped up involving [xcb] Unknown sequence number while processing queue, that was solved by separating the method that calls Crappy from the Tkinter callbacks. And finally, we discussed technical solutions for your specific application.

So, I think this issue can now be closed ! The Discussions channel of GitHub is the perfect medium if you have other questions about the right Blocks/solutions to use in your specific projects. Don't hesitate to ask, it might be helpful to other users.

Weis