IbcAlpha / IBC

Automation of Interactive Brokers TWS. You can download the latest release here: https://github.com/ibcalpha/ibc/releases/latest
GNU General Public License v3.0

Stop.bat is buggy #117

Open steel3d opened 3 years ago

steel3d commented 3 years ago

Hi! I'm running into a lot of instability with my watchdog script because IBC is not being shut down properly by Stop.bat. I'm working in Windows 10 on a fast machine.

A minor issue is that the timeout before "q" is too short, so the telnet window doesn't always close and stays up with this message:

OK Shutting down

Connection to host lost.

I worked around it by adding an additional "q" at the end of SendStopCommand.vbs with a couple seconds delay.

The much worse problem is that if Stop.bat is called too soon after StartGateway.bat, IBC ends up in a broken state. The telnet window says the process shut down, but it didn't, and any subsequent call to Stop.bat fails with the telnet window stuck: it is blank and doesn't accept any input, and the only way to close it is to hard close it. The only way to stop IBC at this point is to kill it manually. It's easy to repro: manually run StartGateway.bat, then as quickly as possible manually run Stop.bat, then try to run Stop.bat again.

I also saw this message one time in the telnet window: "ERROR null source ". Really depends on timing of when you run Stop vs Start.

I also don't know how to cleanly kill the stuck telnet app, because it was started by the batch file with "start", so killing the process of the batch file does nothing:

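import subprocess
import time

# Popen only tracks the batch file's own process, not the telnet window it launches with "start"
p = subprocess.Popen(r"C:\IBC\Stop.bat")
time.sleep(2)
p.kill()  # ends the batch process only; the telnet window survives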

I could kill all "telnet" processes, but that's not clean.

It would really help me out if you can offer a solution to these issues. Thanks!

steel3d commented 3 years ago

Also, the Stop script types a "q" into whatever app happens to be in the foreground if the telnet app quits early for whatever reason. This is probably made more likely by my workaround (even if I keep a single "q" and only change the delay before "q" from 50ms to 5000ms.)

rlktradewright commented 3 years ago

Thanks for the info.

Before I spend time investigating this, can you please tell me what is the reason why you sometimes want to shut down IBC with a stop command immediately after you start it?

I suspect what's happening is that if you send the stop before TWS/Gateway has displayed the main window and built the menus, the stop task simply waits until that has happened (but I need to check the code to confirm this, which I'll do when I have some spare moments - rare things they are!). If that's the case, it should be easy enough for IBC to just exit without waiting for TWS/Gateway to complete its initialisation.

I've noticed the 'q' issue in the past, but have never tried hard to solve it, since it didn't seem to cause any problems and no-one has complained about it before. Telnet on Windows is such a pig that the best thing to do might be to write a tiny Java app that connects IBC and sends the STOP command (or any other command that IBC recognises).

steel3d commented 3 years ago

I don't really want to shut it down right after I start it, but I think multiple watchdogs may step on each other sometimes. I also feel like there are other ways for IBC to get into this unresponsive state (because I've seen it in cases where there is only one watchdog and it can't step on itself), but this is an easy-to-repro case and it might point to a more general issue. I often end up in situations where the IBC process can't be killed but it also doesn't have a working connection to IB, and new instances of IBC can't be started because there is an open handle on one of the files it needs, presumably from a zombie process.

I'm trying another strategy where I don't try to restart IBC every time IB loses connection, just leave it running all day. But this is not a good solution because if I ever log into the account manually, it will kick out IBGateway and it will never get restarted if I forget to restart the whole shebang.

Also, I think this type of instability might be affecting other frameworks that use IBC, like ib_insync. The connection died semi-permanently too often while I was using the ib_insync watchdog (that's why I'm trying to reimplement my own watchdog). Overall I think everyone would be better served if there were no race conditions.

Hope that gives more context on the issue. Thanks!

rlktradewright commented 3 years ago

Ok, I've spent some time on this and there is definitely something wrong when you use Stop.bat before Gateway/TWS reaches a certain stage in its initialisation (and their initialisation sequences are quite different in significant ways). There are some aspects of what's happening that I don't quite understand yet.

I won't be able to get back to this until this evening (UK time), and it might take some time to get it sorted.

So in the meantime can I ask that you make some effort to prevent Stop.bat being called immediately after starting Gateway, because that is really an awkward situation. Bear in mind that if Stop.bat is called before IBC has managed to get its command server running (about a second or so on my server), it is guaranteed to do nothing, which your watchdog(s?) may not expect.
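One way a watchdog could avoid sending a stop too early is to wait until the command server port accepts connections. The following is only a sketch under assumptions: the function name is made up for illustration, the port must match the CommandServerPort setting in config.ini, and it has not been verified that IBC is indifferent to a client that connects and immediately disconnects.

```python
import socket
import time


def wait_for_command_server(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper: `port` is assumed to be IBC's CommandServerPort.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connect and close straight away: we only care that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the command server is not up yet, so retry.
            time.sleep(interval)
    return False
```

A watchdog would call this after starting Gateway and only issue the stop once it returns True.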

I'm also rapidly convincing myself that Windows telnet is really not up to the job, so I probably will implement a small command sender program (that could also be used as a library), but don't hold your breath!

By the way, what are these watchdogs? Why is there more than one of them? And why do you 'restart IBC every time IB loses connection'? (I presume you mean the Gateway/TWS's connection to the IB server?)

rlktradewright commented 3 years ago

I've fixed these problems and I hope all will be well now. I've created a new version 3.8.6-beta.2, which you can download from here:

https://1drv.ms/u/s!AlqfLEOWDJ9Zh8ckH_dReHyajw45KQ?e=iYXyxm

Please can you give this version a try, and let me know how it goes. If everything is ok I'll make an 'official' release.

Changes in this version:

I've just discovered that if you run Stop.bat before IBC has opened the command processor socket (ie within 20 seconds of that, so you can even run it up to about 19 seconds before starting IBC), IBC does shut down straight away but currently it writes a lot of identical error messages to the log file. It's too late to fix this now, but it would be worth you using the new version anyway and I'll try to fix this untidiness tomorrow.

steel3d commented 3 years ago

Stop doesn't stop at all now once IBG is logged in.

You can also try a script like this to test various corner cases:

import subprocess
import time
import random
from datetime import datetime
while True:
    subprocess.Popen("StartGateway_paper.bat /INLINE")
    time.sleep(random.randint(2,30))
    subprocess.Popen("StartGateway_live.bat /INLINE")
    time.sleep(random.randint(2,30))
    print("Stop_live ", datetime.now())
    subprocess.Popen("Stop_live.bat")
    time.sleep(random.randint(2,6))
    print("Stop_paper ", datetime.now())
    subprocess.Popen("Stop_paper.bat")
    time.sleep(random.randint(2,6))

rlktradewright commented 3 years ago

Oops! Late night working is never a good idea...

A new version 3.8.6-beta.3 is now at:

https://1drv.ms/u/s!AlqfLEOWDJ9Zh-At2bALrtTIH2Yv9w?e=CAq5Hn

Note that I've also modified the SendStopCommand.vbs to not send the EXIT command: IBC now always closes the connection when it receives a STOP command.

It might be worth pointing out that in some circumstances where STOP is invoked before TWS/Gateway has finished initialising, or is processing an asynchronous task such as overriding the API port setting, calling STOP causes an InterruptedException that is logged by TWS, and the log entries appear in the IBC log. They are harmless and don't prevent the STOP being actioned.

Regarding your script, I'm not currently a Python speaker, but it certainly has its uses, so perhaps I should put some time into it...

rlktradewright commented 3 years ago

@steel3d

Any update on this?

steel3d commented 3 years ago

So sorry I got busy and couldn't get back to this. Maybe next week. Currently I have one out of two scripts turned off to keep it stable.

steel3d commented 3 years ago

Hi! Sorry for the late reply and thanks a lot for the fixes! The stop is reliable now. However, it doesn't work at all when run from Windows Task Scheduler. Try to add this to a .bat file and create a task and run it:

start C:\IBC\StartGateway.bat /INLINE
timeout /t 15
start C:\IBC\Stop.bat

You will notice the java process is still running after, as well as a conhost and a telnet process.

Running Stop.bat from the command line is able to kill the java process. Task Scheduler is really weird: it creates some crazy permissions, and in general it's not easy to kill processes started from it. Hope you're able to debug and find a workaround.

Note that running the batch file above from a command line shuts down all processes fine. So it's only an issue when it's run from Task Scheduler.

rlktradewright commented 3 years ago

I've fixed the problem with Task Scheduler.

The reason it failed is that the SendStopCommand.vbs script assumed that the telnet window created by Stop.bat was the active window, to which it sends the relevant keystrokes.

However when running from Task Scheduler, it makes sure that the windows created are not the active window, presumably to ensure that anything the user is doing is not hijacked because of a scheduled task running.

The solution was to amend the scripts to explicitly make the telnet window active.

So just download the updated Stop.bat and SendStopCommand.vbs files and all should be well.

I've updated the Windows release zip to contain the amended files.

steel3d commented 3 years ago

Hmm, I pulled the changes, but it still doesn't work for me. It only works if in Task Scheduler I Create Basic Task, because that shows the app windows exposed. However, as soon as I edit the task and change anything, like "Run whether user is logged on or not", the windows go hidden, and Stop.bat doesn't stop the java process anymore. Are you able to repro?

rlktradewright commented 3 years ago

When you start a program using Task Scheduler with "Run whether user is logged on or not", it is started in session 0 (you can see this from the Session Id column in Task Manager). Programs in session 0 (typically services) can run a user interface, but are provided with no resources to actually make that user interface visible or to receive any input (for example you could actually run TWS or Gateway in this way, and they would work fine, but there's absolutely no way to access their GUIs.)

This is why you can't see the telnet window.

And since there is no way to get input to the telnet window, the .vbs script that tries to send keys to the telnet window simply cannot work.

So there is no possibility of the current Stop.bat working as a scheduled task with "Run whether user is logged on or not".

Before I go any further, might I ask why you need to run Stop.bat as a scheduled task at all? The fact that it's a scheduled task implies that you know when you want to shut down TWS/Gateway and this doesn't change frequently - ie you're not using Stop.bat in a sort of tactical 'as and when' manner.

So why can't you just set the ClosedownAt setting in config.ini, which will do the same job much more neatly?

But if you have a genuine use-case for closing down on a scheduled basis and the ClosedownAt setting is not appropriate for some reason, it might be best to write a small command line utility that can set up a socket connection to IBC, read commands from StdIn and send them to IBC, and send the output to StdOut. It wouldn't be a big effort, and I probably should have done this years ago, but telnet has always just about been 'good enough'. Nevertheless, I'm reluctant to spend time on this unless it's really necessary - I have a busy life to live outside of IBC!

This would actually be a nice little community enhancement, if you or anyone else wants to do it, but please let me know first.
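A minimal Python sketch of such a utility might look like the following. It is not part of IBC: the function name and parameters are hypothetical, and the only protocol assumptions are that commands are newline-terminated ASCII text and that IBC closes the connection after STOP (as noted earlier in this thread). It sends every non-blank input line first and copies the replies afterwards, so it suits piped input better than interactive use.

```python
import socket


def relay_commands(lines, out, host, port, timeout=5.0):
    """Send each non-blank line to IBC as a command, then copy IBC's replies to `out`.

    Hypothetical helper: `port` is assumed to be the CommandServerPort from config.ini.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            for line in lines:
                command = line.strip()
                if command:
                    sock.sendall(command.encode("ascii") + b"\n")
        except OSError:
            pass  # connection already gone, e.g. further commands followed a STOP

        try:
            while True:
                data = sock.recv(1024)
                if not data:  # IBC closed the connection
                    break
                out.write(data.decode("ascii", errors="replace"))
        except OSError:
            pass  # nothing more arrived within `timeout` seconds
```

Calling `relay_commands(sys.stdin, sys.stdout, host, port)` from a small script would let the commands be piped in from a file, and since no window or keystrokes are involved it should not depend on an interactive session.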

steel3d commented 3 years ago

I see. Yes in fact to keep IBG stable I sometimes have to shut it down explicitly. IBG sometimes gets into a state where I can connect to it but most operations fail, like getting account values. It seems to be running out of memory when too many messages get logged. Lots of stuff gets logged because I do a lot of connects and disconnects so I can use a single instance of ib_insync to keep tabs on two gateways at the same time. I could probably work around this, separate to two ib_insync instances to do less connect/disconnects, but I can't be totally sure that there's no other way for IB to get screwed up that might require a restart. I do need to check that IBG is up by connecting to it, I can't just do a simple timed startup and shutdown, because I sometimes manually log in with TWS, which kicks off the IBG connection, which shuts down IBG, and then it needs to get restarted. I don't want to forget one day.

For now I have worked around the lack of Stop.bat by remembering the java process that IBC starts, then terminating it when I need to by pid.
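The post doesn't say how the pid is captured, so the following is only one possible sketch: take a snapshot of java process ids before starting Gateway, compare it with a snapshot afterwards, and later terminate the new ones with taskkill. The helper names are hypothetical, the image name `java.exe` is an assumption, and the approach is racy if another Java program starts at the same moment.

```python
import csv
import io
import subprocess


def parse_tasklist_pids(csv_text):
    """Extract process ids from `tasklist /FO CSV /NH` output (PID is the second column)."""
    pids = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Rows without a numeric second field (e.g. the "INFO: No tasks..." line) are skipped.
        if len(row) >= 2 and row[1].isdigit():
            pids.add(int(row[1]))
    return pids


def java_pids(image_name="java.exe"):
    """Return the set of pids for running processes with the given image name (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_pids(result.stdout)


def kill_pid(pid):
    """Forcibly terminate a process and its child processes (Windows only)."""
    subprocess.run(["taskkill", "/PID", str(pid), "/T", "/F"], check=False)


# Hypothetical usage on Windows:
#   before = java_pids()
#   ... launch StartGateway.bat and give Gateway time to start ...
#   gateway_pids = java_pids() - before
#   for pid in gateway_pids:
#       kill_pid(pid)
```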

So it's not an urgent issue for me, would just be cleaner to have a working stop. No worries, don't do it just for me.

I might volunteer to work on it normally, but I have no experience in networking and I suck at Java, so I wouldn't be efficient at it...

steel3d commented 3 years ago

Maybe use netcat on Linux and something like this on Windows https://nmap.org/ncat/ to be able to feed messages from a file rather than stdin? Since the windows installation instructions on turning on telnet are not trivial anyway, this might not be an extra burden...

rlktradewright commented 3 years ago

There are several points to be addressed here:

  1. You don't need to use netcat on Linux - telnet works fine, since it takes input from stdin, and file contents can be piped into it. That's why I didn't bother to provide a stop.sh: I assumed Linux users would be able to work out what to do for themselves. Whereas for Windows, many/most users have never heard of telnet, and even if they had it's stupidly implemented so you have to employ the silly kludge that Stop.bat uses to get the job done: hence I felt it was worthwhile providing a script that the user could easily customise.

  2. ncat would certainly do the job on Windows, but when I tried to download the precompiled Windows binary, Edge informed me that it contained a virus, so I decided to leave well alone!

  3. I'm puzzled by your experience with the Gateway: I've never found that doing large numbers of API connects/disconnects causes any kind of instability with either Gateway or TWS. I'm not sure what you mean by 'when too many messages get logged' and 'Lots of stuff gets logged because I do a lot of connects and disconnects': what logging are you talking about? Given that any half-decent computer these days, even with a mechanical disk, can write many thousands of lines per second to a file, I don't see why logging would be a problem, or why it would cause memory shortage.

  4. If you want to run Gateway most of the time, but also want to occasionally use TWS and then ensure that Gateway carries on when you've finished with TWS, you can do this using Task Scheduler and the ExistingSessionDetectedAction setting in config.ini.

    Unfortunately I've found a bug in this area that might prevent this working properly, and I'll let you know when it's fixed.

    But what you have to do is first, run Gateway from Task Scheduler with the task set to repeat periodically (say every 15 minutes), and with ExistingSessionDetectedAction=primaryoverride. Then when you want to run TWS, you run it with ExistingSessionDetectedAction=primary (it will need its own config.ini): this will cause Gateway to shut down. If you're still using TWS when the Gateway task triggers again, Gateway will not succeed in establishing its session and will shut down again. When you finish using the TWS, Gateway will then start properly the next time its task is triggered.

rlktradewright commented 3 years ago

I've now fixed the bug I referred to in point 4 of my most recent post above.

Note that the description of the ExistingSessionDetectedAction setting in config.ini has been refined somewhat and it's probably worth including the updated wording in your file.

steel3d commented 3 years ago

Thanks. I'm good for now with my workarounds. I'm using separate scripts per login now, so I don't need to connect/disconnect over and over, so IBG doesn't become unstable. Closing executables by pid is not the worst. But if stop.bat ever becomes fully functional, I can use it to clean up my code. Nothing urgent. Thanks for all your help!

dmytro-sheyko commented 3 years ago

Hi! Using telnet + cscript to stop IBC looks more tricky than reliable. What if telnet is not installed or available? Or there is another telnet.bat in %Path% which behaves differently (e.g. calls putty or ssh)? What if another window pops up and therefore telnet becomes deactivated while cscript mimics user input? Why not just use a simple Java program like the one below (especially since IBC itself is Java based)?

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class StopIBC {
    public static void main(String[] args) throws IOException {
        // Host and port can be overridden with -Dhost=... and -Dport=...
        InetAddress host = InetAddress.getByName(System.getProperty("host", "localhost"));
        int port = Integer.getInteger("port", 7462);
        try (Socket socket = new Socket(host, port)) {
            try (OutputStream os = socket.getOutputStream()) {
                os.write("STOP\nEXIT\n".getBytes(StandardCharsets.US_ASCII));
            }
        }
        System.out.println("ok");
    }
}

rlktradewright commented 3 years ago

@dmytro-sheyko you're absolutely right that the current approach on Windows is a complete kludge, and I should have done something more appropriate about it 15 years ago! But the fact is that no-one ever seemed to use the STOP command back then, and I suspect that even now hardly anyone uses it on Windows, and the kludge does actually work, so I never gave it much serious thought. Though I suspect I've wasted much more time on dealing with it as it is than it would have taken to do it right!...

So thanks for your little program. There's a bit more to it than just the program though: the fact that it's Java means that we have to also provide a script to run it that will locate the correct Java to use, in exactly the same way as the IBC scripts, because the user may well not have a 'standard' Java installation (only the one that's installed with TWS/Gateway). I would envisage factoring out the Java-location code as a separate script, called from Stop.bat and StartIBC.bat.

There is also the question of whether to also provide a Linux script - probably not necessary since telnet is so much more usable on Linux, but it might save someone some time working it out for themselves.

Would you be interested in providing a PR to cover this and replace the existing mechanism?

dmytro-sheyko commented 3 years ago

@rlktradewright, I've created pull request, please review. I did not extract the Java-location code as a separate script, but hopefully what I did is good enough to start. Also I did not touch Linux scripts. Thanks.

steel3d commented 1 year ago

Hi, @rlktradewright! Any updates on Stop.bat on Windows? I still get into situations where Stop is not able to stop a running IBC process, telnet doesn't seem to connect.

rlktradewright commented 1 year ago

@steel3d

I wasn't aware that any update is needed. It works fine for me. Reading quickly through the above, I notice that I mentioned quite a few things that you need to take into account.

So can you please give me a description of what you're trying to do with Stop.bat and the circumstances when it apparently doesn't work.

rlktradewright commented 1 year ago

@steel3d

If, by highlighting those two posts above, you're trying to draw my attention to them, then you're wasting your time. The pull request was a non-starter for all the reasons detailed in my review of it.

And you haven't answered my question. As far as I'm concerned, Stop.bat works fine, so if you want me to 'fix' it you'll need to make clear what you think is wrong with it. Otherwise you're just wasting my time as well.

rlktradewright commented 1 year ago

@steel3d

Oops, please ignore my previous post. I was confused by an alert I received, though I can't seem to find any record of it now.

I'm concerned that you're still having problems with Stop.bat. If there is a bug in it, or in IBC, obviously I'm keen to fix it, but I don't have any evidence of a bug in either.

The most recent thing you said was:

I still get into situations where Stop is not able to stop a running IBC process, telnet doesn't seem to connect.

That seems to imply that it does work sometimes, at least. So it looks like your problem is probably environmental in some way, but I can't speculate what without further information.

However, here's a completely different approach. Below is a small Python script that does the job perfectly and can be run from Task Scheduler without the user being logged in, if you still want to be able to do this.

I am not a Pythonista, so it's been a bit of a learning curve to get this working, and it could probably be improved to make it more robust, but it seems to work well provided the command string is correct (so you can replace STOP with RESTART to do a restart).

You'll need to install Python on Windows (if you don't already have it), but that's a trivial download-and-install from https://www.python.org/downloads/windows/.

Once you've installed Python, create a Python script file called, for example, StopTWS.py, copy and paste the script below into it and save it in C:\IBC (unfortunately Github doesn't allow Python files to be attached to comments).

The command to run it is then simply:

py C:\IBC\StopTWS.py

Here's the script:

#!/usr/bin/env py

import socket
import time

TCP_IP = '10.252.0.7'    # IP address or name of computer running TWS
TCP_PORT = 7463          # CommandServerPort setting in config.ini
BUFFER_SIZE = 1024

MESSAGE = b"STOP\n" # Command to send to IBC

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))

print("sending: ", MESSAGE)
s.send(MESSAGE)

data = s.recv(BUFFER_SIZE)
print("received data:", data)

time.sleep(1)

print("closing socket")
s.close()

print("exiting script")

I hope this will prove useful to you. I'll probably include it in some form in the User Guide and the Windows IBC download as a Stop.bat replacement.

By the way, this script should also work on Linux, but I haven't tested this (I broke my Linux VM trying to get Python properly installed on it, so...).