Linesight-RL / linesight

AI Plays Trackmania with Reinforcement Learning
https://linesight-rl.github.io/linesight/build/html/

Stuck on Connecting to TMInterface0... #6

Closed NumseBacon closed 4 months ago

NumseBacon commented 1 year ago

As the title says, when running observe_manual_run_to_extract_checkpoints.py it just says "Connecting to TMInterface0..." and nothing happens. I have tried finishing the map etc., so I don't know what to do.

Also, will there be a new update to the public repo? Would be cool to have the reward prediction.

ausstein commented 1 year ago

Had the same problem. The TMInterface API does not work with TMInterface 2.0.

From the documentation:

> NOTE: This API is only working on TMInterface versions < 2.0.0. From version 2.0.0, TMInterface introduced a new plugin API with AngelScript. If you still wish to use this API, download the 1.4.3 version of TMInterface.

NumseBacon commented 1 year ago

> Had the same problem. The TMInterface API does not work with TMInterface 2.0.
>
> From the documentation:
>
> > NOTE: This API is only working on TMInterface versions < 2.0.0. From version 2.0.0, TMInterface introduced a new plugin API with AngelScript. If you still wish to use this API, download the 1.4.3 version of TMInterface.

Yes, thanks a lot! Time to train, but it just keeps going behind the start and getting stuck lol

pb4git commented 1 year ago

Hi,

I'll add a note in the Readme stating the dependency on TMInterface's old version.

Just a fair warning: While we do intend at some point to open up our code with all training hyperparameters so that the project may be used by the wider community, we are not there yet. We share this code as-is so that people can read the code, but we likely won't provide much support to make it run properly.

You're welcome to make it work on your own, and we'll be happy to hear about it if you do :)

Regards,

NumseBacon commented 1 year ago

> Hi,
>
> I'll add a note in the Readme stating the dependency on TMInterface's old version.
>
> Just a fair warning: While we do intend at some point to open up our code with all training hyperparameters so that the project may be used by the wider community, we are not there yet. We share this code as-is so that people can read the code, but we likely won't provide much support to make it run properly.
>
> You're welcome to make it work on your own, and we'll be happy to hear about it if you do :)
>
> Regards,

Hey, thanks for adding the note about TMInterface! I know you just said you won't provide support, but any idea why it ran fine and then suddenly only turns right and gets stuck on a barrier? I will try figuring it out myself, but I might as well take a chance before I destroy something myself 😉

ausstein commented 1 year ago

Yeah, it took a bit of tinkering to get everything to work, but it runs now for me. Made it roughly halfway around Map5 so far.

Changing cutoff_rollout_if_no_vcp_passed_within_duration_ms to 30_000 in misc.py made it stop running into walls. Although it started crashing into the walls again now :D

It always slowly learns not to crash into the walls, gets a few very good runs, then crashes into walls again for 10-15 iterations.

NumseBacon commented 1 year ago

> Yeah, it took a bit of tinkering to get everything to work, but it runs now for me. Made it roughly halfway around Map5 so far.
>
> Changing cutoff_rollout_if_no_vcp_passed_within_duration_ms to 30_000 in misc.py made it stop running into walls. Although it started crashing into the walls again now :D
>
> It always slowly learns not to crash into the walls, gets a few very good runs, then crashes into walls again for 10-15 iterations.

Ah okay, I already lowered mine a bit, but I lowered it more now. I am on a map I made myself, though. But uhm, where do you see iterations?

ausstein commented 1 year ago

With iterations I meant the number of rollouts. I just watch the game.

NumseBacon commented 1 year ago

Ah okay

NumseBacon commented 1 year ago

1 hour 45 min and just got first finish!

pb4git commented 1 year ago

Keep us posted, I'm curious to see where you'll get :D

NumseBacon commented 1 year ago

I will! I ended up changing some hyperparameters and rewards... so I'm gonna need to wait a long time. The biggest issue currently is the bot not being able to reach the VCP due to braking all the time 😆

NumseBacon commented 1 year ago

Current bot PB progression: 1:42, 1:35, 1:24, 1:12, 1:04. My PB: 0:32... yeah.

ausstein commented 1 year ago

I got 8/11 checkpoints on map 5, then hit the 5 min max, so it did not get stuck at all. Shortly after, I had exploding gradients: Loss=nan and the model self-destructed :(

Changed some params; right now my new training run sometimes reaches 5/11 checkpoints, but it is much quicker: roughly 1.5 times my PB, so it should make it all the way once it does not get stuck. Also, the driving looks much more promising this time.

NumseBacon commented 1 year ago

> I got 8/11 checkpoints on map 5, then hit the 5 min max, so it did not get stuck at all. Shortly after, I had exploding gradients: Loss=nan and the model self-destructed :(
>
> Changed some params; right now my new training run sometimes reaches 5/11 checkpoints, but it is much quicker: roughly 1.5 times my PB, so it should make it all the way once it does not get stuck. Also, the driving looks much more promising this time.

Model self-destruct, wow. You can change the 5 min max though? Also, what params did you change, since it's much quicker?

ausstein commented 1 year ago

Yeah, I could, but it only reached the 5-minute limit once, so I didn't find it necessary.

I reduced the learning rate to avoid exploding gradients. I think that helped the model slowly get better with better lines, but that's just a guess. I also made it so that memory_size_start_learn increases slowly over time from 15_000 to 100_000. Those were the two largest changes.

ausstein commented 1 year ago

The new run is at 460_000 NMG right now; I think the other one self-destructed around 200_000. So it is learning more slowly now.

ausstein commented 1 year ago

I also realized it thinks the arch of the start block is a barrier that has to be avoided; that is why it likes to crash into the wall at the start :D

ausstein commented 1 year ago

Oh, and one more thing: I removed the accelerate reward if it hits brake at the same time.

rollout_results["input_w"].append(misc.inputs[action_idx]["accelerate"] and not misc.inputs[action_idx]["brake"])

NumseBacon commented 1 year ago

Ooo sounds nice, might try that as well. How did you make start_learn slowly increase? Yeah, the braking when driving forward is quite annoying. I actually increased the learning rate haha. Any idea how to stop it from crashing at the start? Also, is it normal that when doing eval it's just pressing forward till the finish and nothing else? (my map allows that lol)

ausstein commented 1 year ago

> How did you make start_learn slowly increase?

I added a variable max_memory_size_start_learn to misc.py. Then I added

if misc.memory_size_start_learn < misc.max_memory_size_start_learn:
    misc.memory_size_start_learn += misc.batch_size // 2

after loss = trainer.train_on_batch(buffer, do_learn=True)
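
Put together, the intent is roughly this (a sketch; the surrounding loop structure is assumed, only the names come from this thread):

```python
# Training loop (sketch): after each batch update, raise the replay-size threshold at
# which learning starts, from memory_size_start_learn up to max_memory_size_start_learn
loss = trainer.train_on_batch(buffer, do_learn=True)

if misc.memory_size_start_learn < misc.max_memory_size_start_learn:
    misc.memory_size_start_learn += misc.batch_size // 2
```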

> Any idea how to stop it from crashing at the start?

Not really; I think it just needs more training to learn to distinguish arches from barriers.

> Also, is it normal that when doing eval it's just pressing forward till the finish and nothing else?

Early on, yes. "Just press forward" seems to be a good strategy to learn early, and the model might plateau there for a while. Eventually it should learn more sophisticated strategies though. If it doesn't, you might want to increase number_memories_generated_high_exploration_early_training and high_exploration_ratio.
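
A sketch of what that kind of override could look like in misc.py (the parameter names are from this thread; the values are hypothetical examples, not tested recommendations):

```python
# misc.py (sketch): keep exploration high for longer early in training
number_memories_generated_high_exploration_early_training = 200_000  # hypothetical value
high_exploration_ratio = 10  # hypothetical value
```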

NumseBacon commented 1 year ago

@ausstein seems like you know quite a lot, thanks for the help. I'm gonna try to see if I can add a reward for not touching the walls.

ausstein commented 1 year ago

@NumseBacon I work as a computational physics researcher, which includes more and more machine learning. So I have some general knowledge and intuition. However, on this repository in particular I am mostly just guessing about what works or not based on intuition.

Let me know if you can design such a reward, that would certainly be very helpful! tminterface.structs.SceneVehicleCar has a property "has_any_lateral_contact"; it can be accessed via tminterface.structs.SimStateData.scene_mobil, maybe you can use that.

I think that should be accessible as last_known_simulation_state.scene_mobil.has_any_lateral_contact in tm_interface_manager.py but I'm not sure about that.

Good luck!

ausstein commented 1 year ago

Although adding that might make the AI even more afraid of crashing into the arch of the start block :D

NumseBacon commented 1 year ago

> @NumseBacon I work as a computational physics researcher, which includes more and more machine learning. So I have some general knowledge and intuition. However, on this repository in particular I am mostly just guessing about what works or not based on intuition.
>
> Let me know if you can design such a reward, that would certainly be very helpful! tminterface.structs.SceneVehicleCar has a property "has_any_lateral_contact"; it can be accessed via tminterface.structs.SimStateData.scene_mobil, maybe you can use that.
>
> I think that should be accessible as last_known_simulation_state.scene_mobil.has_any_lateral_contact in tm_interface_manager.py but I'm not sure about that.
>
> Good luck!

Seems like you might be able to do that too? 😂

NumseBacon commented 1 year ago

Might just try to make it so that if it detects a sudden stop it will be negatively rewarded?

ausstein commented 1 year ago

> Seems like you might be able to do that too? 😂

It does not seem too hard at least, but it is not so much what I am after. My goal with this is simply to get it running as a benchmark and then try other algorithms with as little human-designed reward as possible :) Once I get a decent training run with the IQN I'll try hooking it up with MuZero.

> Might just try to make it so that if it detects a sudden stop it will be negatively rewarded?

That sounds like much more work compared to taking a value that's already there :D

NumseBacon commented 1 year ago

> > Seems like you might be able to do that too? 😂
>
> It does not seem too hard at least, but it is not so much what I am after. My goal with this is simply to get it running as a benchmark and then try other algorithms with as little human-designed reward as possible :) Once I get a decent training run with the IQN I'll try hooking it up with MuZero.
>
> > Might just try to make it so that if it detects a sudden stop it will be negatively rewarded?
>
> That sounds like much more work compared to taking a value that's already there :D

Yeaaaah, you're way more advanced than me for sure. I have no idea how to use other algos etc. Yeah you might be right, but I'm not that good 😊

pb4git commented 1 year ago

If I may... I'd suggest trying to reproduce the results on map5 before adding new stuff. Who knows, you might find better hyperparameters than what we used?

Also, there is a risk with reward shaping: are you sure the optimal trajectory doesn't include a wallbang?

NumseBacon commented 1 year ago

If I reproduce the results somehow and then change/add something, wouldn't it take quite some time to adjust to that? And yeah, 100%, it's always hard to account for tradeoffs and balance/optimize it really well, and I'm still kinda new.

NumseBacon commented 1 year ago

@ausstein yeah yeah, I'll try, not sure how to confirm if it's working but whatever. What does MuZero reward by, then?

ausstein commented 1 year ago

@pb4git

> If I may... I'd suggest trying to reproduce the results on map5 before adding new stuff. Who knows, you might find better hyperparameters than what we used?

For sure, that's why I am trying to reproduce map5 right now.

> Also, there is a risk with reward shaping: are you sure the optimal trajectory doesn't include a wallbang?

Exactly the reason why I want to try MuZero, without any designed reward. However, a wallbang would be a single-frame walltouch, so probably not much negative reward.

@NumseBacon

Adding a walltouch penalty should be very easy though.

In tm_interface_manager.py, add the line

"walltouches": [],

to the definition of rollout_results on line 141.

Then add a line somewhere around line 475, between the others:

rollout_results["walltouches"].append(last_known_simulation_state.scene_mobil.has_any_lateral_contact)

Add a variable to misc.py:

penalty_for_walltouch

Finally, in buffer_management.py, add after line 55:

reward -= np.sum(gammas[:j]) * misc.penalty_for_walltouch * np.array(rollout_results["walltouches"][i : i + j],dtype=float) * misc.ms_per_action

I am currently not at home; otherwise I'd test these and upload them to my fork.

> What does MuZero reward by, then?

MuZero predicts for every action how it would affect the final outcome and chooses the one with the best predicted final outcome. So in theory it only needs the finishing time. In practice, I think it also needs something like meters traveled, otherwise it would never finish in the first place.

NumseBacon commented 1 year ago

Wow, you really just did that, that's awesome. I'm not home either, so I can't test. MuZero does sound interesting.

ausstein commented 1 year ago

Spotted a mistake, it should be:

reward -= np.sum(gammas[:j]* np.array(rollout_results["walltouches"][i : i + j],dtype=float) ) * misc.penalty_for_walltouch * misc.ms_per_action
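
Putting the pieces together, a consolidated sketch of the suggested three-file change (line placement, the penalty value, and the surrounding structure are assumptions for illustration, not verified against the repo):

```python
# misc.py (sketch): per-millisecond penalty for lateral wall contact (hypothetical value)
penalty_for_walltouch = 6.0

# tm_interface_manager.py (sketch): record wall contact alongside the other per-step data
rollout_results = {
    # ... existing keys ...
    "walltouches": [],
}
# ... inside the per-step loop, next to the other rollout_results appends ...
rollout_results["walltouches"].append(
    last_known_simulation_state.scene_mobil.has_any_lateral_contact
)

# buffer_management.py (sketch): subtract a discounted penalty for every step of the
# n-step window that had wall contact (np is numpy, as already used on this line)
reward -= (
    np.sum(gammas[:j] * np.array(rollout_results["walltouches"][i : i + j], dtype=float))
    * misc.penalty_for_walltouch
    * misc.ms_per_action
)
```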

NumseBacon commented 1 year ago

You missed a ( so it's reward -= (

I have implemented some of your ideas now, will see how it goes.

ausstein commented 1 year ago

Okay, let me know how it goes.

I do, however, agree with pb4git that it is likely a better idea to first try to replicate the benchmark results in some capacity.

NumseBacon commented 1 year ago

True, but at the same time, spending time on that just to then spend even more time on another map?

ausstein commented 1 year ago

@NumseBacon welcome to coding :D If you skip these things, you tend to spend more time overall rather than less.

@pb4git

BTW I just wanted to say this repository is awesome! I have rarely seen such a clean code base for a work-in-progress project! It is very easy to read and understand what you are doing at every step. I have been looking for some time now for a video game project that I could try to throw MuZero at, and usually I gave up because familiarizing myself with the codebase was too tedious. This repository is the opposite!

NumseBacon commented 1 year ago

Yeah this repo is awesome

NumseBacon commented 1 year ago

It's been driving into the wall right after the start for 2 hours now... Getting a bit worried; however, it started doing that before I changed/added parameters.

pb4git commented 1 year ago

Are you using the tensorboard interface to monitor progress?

NumseBacon commented 1 year ago

Yes?

NumseBacon commented 1 year ago

I ended up trying to let it run all night, but the capture crashed multiple times, as in the same error you get when you put another window on top of Trackmania so it can't capture the window. But the Trackmania window wasn't blocked, so I'm guessing it somehow lost connection/crashed.

ausstein commented 1 year ago

I did not have much more luck either. Ran for 1.4 million frames, still no finish; the best I got was 9/11 CPs. It also seemed to stagnate on tensorboard after 300k frames. Trying different hyperparameters now.

Here are some of my thoughts:

- n_steps probably needs to be larger than 1 for any chance of something successful, but the lr needs to be decreased when using larger n_steps.
- The "max_memory_size_start_learn" idea was stupid, it doesn't do anything. I understand that better now.
- reward_per_m_advanced_along_centerline probably should be bigger: a value of 1 means 0.036 reward per ms at a speed of 100 along the center line. I set it to 10 for now.
- I put buffer_test_ratio = 0.1 because otherwise you are just wasting training data.
- I also increased the size of the NN, but I'm not sure if or how much that will help.

All the parameters I changed from the defaults (just for your info, not sure if they will work):

- running_speed = 10
- gamma = 0.95
- number_memories_generated_high_exploration_early_training = 200_000
- reward_per_ms_press_forward = 0.1
- reward_per_m_advanced_along_centerline = 10
- cutoff_rollout_if_no_vcp_passed_within_duration_ms = 30_000
- memory_size_start_learn = 15_000
- discard_non_greedy_actions_in_nsteps = True
- n_steps = 30
- dense_hidden_dimension = 256
- float_hidden_dim = 1024
- learning_rate = 4e-4
- weight_decay = 4e-4
- zone_centers_jitter = 2

NumseBacon commented 1 year ago

Thanks for the info! Please let me know what happens. I thought a higher running speed wouldn't do anything? Maybe I just didn't notice a difference.

ausstein commented 1 year ago

I sometimes have to change the running speed manually via the TMInterface GUI for it to take effect, but then it works. Since the RL code pauses the engine to wait for the calculated action, it has limited effect though. With running speed = 10 I get a race_time_ratio of roughly 2.8, so the races run roughly 3 times real time. The mileage you get from this will vary with your CPU and GPU. I have an RTX 4080 and a 5800X3D.

NumseBacon commented 1 year ago

A 4080, damn, my computer doesn't quite have that. And I only have 16GB. I think my ratio is really low, but I will see later when I change it.

ausstein commented 1 year ago

I always first set the running speed via TMInterface, then load the map, then start the script. In that order it usually works. Otherwise, sometimes the running speed is stuck at 1 or even at 0.

NumseBacon commented 1 year ago

Yeah, I changed some code and pretty much fixed it.

pb4git commented 1 year ago

If you close TMInterface while the game is paused, it will restart in that "paused" state, and you need to unpause it manually (once) for our code to work.

If the game is already unpaused, it is abnormal that you need to set the running speed yourself.

pb4git commented 1 year ago

We have trained with 32GB of RAM.

I have not tested a full training with 16GB, but you should be able to equal or come within a few tenths of the benchmark we gave. Maybe you can improve on our benchmark if you find better hyperparams :)