Concept inquiry - Githubissues

Sentdex / pygta5

Explorations of Using Python to play Grand Theft Auto 5.

MIT License

3.92k stars 823 forks source link

Concept inquiry #34

Closed juanluishortelano closed 7 years ago

juanluishortelano commented 7 years ago

Hi,

This is not a pure code issue, i wonder something conceptual.

You state in the channel that "Charles only sees exactly what you see" and that is only partially true. We humans can infer 3D information from 2D images by imagining it based on our experience, Charles can't do that and therefore it will only achieve limited ability while driving.

I think it would be more interesting to give it a set of sensors that a Tesla would have for example and then let it operate in that kind of input, instead of just the images alone.

p.s. Awesome project anyway!

codesidian commented 7 years ago

Charles does see what we see, but doesn't process it the same. It could in theory process it like us but then it'd probably take over the world. As of now, charles doesn't mess with the game. It just watches and then makes decisions of what it sees, like how we would drive in GTA. I think (i might be wrong) the plan with this project is to avoid delving into gta itself, as much as possible. That way Charles is free the play other games too with not much effort. Preferably a game with more water. I think a tesla sensor inspired setup could be implemented. Multiple camera angles. That way if charles could see what's the right, left, behind, and infront at the same time he should be able to make better decisions. However, that's also quadruple the amount of frames to be processed. I like your idea better.

Orinion commented 7 years ago

It's true that Teslas autopilot and Human drivers have multiple cameras and bc of that 3d vision. However when you play a videogame, you will also only have 2d vision and no problems with driving. Also it would be hard to generate multiple pictures from gta each frame, instead have a look at this

Sentdex commented 7 years ago

Yeah, I am undecided on this and haven't seen any research to suggest that we see the pseudo 3D in games any different than AI would.

In real life, to actually render distance, you do need 2 cameras, but actually a single camera via some algorithms can do a decent job of general distance detection. Further, some models have even been trained JUST for this purpose (detecting distance in images) and they perform quite well. I really would be surprised if the convnet in this case doesn't have some sort of crude depth perception modeled. It clearly doesn't swerve to miss things far away, it waits til they're closer. Maybe it only memorized specific sizes, no idea.

Anyway, I am going to close this, feel free to re-share updates you find on this topic though.