alttpo / sni

SNES Interface with gRPC API
MIT License
46 stars 11 forks source link

SNI failing to init luabridge driver results in RA driver never being set up #27

Closed super-continent closed 1 year ago

super-continent commented 1 year ago

This is a problem for people who don't care about using the luabridge and want to play through Retroarch, if possible it would be good to allow disabling luabridge altogether. For luabridge users it's probably best to also resolve #21

JamesDunne commented 1 year ago

Could you share a sni log that demonstrates the issue? The two drivers are independent of one another as far as I can tell so one should not affect the other.

super-continent commented 1 year ago

here is the log output:

2023/02/24 07:28:56.295647 Read env var SNI_FXPAKPRO_DISABLE: not found; defaulting to '0'
2023/02/24 07:28:56.295647 Read env var SNI_DEBUG: not found; defaulting to '0'
2023/02/24 07:28:56.296181 Read env var SNI_EMUNW_DISABLE: not found; defaulting to '0'
2023/02/24 07:28:56.296181 Read env var NWA_PORT_RANGE: not found; defaulting to '0xbeef'
2023/02/24 07:28:56.296693 Read env var NWA_DISABLE_OLD_RANGE: not found; defaulting to '0'
2023/02/24 07:28:56.296719 Read env var SNI_EMUNW_HOSTS: not found; defaulting to 'localhost:48879,localhost:48880,localhost:48881,localhost:48882,localhost:48883,localhost:48884,localhost:48885,localhost:48886,localhost:48887,localhost:48888,localhost:65400,localhost:65401,localhost:65402,localhost:65403,localhost:65404,localhost:65405,localhost:65406,localhost:65407,localhost:65408,localhost:65409'
2023/02/24 07:28:56.310224 Read env var SNI_EMUNW_DETECT_LOG: not found; defaulting to '0'
2023/02/24 07:28:56.310737 Read env var SNI_LUABRIDGE_LISTEN_HOST: not found; defaulting to '0.0.0.0'
2023/02/24 07:28:56.310764 Read env var SNI_LUABRIDGE_LISTEN_PORT: not found; defaulting to '65398'
2023/02/24 07:28:56.310764 Read env var SNI_RETROARCH_DISABLE: not found; defaulting to '0'
2023/02/24 07:28:56.310764 Read env var SNI_RETROARCH_HOSTS: not found; defaulting to 'localhost:55355'
2023/02/24 07:28:56.311306 Read env var SNI_RETROARCH_DETECT_LOG: not found; defaulting to '0'
2023/02/24 07:28:56.311306 Read env var SNI_MOCK_ENABLE: not found; defaulting to '0'
2023/02/24 07:28:56.311306 Read env var SNI_GRPC_LISTEN_HOST: not found; defaulting to '0.0.0.0'
2023/02/24 07:28:56.311843 Read env var SNI_USB2SNES_DISABLE: not found; defaulting to '0'
2023/02/24 07:28:56.311843 Read env var SNI_USB2SNES_LISTEN_ADDRS: not found; defaulting to '0.0.0.0:23074,0.0.0.0:8080'
2023/02/24 07:28:56.311843 Read env var SNI_GRPC_LISTEN_PORT: not found; defaulting to '8191'
2023/02/24 07:28:56.311843 Read env var SNI_GRPCWEB_LISTEN_PORT: not found; defaulting to '8190'
2023/02/24 07:28:56.312349 luabridge: could not start server on 0.0.0.0:65398; error: listen tcp 0.0.0.0:65398: bind: An attempt was made to access a socket in a way forbidden by its access permissions.
2023/02/24 07:28:56.313640 usb2snes: listening on 0.0.0.0:8080
2023/02/24 07:28:56.313640 usb2snes: listening on 0.0.0.0:23074
2023/02/24 07:28:56.313640 grpcweb: listening on 0.0.0.0:8190
2023/02/24 07:28:56.313640 grpc: listening on 0.0.0.0:8191

the reason this is a problem is because in the luabridge DriverInit function there is a blocking loop to initialize the lua server that will never end unless the server initializes successfully.

annotated here:

func DriverInit() {
    bindHost = env.GetOrDefault("SNI_LUABRIDGE_LISTEN_HOST", "0.0.0.0")
    bindPort = env.GetOrDefault("SNI_LUABRIDGE_LISTEN_PORT", "65398")
    bindHostPort = net.JoinHostPort(bindHost, bindPort)

    driver = &Driver{}
    driver.devicesMap = make(map[string]*Device)

    go func() {
        count := 0

        // attempt to start the luabridge server:
        for { // <- This loop never ends unless broken out of
            err := driver.StartServer()
            if err == nil { // <- only breaks if this condition is met, impossible to meet if server cannot listen on port
                break
            }

            if count == 0 {
                log.Printf("luabridge: could not start server on %s; error: %v\n", bindHostPort, err)
            }
            count++
            if count >= 30 { // <- this `count` variable never gets used, maybe a break after 30 tries that never got implemented?
                count = 0
            }

            time.Sleep(time.Second)
        }

        // finally register the driver:
        devices.Register(driverName, driver)
    }()
}

in the main function there is a series of calls to each DriverInit, this is where it gets stuck:

    // explicitly initialize all the drivers:
    fxpakpro.DriverInit()
    emunwa.DriverInit()
    luabridge.DriverInit() // <- infinite loop happens inside this one
    retroarch.DriverInit() // <- unreachable if luabridge cant initialize!
    mock.DriverInit()
super-continent commented 1 year ago

this issue is odd to me because it seems like this function is contained in a goroutine which (i believe, as a person who does not use Go) should mean it returns without blocking, but it doesnt do that, and i can resolve the problem by making sure luabridge initializes correctly by setting it to a free port on my system. which tells me its definitely still blocking somehow. maybe because it is locking the Driver object to register itself? i dont see any calls to a lock function or anything like that, but i dont know how go mutexes work, so maybe its implicit

super-continent commented 1 year ago

doing more reading on the code here and i can't figure out what was happening before, i'm trying to recreate the bug now and failing. my environment hasn't changed at all since i opened this issue but something managed to unbreak the RA connection, maybe this means theres some kind of bug in the RA driver that happens under a really specific condition? i'm really confused now because yesterday the bug was behaving exactly in the way i described and seemed to only go away once i had fixed the luabridge issue. i suppose i'll close it for now and add info if anything else pops up and i see some kind of lead to an issue

JamesDunne commented 1 year ago

Just looking at two lines of the log above rules out the luabridge DriverInit function blocking anything.

2023/02/24 07:28:56.312349 luabridge: could not start server on 0.0.0.0:65398; error: listen tcp 0.0.0.0:65398: bind: An attempt was made to access a socket in a way forbidden by its access permissions.
2023/02/24 07:28:56.313640 usb2snes: listening on 0.0.0.0:8080

These two lines are within 2 milliseconds of each other. The usb2snes web server is initialized after the device drivers are initialized (luabridge and retroarch). retroarch driver doesn't log anything in its DriverInit function during normal operation so the log is silent there.

    // explicitly initialize all the drivers:
    fxpakpro.DriverInit()
    emunwa.DriverInit()
    luabridge.DriverInit()
    retroarch.DriverInit()
    mock.DriverInit()

    // start the servers:
    grpcimpl.StartGrpcServer()
    usb2snes.StartHttpServer() // <- we wouldnt see usb2snes log messages if this didnt run

    // start up a systray:
    tray.CreateSystray()

Try to set environment variable SNI_RETROARCH_DETECT_LOG=1 which would enable more diagnostic logging for the retroarch driver. In addition you could try SNI_DEBUG=1 to get more detailed logging and identify any issues.

super-continent commented 1 year ago

will add those to my environment, if it happens again i'll have logs for it