boustrophedon / pgtemp

Rust library and daemon for easily starting postgres databases per-test without Docker
MIT License

Shutdown (without persist) hangs in Github actions #9

Open nihohit opened 1 month ago

nihohit commented 1 month ago

I can't repro this locally, so I don't know what the source of the issue is. The temporary DB, created with PgTempDB::async_new().await (so persist isn't set), hangs for minutes after a call to shutdown. Maybe there are some existing connections to the DB, but according to the documentation, this shouldn't block shutdown when persist isn't set.

This happens on a runner running Ubuntu noble, with postgresql-16 installed.

boustrophedon commented 1 month ago

Could you post a code sample? I briefly looked at the Postgres docs, which say "SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well." I didn't know that. Maybe in my testing I only ever killed it before it did any work, so the subprocesses were never spawned.
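
For illustration, here is a minimal sketch of the alternative to an immediate SIGKILL: ask the process to exit with a catchable signal, wait for a grace period, and only then fall back to SIGKILL. This is not pgtemp's code. The function names, the use of `sh -c kill`, and the `sleep` stand-in process are all assumptions made for the example. With a real postgres server, the PostgreSQL docs describe SIGINT as requesting a fast shutdown.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Sends `signal` (e.g. "TERM" or "INT") to `pid` through the shell's `kill`.
/// Hypothetical helper; std has no portable API for signals other than SIGKILL.
fn send_signal(pid: u32, signal: &str) -> io::Result<()> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -{signal} {pid}"))
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "kill failed"))
    }
}

/// Requests a graceful exit with `signal`, polls until `grace` elapses,
/// then falls back to SIGKILL (which is what `Child::kill` sends on Unix).
fn stop_gracefully(child: &mut Child, signal: &str, grace: Duration) -> io::Result<ExitStatus> {
    send_signal(child.id(), signal)?;
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        // `try_wait` returns Ok(Some(status)) once the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        sleep(Duration::from_millis(20));
    }
    child.kill()?;
    child.wait()
}

/// Demo with `sleep 30` standing in for a server process.
/// Returns the number of the signal that terminated the child, if any.
fn demo_stop_sleep() -> io::Result<Option<i32>> {
    use std::os::unix::process::ExitStatusExt;
    let mut child = Command::new("sleep").arg("30").spawn()?;
    let status = stop_gracefully(&mut child, "TERM", Duration::from_secs(5))?;
    Ok(status.signal())
}
```

Calling `demo_stop_sleep()` on a Unix system reports which signal ended the stand-in process. Because the parent gets a chance to handle the first signal, it can shut down its own subprocesses, which is the step the quoted docs say SIGKILL skips.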

nihohit commented 1 month ago

This sample completes fine locally, but in my GitHub Actions run it hangs indefinitely after printing "closing temp DB".

#[tokio::test]
async fn pgtemp_test() {
    use pgtemp::PgTempDB;
    use sqlx::postgres::PgPoolOptions;

    let temp_db = PgTempDB::async_new().await;
    println!("got temp DB");
    let address = format!(
        "postgresql://{}:{}@localhost:{}",
        temp_db.db_user(),
        temp_db.db_pass(),
        temp_db.db_port()
    );

    let pool = PgPoolOptions::new()
        .connect(&address)
        .await
        .expect("Can't connect to postgres");

    let conn = pool.acquire().await.unwrap();
    drop(conn);

    println!("starting teardown");
    pool.close().await;
    println!("pool2: size: {} idle: {}", pool.size(), pool.num_idle());
    drop(pool);
    println!("closing temp DB");
    drop(temp_db);
    println!("teardown complete");
}

boustrophedon commented 1 month ago

Per #10, the issue is that I was dumb with how I designed the shutdown function. Probably what happened was that I wrote shutdown as fn shutdown(mut self), but then I wanted to use it inside drop, which is implemented with fn drop(&mut self), so I thought "oh I'll just change the signature of shutdown". However, it's obvious in hindsight that shutdown will then be called twice: once explicitly and once more from drop.

There are a lot of options to fix this, and pretty much all of them are breaking changes (which tbh isn't that big a deal since it's not like I have a ton of users). Probably the simplest solution (which also doesn't change the API) is the one you provided, but give me a bit to think about it.
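
To make the double call concrete, here is a minimal sketch of one common way to make cleanup idempotent when both an explicit shutdown method and `Drop` run it. It is not necessarily what #10 does, and `Handle`, `stop`, and the counter are hypothetical names that stand in for the real type. The idea is to keep the resource in an `Option` and `take()` it, so the second call finds `None` and does nothing.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical handle standing in for a temp database; not pgtemp's real type.
struct Handle {
    // Stand-in for the server process; `None` once it has been stopped.
    process: Option<String>,
    // Counts how many times the stop logic actually did work.
    stops: Rc<Cell<u32>>,
}

impl Handle {
    fn new(stops: Rc<Cell<u32>>) -> Self {
        Handle { process: Some("postgres".to_string()), stops }
    }

    /// Cleanup shared by `shutdown` and `Drop`.
    fn stop(&mut self) {
        // `take` leaves `None` behind, so a second call is a no-op.
        if let Some(_process) = self.process.take() {
            self.stops.set(self.stops.get() + 1);
        }
    }

    /// Consumes the handle. `Drop::drop` still runs when `self` goes out of
    /// scope at the end of this function, so `stop` is invoked a second time.
    fn shutdown(mut self) {
        self.stop();
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.stop();
    }
}

/// Number of times cleanup did real work after an explicit shutdown.
fn stops_after_shutdown() -> u32 {
    let stops = Rc::new(Cell::new(0));
    Handle::new(Rc::clone(&stops)).shutdown();
    stops.get()
}

/// Number of times cleanup did real work after a plain drop.
fn stops_after_drop() -> u32 {
    let stops = Rc::new(Cell::new(0));
    drop(Handle::new(Rc::clone(&stops)));
    stops.get()
}
```

In both paths the cleanup body runs once, even though `stop` is entered twice on the explicit `shutdown` path. This keeps the public signatures unchanged, at the cost of an `Option` field.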

nihohit commented 1 month ago

No hurry, but the PR addresses a separate issue. On GitHub I see that shutdown never completes even on the first call.

boustrophedon commented 1 month ago

So the PR doesn't solve the hanging issue?

nihohit commented 1 month ago

Nope, it solves a different issue.

boustrophedon commented 2 weeks ago

Taking another look at this, I wonder whether CI has an older version of pg than what we have locally. Maybe its behavior on SIGKILL is different, and a newer pg does better cleanup?

I can't get this to race locally with 1 or 100 connections.

nihohit commented 2 weeks ago

I don't think it's a version mismatch. I checked, and the CI and local versions are both 16.3. It could be an OS difference: CI is Ubuntu, local is macOS.

boustrophedon commented 2 weeks ago

I'm also on Linux with 16.3. If you run the test in release mode, does it still hang in CI?

nihohit commented 2 weeks ago

Yup.
