fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License
536 stars 209 forks source link

feat: handle gateway having funds when leaving fed #4433

Open kernelkind opened 2 months ago

kernelkind commented 2 months ago

When a gateway sends the command to leave a federation without the forced flag, it will receive an error GatewayError::UnexpectedState. When a gateway sends the command to leave the federation with the forced flag, it will be able to leave and no error will be returned.

Closes: https://github.com/fedimint/fedimint/issues/3935

elsirion commented 2 months ago

We should be able to override this. We sometimes needed to leave+rejoin as a quick fix to some bug. Since we keep the client directory around there is little risk for loss of funds.

justinmoon commented 2 months ago

Before merging this we should make sure that gateway-cli leave-federation is updated to support this change. Might be good to have coverage for this command in devimint tests.

m1sterc001guy commented 2 months ago

Might be good to have coverage for this command in devimint tests.

It would be good to also add a rust test for this.

justinmoon commented 2 months ago

Another idea: what happens if gateway only has dust? I think we should still be able to leave, otherwise they'd need to deposit and withdraw to get back to zero balance?

kernelkind commented 2 months ago

I've got a bunch of changes to add for this MR including a devimint test and backend test, but I've got a question before I push any changes.

https://github.com/fedimint/fedimint/blob/c1cb0e6207779dd8441f4c3798caa6471e2b15ec/gateway/ln-gateway/src/rpc/rpc_server.rs#L182-L189

I see that if I were to throw an error in handle_leave_federation, the client has no way of receiving it, since the router panics on error. I need to somehow let the client know that the request to leave the federation was denied (because the client didn't use the --forced flag & has remaining funds).

There are a couple ways I'm thinking of making changes. Here's the first:

async fn leave_fed(
    Extension(mut gateway): Extension<Gateway>,
    Json(payload): Json<LeaveFedPayload>,
) -> Result<impl IntoResponse, GatewayError> {
    let fed = gateway.handle_leave_federation(payload).await;
    Ok(Json(json!(fed.ok())))
}

Return an Option<FederationInfo> instead of returning FederationInfo. This way, if the client sees that the result is None, it knows the federation didn't leave successfully. This isn't a great implementation in my opinion, the client doesn't see what error was returned.

Another, could be: In gateway/ln-gateway/src/rpc/rpc_server.rs:

async fn leave_fed(
    Extension(mut gateway): Extension<Gateway>,
    Json(payload): Json<LeaveFedPayload>,
) -> Result<impl IntoResponse, GatewayError> {
    let fed = gateway.handle_leave_federation(payload).await;
    let res = match fed {
        Ok(fed) => FederationLeaveResult { info: Some(fed), error: None },
        Err(e) => FederationLeaveResult { info: None, error: Some(e.to_string()) },
    };
    Ok(Json(json!(res)))
}

and in gateway/ln-gateway/src/rpc/mod.rs:

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct FederationLeaveResult {
    pub info: Option<crate::FederationInfo>,
    pub error: Option<String>,
}

Let me know your thoughts on how the server can inform the client that leaving was unsuccessful.

m1sterc001guy commented 2 months ago

I see that if I were to throw an error in handle_leave_federation, the client has no way of receiving it, since the router panics on error.

I'm confused, where are you seeing this? The gateway should not panic on any error, if it does, that's a bug.

GatewayError is mapped to a StatusCode in IntoResponse

impl IntoResponse for GatewayError {
    fn into_response(self) -> Response {
        // For privacy reasons, we do not return too many details about the failure of
        // the request back to the client to prevent malicious clients from
        // deducing state about the gateway/lightning node.
        let (error_message, status_code) = match self {
            GatewayError::OutgoingPaymentError(_) => (
                "Error while paying lightning invoice. Outgoing contract will be refunded."
                    .to_string(),
                StatusCode::BAD_REQUEST,
            ),
            GatewayError::Disconnected => (
                "The gateway is disconnected from the Lightning Node".to_string(),
                StatusCode::NOT_FOUND,
            ),
            _ => (
                "An internal gateway error occurred".to_string(),
                StatusCode::INTERNAL_SERVER_ERROR,
            ),
        };
        let mut err = Cow::<'static, str>::Owned(error_message).into_response();
        *err.status_mut() = status_code;
        err
    }
}

You should be able to just return a GatewayError and check for that on the client side.

kernelkind commented 2 months ago

I'm confused, where are you seeing this? The gateway should not panic on any error, if it does, that's a bug.

Ah I was mistaken. I saw this line: https://github.com/fedimint/fedimint/blob/348c4d3a7a56c2e89236383664d609e8f3ef3399/gateway/ln-gateway/src/rpc/rpc_server.rs#L70

And the implementation of the route method:

    pub fn route(mut self, path: &str, method_router: MethodRouter<S, B>) -> Self {
        panic_on_err!(self.path_router.route(path, method_router));
        self
    }

and assumed that was causing the panicking in my tests. I see now that's not the case

kernelkind commented 2 months ago

I can remove the scripts/tests/backend-leave-fed-test.sh and scripts/tests/gateway-leave-fed-test.sh, I just used them for easier testing

kernelkind commented 2 months ago

Another idea: what happens if gateway only has dust? I think we should still be able to leave, otherwise they'd need to deposit and withdraw to get back to zero balance?

In this situation, the gateway would run the command gateway-cli leave-fed --federation-id X --forced

m1sterc001guy commented 2 months ago

I think you need to make the forced flag optional in the CLI, otherwise the backwards compatibility tests will fail. Otherwise its looking good to me.

kernelkind commented 2 months ago

I made it optional, but the backwards compatibility test still fails. It's weird, why would it fail before supplying the --forced=true flag? I would understand if it failed because the test was supplying the forced flag to an older version of the gateway-cli that didn't support it

m1sterc001guy commented 2 months ago

Looks like the endpoint is returning null instead of FederationInfo like expected.

00:00:11 Error: error decoding response body: invalid type: null, expected struct FederationInfo at line 1 column 4
00:00:11 
00:00:11 Caused by:
00:00:11     invalid type: null, expected struct FederationInfo at line 1 column 4

The logs in CI only show the output of the devimint test, but not the logs of each daemon (like the gateway). I'd recommend running the test locally or trying to repro locally. Something is probably wrong in handle_leave_federation

kernelkind commented 2 months ago

A less elegant solution that I think wouldn't break the backwards compatibility tests is to revert the LeaveFed command and create a new one called LeaveFedForced. And LeaveFedPayload would be reverted and a new payload LeaveFedForcedPayload would also be created. Would this be acceptable?

m1sterc001guy commented 2 months ago

Can you try to make force_leave in LeaveFedPayload an Option<bool> instead of just bool? I believe Deserialize will default to None if no value is present in the serialized struct, which should fix the backwards compatibility issues. A value of None can just be interpreted as false.

kernelkind commented 2 months ago

This method in v0.2.1 returns Result<()> instead of Result<FederationInfo>: https://github.com/fedimint/fedimint/blob/a8422b84102ab5fc768307215d5b20d807143f27/gateway/ln-gateway/src/lib.rs#L874-L885

so in the test gateway_leave_fed_test FM: current CLI: current GW: v0.2.1, the v0.2.1 rpc_server for gateway sends Ok(()) https://github.com/fedimint/fedimint/blob/a8422b84102ab5fc768307215d5b20d807143f27/gateway/ln-gateway/src/rpc/rpc_server.rs#L143-L149

But current rpc_client leave_federation expects a GatewayRpcResult<FederationInfo>, not GatewayRpcResult<()>:

https://github.com/fedimint/fedimint/blob/580a95b1f24b248c3f75429c992416f082a79098/gateway/ln-gateway/src/rpc/rpc_client.rs#L83-L89

So now the error makes sense: 00:00:15 invalid type: null, expected struct FederationInfo at line 1 column 4

How should I proceed?

m1sterc001guy commented 2 months ago

@okjodom added versioning for the gateway API awhile back. @okjodom could you advise on how to modify leave_federation so it is backwards compatible?

okjodom commented 2 months ago

@okjodom added versioning for the gateway API awhile back. @okjodom could you advise on how to modify leave_federation so it is backwards compatible?

I suppose we can cherry pick https://github.com/fedimint/fedimint/pull/4275/commits/4aa64957d9faf0fe204f7d36cd3f44a0d4151284 to v0.2.1 so the whole test matrix is compatible. Otherwise we should only consider v0.2.2 and later, in the tests covering this api

[EDIT] adding more context, the commit above breaks leave_federation API in order to test it. The tests are independent of the broader changes in the PR this commit was part of. Ref: https://github.com/fedimint/fedimint/pull/4275

kernelkind commented 2 months ago

If we wanted to use v0.2.2 it looks like 4aa6495 would need to be chery-picked here as well since it doesn't have the Result<FederationInfo> return type for handle_leave_federation etc. It doesn't matter to me which we use, but it would be nice to get this PR across the finish line at some point

elsirion commented 1 month ago

If we wanted to use v0.2.2 it looks like 4aa6495 would need to be chery-picked here as well

Not worth it, let's only run this for current and future versions, it's a new feature.

elsirion commented 1 month ago

Turns out I should have read the backlog, but I think my fix should get this unblocked.

@kernelkind I don't think the test failure is due to your code, we just didn't test leaving the federation at all earlier and apparently broke compatibility in #4275: before the PR the API returned () while after the PR it returned FederationInfo. Luckily, just decoding it as Option<T> should solve the problem.