Check if storage is working when returning the response to a faucet healthcheck request.

philippecamacho commented 2 years ago

Closes #1186.

I am not sure if the storage requests are neutral. Also I have an error when I run the tests locally. Something similar to the error described in #1179.

CLAassistant commented 2 years ago

All committers have signed the CLA.

sveitser commented 2 years ago

I think this should do the job.

If only the remove steps fails (unlikely) then subsequent inserts would not actually write to disk anymore because if the key is already in the index it doesn't insert it again until it has received the grants and is removed. This could be avoided by using a random key instead.

    /// Add an element to the persistent index.
    ///
    /// Returns `true` if the element was inserted or `false` if it was already in the index.
    fn insert(&mut self, key: UserPubKey) -> Result<bool, FaucetError> {
        if self.index.contains_key(&key) {
            // If the key is already in the index, we don't have to persist anything.
            return Ok(false);
        }
...

Another concern would be the faucet becoming unresponsive if the /healthcheck endpoint gets hit often because of the lock. This could be avoided by checking if we can write to disk in some other way (for example by creating a random file). The good thing about the current implementation is however that it exercises a lot more of the machinery we use in the faucet.

Overall I think it may reduce problems or alert us earlier so I'd be happy to try like this.

I'd be curious to get @jbearer's opinion though.

jbearer commented 2 years ago

I don't think lock contention is a big problem. The time this healthcheck endpoint spends with the lock is nothing compared to how long it takes to build a transaction. Plus the healthcheck is only called every few seconds, I believe.

If only the remove step fails, then we will actually end up sending a grant to the default address, which wastes a small amount of funds and time, but it's not too bad given that this should be extremely rare. And when we send the grant to the default address, we will remove it from the queue, so subsequent healthchecks will start doing the correct thing again.

Only suggestion is I think we should log something at ERROR level if either of these writes fails, so that if the healthcheck fails we will know why, and especially if only one of the writes fails, we will know it happened.

philippecamacho commented 2 years ago

Only suggestion is I think we should log something at ERROR level if either of these writes fails, so that if the healthcheck fails we will know why, and especially if only one of the writes fails, we will know it happened.

Right, done in 5b62743c00d27e0cf92e0bafafebb9ab58ab6a4d.

EspressoSystems / cape

Check if storage is working when returning the response to a faucet healthcheck request. #1189