balena-io / open-balena-s3

Amazon S3-compatible storage backend for openBalena
balena.io/open
GNU Affero General Public License v3.0

Legacy FS mode not used by default, resulting in errors #184

Open dcaputo-harmoni opened 1 year ago

dcaputo-harmoni commented 1 year ago

In standing up a new instance of open-balena-s3, I noticed a number of errors when committing images to the registry. After a lot of troubleshooting, I found that the minio server had defaulted to the "xl-single" format, which was causing the errors. Reverting it to the legacy "fs" format makes the errors go away. I suggest adding the code below to the top of scripts/create-buckets.sh to address the issue:

# ensure that minio is running in legacy-fs mode
FORMAT_FILE="/export/.minio.sys/format.json"

if [[ "$(jq -r .format "$FORMAT_FILE")" != "fs" ]]; then
  systemctl stop open-balena-s3.service
  ID="$(jq -r .id "$FORMAT_FILE")"
  echo '{"version":"1","format":"fs","id":"'"$ID"'","fs":{"version":"2"}}' > "$FORMAT_FILE"
  systemctl start open-balena-s3.service
fi
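Before applying the patch, one can check whether an instance is actually in the new format. A minimal sketch (the helper name is mine; it parses the `format` field with `sed` so it also works where `jq` is unavailable):

```shell
# Print the minio backend format: "fs" is the legacy filesystem mode,
# "xl-single" is the new single-drive erasure-coded mode.
minio_backend_format() {
  sed -n 's/.*"format"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

e.g. `minio_backend_format /export/.minio.sys/format.json` prints `fs` on an unaffected instance and `xl-single` on one that needs the fix.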
shaunco commented 11 months ago

Any chance this is manifesting itself on a balena deploy with something like this in the CLI log?

[Debug]   Authorizing push...
[Info]    Pushing images to registry...
Retrying "registry.myopenbalena.com/v2/7b51621a557ebf65b36e5689f42b3974:latest" after 2.0s (1 of 2) due to: Error: unauthorized: authentication required
Retrying "registry.myopenbalena.com/v2/7b51621a557ebf65b36e5689f42b3974:latest" after 2.8s (2 of 2) due to: Error: unauthorized: authentication required
[Debug]   Saving image registry.myopenbalena.com/v2/7b51621a557ebf65b36e5689f42b3974
[Debug]   Untagging images...
[Info]    Saving release...
[Error]   Deploy failed
unauthorized: authentication required
dcaputo-harmoni commented 11 months ago

@shaunco No, never saw that message. It was causing all kinds of low-level file system issues in the container during docker pushes, along the lines of 'Resource requested is unreadable, please reduce your request rate'. It looks like you guys were tuned into the issue with erasure coding, because the minio version you are using is pinned at the last one that still supports the legacy fs mode. But even though it supports it, it only does so when the minio instance pre-exists with legacy fs mode specified; it creates new instances using the breaking erasure-coded format.
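Given that behavior, one hedged way to avoid the problem entirely on a brand-new deployment is to seed a legacy-fs format.json before minio's first start, so the erasure-coded backend is never initialized in the first place. A sketch (the helper name is mine; UUID generation assumes `uuidgen` or Linux's `/proc` fallback is available):

```shell
# Seed a legacy-fs format.json so a fresh minio start never
# initializes the erasure-coded backend. Never clobbers an
# existing format file.
seed_legacy_fs_format() {
  export_dir="$1"
  file="$export_dir/.minio.sys/format.json"
  if [ -f "$file" ]; then
    return 0  # format already decided; leave it alone
  fi
  mkdir -p "$export_dir/.minio.sys"
  id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
  printf '{"version":"1","format":"fs","id":"%s","fs":{"version":"2"}}' "$id" > "$file"
}
```

Run against the export directory (e.g. `seed_legacy_fs_format /export`) before minio is started for the first time.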

shaunco commented 11 months ago

@dcaputo-harmoni - I'm not part of balena ... just another open-balena user contributing where I can. Although, it seems like ~80% of the things I run into, you ran into a few weeks before me and posted issues/PRs for them 😅

Thanks for the extra info! I put your fix in my s3 service's statefulset yaml as:

          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    echo '#!/bin/bash
                    FORMAT_FILE="/export/.minio.sys/format.json"
                    LEGACY_JSON='\''{"version":"1","format":"fs","id":"%s","fs":{"version":"2"}}'\''

                    if [[ "$(cat $FORMAT_FILE | jq -r .format)" != "fs" ]]; then
                      systemctl stop open-balena-s3.service
                      ID="$(cat $FORMAT_FILE | jq -r .id)"
                      printf "$LEGACY_JSON" "$ID" > $FORMAT_FILE
                      systemctl start open-balena-s3.service
                    fi' > /sbin/fix-minio.sh
                    chmod +x /sbin/fix-minio.sh
                    /sbin/fix-minio.sh
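One caveat with this approach (an assumption about hook timing, not something reported in this thread): Kubernetes runs postStart concurrently with the container entrypoint, so on a first boot format.json may not exist yet when the hook fires. A sketch of a wait loop that could run before the patch step (the helper name is mine):

```shell
# Wait for minio to write its format file before trying to patch it;
# polls once per second, gives up after $2 attempts (default 30).
wait_for_format_file() {
  file="$1"
  max="${2:-30}"
  tries=0
  while [ ! -f "$file" ] && [ "$tries" -lt "$max" ]; do
    sleep 1
    tries=$((tries + 1))
  done
  [ -f "$file" ]  # exit status: 0 only if the file appeared in time
}
```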
shaunco commented 11 months ago

~I'm digging into this more as I'm fighting a battle with balena deploy getting back Error: blob upload unknown after it successfully pushes to registry/s3, and I don't see balena-registry or api directly accessing the s3 filesystem anywhere~ ... everything goes through the S3 API provided by Minio (or AWS if you're not using open-balena-s3). Based on the Minio SNSD docs, the zero-parity erasure-coded backend should make no difference for anyone using the S3 API:

This mode requires accessing stored objects through the S3 API, and does not support direct access to objects through the filesystem/POSIX interface.

Am I missing something here where there is direct filesystem/POSIX access by balena?

EDIT: My blob unknown issue was Cloudflare detecting a managed signature in part of the docker image being uploaded 🙃

taai commented 7 months ago

Could the problem be related to this? https://github.com/minio/minio/issues/16314#issuecomment-2037899059

There is a limitation in MinIO: while someone is downloading an object, the object is locked and cannot be overwritten or deleted. Is that what you are trying to do - overwrite an object and failing because of this limitation? I'm just curious; I found this conversation while searching for information about the issue with MinIO...

shaunco commented 7 months ago

@taai - no, this minio-based "s3" store is used as the backing store for Docker Distribution. Blobs are written once when a docker image is pushed and then read many times after the push succeeds ... no deletes.

@klutchell - any details on why this needs to use Minio's legacy FS mode? I just had a production deployment of this decide to corrupt the /export/.minio.sys data after 5 months of everything working fine (no reboots/service restarts/etc ... just seemingly out of the blue). Recreating /export/.minio.sys with the legacy FS format.json got it working again for about 30 minutes, and then Minio converted all the metadata to the new FS again, which made Minio think there were no blobs.

Really not sure what to do here when it keeps trying (and failing) to convert itself, with no detail in the logs of how or why.