dotmesh-io / dotmesh

dotmesh (dm) is like git for your data volumes (databases, files etc) in Docker and Kubernetes
https://dotmesh.com
Apache License 2.0

[Question] Setting up a working S3 policy #552

Open jonathanasquier opened 6 years ago

jonathanasquier commented 6 years ago

Hi, I'm getting an error on push:

➜ dm push s3 mydata
Pushing admin/mydata to s3:/mydata
Calculating...
error 0 B / 14.40 MB [--------------------------------]   0.00% ? MiB/s (0/2575)
error: couldnt-write-s3-metadata-push: open /var/lib/dotmesh/mnt/dmfs/272e1957-1c2c-4046-6568-bda6cbea86eb/dm.s3-versions/54e5e290-0cbf-42a2-7f8f-2138962740a4: no such file or directory
couldnt-write-s3-metadata-push: open /var/lib/dotmesh/mnt/dmfs/272e1957-1c2c-4046-6568-bda6cbea86eb/dm.s3-versions/54e5e290-0cbf-42a2-7f8f-2138962740a4: no such file or directory
couldnt-write-s3-metadata-push: open /var/lib/dotmesh/mnt/dmfs/272e1957-1c2c-4046-6568-bda6cbea86eb/dm.s3-versions/54e5e290-0cbf-42a2-7f8f-2138962740a4: no such file or directory

My s3 policy is:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:HeadBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:PutAnalyticsConfiguration",
                "s3:GetObjectVersionTagging",
                "s3:CreateBucket",
                "s3:ReplicateObject",
                "s3:GetObjectAcl",
                "s3:DeleteBucketWebsite",
                "s3:PutLifecycleConfiguration",
                "s3:GetObjectVersionAcl",
                "s3:PutObjectTagging",
                "s3:DeleteObject",
                "s3:GetIpConfiguration",
                "s3:DeleteObjectTagging",
                "s3:GetBucketWebsite",
                "s3:PutReplicationConfiguration",
                "s3:DeleteObjectVersionTagging",
                "s3:GetBucketNotification",
                "s3:PutBucketCORS",
                "s3:GetReplicationConfiguration",
                "s3:ListMultipartUploadParts",
                "s3:GetObject",
                "s3:PutBucketNotification",
                "s3:PutObject",
                "s3:PutBucketLogging",
                "s3:GetAnalyticsConfiguration",
                "s3:GetObjectVersionForReplication",
                "s3:GetLifecycleConfiguration",
                "s3:ListBucketByTags",
                "s3:GetBucketTagging",
                "s3:GetInventoryConfiguration",
                "s3:PutAccelerateConfiguration",
                "s3:DeleteObjectVersion",
                "s3:GetBucketLogging",
                "s3:ListBucketVersions",
                "s3:ReplicateTags",
                "s3:RestoreObject",
                "s3:GetAccelerateConfiguration",
                "s3:ListBucket",
                "s3:GetBucketPolicy",
                "s3:PutEncryptionConfiguration",
                "s3:GetEncryptionConfiguration",
                "s3:GetObjectVersionTorrent",
                "s3:AbortMultipartUpload",
                "s3:GetBucketRequestPayment",
                "s3:PutBucketTagging",
                "s3:GetObjectTagging",
                "s3:GetMetricsConfiguration",
                "s3:DeleteBucket",
                "s3:PutBucketVersioning",
                "s3:ListBucketMultipartUploads",
                "s3:PutMetricsConfiguration",
                "s3:PutObjectVersionTagging",
                "s3:GetBucketVersioning",
                "s3:GetBucketAcl",
                "s3:PutInventoryConfiguration",
                "s3:PutIpConfiguration",
                "s3:GetObjectTorrent",
                "s3:PutBucketRequestPayment",
                "s3:PutBucketWebsite",
                "s3:GetBucketCORS",
                "s3:GetBucketLocation",
                "s3:GetObjectVersion",
                "s3:ReplicateDelete"
            ],
            "Resource": [
                "arn:aws:s3:::mydata",
                "arn:aws:s3:::mydata/*"
            ]
        }
    ]
}

And enables

My s3 remote in dm seems to have been added correctly. The mydata dot is committed and has nothing dirty.

Related questions

Thanks!

Godley commented 6 years ago

Thanks for submitting this, I'll look into it when things calm down a bit. For reference, what's going on here isn't to do with your policy - it'll be something to do with file permissions on the dot, or I might not have created the folder properly. (Whenever there's a push, behind the scenes the server writes a file containing the version IDs it got from AWS after completing the request... so the push likely worked, at least 😄 )

Re your related questions:

  1. At the moment, on creation of an s3 remote the client checks that it has the ability to list buckets, as a cursory check that it has access. We could probably just cut this: when it does a pull or push it already does a head request against the individual bucket to check the same thing, so that should be enough (see the sketch after this list). We could also look into whether there's a less expansive permission we could use to get the same info, i.e. that the credential has access to the S3 API.
  2. No, the name of your datadot doesn't need to match the bucket name exactly - the link between them is stored in the config.
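
As a rough illustration of that per-bucket check, a minimal sketch with the AWS SDK for Go (this isn't dotmesh's actual code; the bucket name and region below are made up) might look like:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("eu-west-1"), // example region
	}))
	svc := s3.New(sess)

	// HeadBucket only succeeds if the credentials can reach this specific
	// bucket, so it needs no ListAllMyBuckets permission.
	_, err := svc.HeadBucket(&s3.HeadBucketInput{
		Bucket: aws.String("mydata"), // hypothetical bucket name
	})
	if err != nil {
		log.Fatalf("head request failed - do the credentials have access to this bucket? %v", err)
	}
	log.Println("bucket is reachable")
}
```
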
jonathanasquier commented 6 years ago

ok, in my case the dm.s3-versions folder does not seem to exist

Godley commented 6 years ago

Thanks for reporting this @jonathanasquier - I've written a few more tests and I think the bug is fixed now, but we need to fix our CI in order to release it (issue #564). I'll come back and comment when that's done :)

On the S3 policy, I'll see what we can do to remove the ListAllMyBuckets requirement.

Godley commented 6 years ago

Okkkk that should be released now - you can test it with

sudo curl -sSL -o /usr/local/bin/dm https://get.dotmesh.io/unstable/master/$(uname -s)/dm && dm cluster upgrade

I think this is the most limited S3 Policy you can use:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketVersions",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetBucketLocation",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::<bucketname>",
                "arn:aws:s3:::*/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:HeadBucket",
            "Resource": "*"
        }
    ]
}
jonathanasquier commented 6 years ago

Ok so I've committed my data: branch master, no dirty data.

➜ dm list
Current remote: local (use 'dm remote -v' to list and 'dm remote switch' to switch)

  DOT     BRANCH  SERVER            CONTAINERS  SIZE       COMMITS  DIRTY
* mydata  master  f7613db074846280              39.00 kiB  5

I've removed and re-added my s3 remote with the new policy

➜ dm remote
local
s3

Then I push

➜ dm push s3 mydata
Pushing admin/mydata to s3:/mydata
Calculating...
finished 0 B / ? [----------------------------------------------=] ? MiB/s (0/0)
Done!

mydata is both the name of the bucket and the name of the dot.

It seems a commit is added to keep track of the latest reference pushed, I believe:

commit 2fcafc22-66ef-47c6-49b4-c62cfc797f87
author: 
date: 1536226001272633856
type: dotmesh.metadata_only

    adding s3 metadata

Trying to push again throws an error

dm push s3 mydata
Pushing admin/mydata to s3:/mydata
Calculating...
error 0 B / ? [-------------------------------------------------=] ? MiB/s (0/0)
error: Found s3 metadata for latest snap - nothing to push!
Found s3 metadata for latest snap - nothing to push!
Found s3 metadata for latest snap - nothing to push!

And pushing to an unauthorized bucket fails too (as expected), yielding the error:

Head request failed - do the remote credentials have access to this bucket?

Committing more data and pushing again works as expected: "Done!" plus a new commit of type dotmesh.metadata_only.

But... my bucket is empty :(

Is there a way I could try to understand/debug my setup, or display the dotmesh.metadata_only commit content?

Also, regarding the bucket policy, shouldn't the resource section be

"Resource": [
     "arn:aws:s3:::<bucketname>",
     "arn:aws:s3:::<bucketname>/*"
]

?

Thanks :)
Godley commented 6 years ago

Try this: docker run -ti --volume-driver dm -v mydata.__root__:/data busybox cat /data/dm.s3-versions/<commit-hash-from-dm-log>

The commit hash should be the second-to-last commit in dm log (i.e. the one that isn't a "metadata_only" commit).

What you should see in that file is a list of key-value pairs, where the key is the object name as it should have gone into S3 and the value is the version ID which dotmesh got back from the AWS API.
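
As a rough illustration (assuming, though this thread doesn't spell it out, that the file is a JSON object mapping object keys to version IDs), reading one of those files amounts to something like the sketch below - the path is a placeholder, not a real one:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	// Hypothetical path - substitute the commit hash from `dm log`.
	raw, err := os.ReadFile("/data/dm.s3-versions/<commit-hash-from-dm-log>")
	if err != nil {
		panic(err)
	}

	// Assumed format: {"<object key>": "<S3 version ID>", ...}
	versions := map[string]string{}
	if err := json.Unmarshal(raw, &versions); err != nil {
		panic(err)
	}
	for key, versionID := range versions {
		fmt.Printf("%s -> %s\n", key, versionID)
	}
}
```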

Godley commented 6 years ago

I've tried to reproduce this locally (see #567) and couldn't hit the problem (i.e. the file was created in S3, and the metadata file contained what I expected)... the steps you followed were init, add data, commit and push, right?

Godley commented 6 years ago

And yeah, thanks, that resources section could be narrowed - I'll update our docs :)

jonathanasquier commented 6 years ago

Running ls in the container with the volume mapped gives me bin data dev etc home proc root sys tmp usr var, and data contains my data (I'm testing with dummy text files).

There is no /data/dm.s3-versions which is probably related to this.

I'm testing with my 20-day-old dot, which is probably broken; I'll start the process all over again in a few hours.

Godley commented 6 years ago

Hmmmm, that's odd. What I would have expected is for /data to contain __default__ - which is where your data should be - and dm.s3-versions. Have you used the name data before? I.e. if you change it to mydata.__root__:/foo, does that change anything?

jonathanasquier commented 6 years ago

docker run -ti --volume-driver dm -v mydata.__root__:/foo busybox
cd foo
ls # only my text files

jonathanasquier commented 6 years ago

I tried using docker run -ti --volume-driver dm -v myddata.__root__:/foo busybox directly, and also after doing dm init first, and I get the same behavior: I create data in /foo and commit on master, no dirty. I push to s3 => Done! => dm.s3-versions is created in /foo when I relaunch busybox with the dot attached. The dm.s3-versions folder contains one file whose name is the hash of my commit (not the dotmesh.metadata_only commit hash), and that file contains {}. The command I use to push is dm push s3 mydot --remote-name mybucket. The bucket is still empty.

Then I moved my data to a __default__ folder I created in the volume, and committed.

And... I segfaulted when I tried to push

2018/09/06 14:23:02 [updatePollResult] => /dotmesh.io/filesystems/transfers/2f18bab8-5709-4f4b-5467-5d4bb94ef96f, serialized: {"TransferRequestId":"2f18bab8-5709-4f4b-5467-5d4bb94ef96f","Peer":"","User":"","ApiKey":"","Direction":"push","LocalNamespace":"","LocalName":"","LocalBranchName":"","RemoteNamespace":"","RemoteName":"","RemoteBranchName":"","FilesystemId":"","InitiatorNodeId":"f7613db074846280","PeerNodeId":"","StartingCommit":"","TargetCommit":"","Index":0,"Total":1,"Status":"beginning upload","NanosecondsElapsed":0,"Size":6,"Sent":0,"Message":""}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xc089b0]
goroutine 461 [running]:
main.updateS3Files(0xc42191e570, 0xc42191e330, 0xc421ae6230, 0x6f, 0xc42028c3c6, 0x24, 0xc421b29b60, 0xd, 0x0, 0x0, ...)
    cmd/dotmesh-server/s3.go:314 +0x5e0
main.s3PushInitiatorState(0xc4217d7680, 0xed55a8)
    cmd/dotmesh-server/s3pushinitiatorstate.go:81 +0xd7c
main.(*fsMachine).run.func1(0xc4217d7680)
    cmd/dotmesh-server/statemachines.go:124 +0x3b
created by main.(*fsMachine).run
    cmd/dotmesh-server/statemachines.go:122 +0x2ac

So I did dm cluster reset, removed the volumes, ran dm cluster init, and restarted the process - segfault again. That's pretty much everything I've got for now.

jonathanasquier commented 6 years ago

Oh, and my latest test seems to have uploaded my text file to the bucket! (along with an empty foo file, not sure why) But it still managed to break the cluster.

Godley commented 6 years ago

Wellll, that's what I expected (the part about an empty object) - I don't know why... but it hasn't figured out that there's anything in your data directory. The segfault I don't really understand - line 314 is just err = file.Close(). @alaric-dotmesh @lukemarsden, ideas here? Maybe the path is no longer valid...

Could you run the create data -> commit -> push flow normally (i.e. without doing the mount __root__ -> mkdir __default__ thing), then run docker logs dotmesh-server-inner and paste the result?

Godley commented 6 years ago

Hah, ok, this is very confusing.

Godley commented 6 years ago

I think the commit I just did should handle the segfault, so if you do another upgrade like before, that might help :)

jonathanasquier commented 6 years ago

I used docker run -it --volume-driver dm -v mytestdot:/foo --rm --name testdotmesh testdotmesh

which writes a file into the volume. __default__ is created properly and there's no error on push, but no upload (the previous version uploaded but broke the cluster).

time="2018-09-07T10:47:50Z" level=info msg="[updateSnapshots] checking e809ca2c-2956-41ed-52f4-134db2f3c2d6 master:  == f7613db074846280?"
2018/09/07 10:48:14 [updateEtcdAboutSnapshots] successfully set new snaps for cc115cf7-a82f-41ea-7d00-566c35ef3a6b on f7613db074846280, will we hear an echo?
time="2018-09-07T10:48:14Z" level=info msg="[updateSnapshots] checking cc115cf7-a82f-41ea-7d00-566c35ef3a6b master: f7613db074846280 == f7613db074846280?"
time="2018-09-07T10:48:14Z" level=info msg="[updateSnapshots] publishing latest snapshot {e6c2b8e5-f55e-457d-6ee2-a27729895c4f 0xc42020c820 <nil>} on cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
time="2018-09-07T10:48:31Z" level=info msg="[S3Transfer] starting with  KeyID=AKIAJXBFZB2JOH5FRW6A, SecretKey=****, Prefixes=[], Endpoint=, Direction=push, LocalNamespace=admin, LocalName=mytestdot, LocalBranchName=, RemoteName=truskplopdata,"
time="2018-09-07T10:48:31Z" level=info msg="globalFsRequest cc115cf7-a82f-41ea-7d00-566c35ef3a6b <Event s3-transfer: Transfer: \" KeyID=AKIAJXBFZB2JOH5FRW6A, SecretKey=****, Prefixes=[], Endpoint=, Direction=push, LocalNamespace=admin, LocalName=mytestdot, LocalBranchName=, RemoteName=truskplopdata,\">"
time="2018-09-07T10:48:31Z" level=info msg="globalFsRequest: setting '/dotmesh.io/filesystems/requests/cc115cf7-a82f-41ea-7d00-566c35ef3a6b/892522b0-a201-4d78-7c91-b4f0e715809f' to '{\"Name\":\"s3-transfer\",\"Args\":{\"Transfer\":{\"KeyID\":\"AKIAJXBFZB2JOH5FRW6A\",\"SecretKey\":\"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\",\"Prefixes\":null,\"Endpoint\":\"\",\"Direction\":\"push\",\"LocalNamespace\":\"admin\",\"LocalName\":\"mytestdot\",\"LocalBranchName\":\"\",\"RemoteName\":\"truskplopdata\"}}}'"
time="2018-09-07T10:48:31Z" level=info msg="About to dispatch <Event s3-transfer: Transfer: map[\"Prefixes\":<nil> \"Direction\":\"push\" \"LocalName\":\"mytestdot\" \"LocalBranchName\":\"\" \"RemoteName\":\"truskplopdata\" \"KeyID\":\"AKIAJXBFZB2JOH5FRW6A\" \"SecretKey\":\"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\" \"Endpoint\":\"\" \"LocalNamespace\":\"admin\"]> to cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
time="2018-09-07T10:48:31Z" level=info msg="[initFilesystemMachine] starting: cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
time="2018-09-07T10:48:31Z" level=info msg="[initFilesystemMachine] acquired lock: cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
time="2018-09-07T10:48:31Z" level=info msg="[initFilesystemMachine] reusing fsMachine for cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
time="2018-09-07T10:48:31Z" level=info msg="Got response chan 0xc421a0cf00, %!s(<nil>) for cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
2018/09/07 10:48:31 [run:cc115cf7-a82f-41ea-7d00-566c35ef3a6b] got req: <Event s3-transfer: Transfer: map["LocalBranchName":"" "Prefixes":<nil> "Direction":"push" "LocalName":"mytestdot" "LocalNamespace":"admin" "RemoteName":"truskplopdata" "KeyID":"AKIAJXBFZB2JOH5FRW6A" "SecretKey":"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" "Endpoint":""], RequestId: "892522b0-a201-4d78-7c91-b4f0e715809f">
2018/09/07 10:48:31 [run:cc115cf7-a82f-41ea-7d00-566c35ef3a6b] writing to internal requests
2018/09/07 10:48:31 [run:cc115cf7-a82f-41ea-7d00-566c35ef3a6b] reading from internal responses
2018/09/07 10:48:31 GOT S3 TRANSFER REQUEST  KeyID=AKIAJXBFZB2JOH5FRW6A, SecretKey=****, Prefixes=[], Endpoint=, Direction=push, LocalNamespace=admin, LocalName=mytestdot, LocalBranchName=, RemoteName=truskplopdata,
2018/09/07 10:48:31 <transition> cc115cf7-a82f-41ea-7d00-566c35ef3a6b to s3PushInitiatorState requesting (from active waiting, 77.27s ago)
2018/09/07 10:48:31 [updatePollResult] attempting to update poll result for 892522b0-a201-4d78-7c91-b4f0e715809f: {TransferRequestId:892522b0-a201-4d78-7c91-b4f0e715809f Peer: User: ApiKey: Direction:push LocalNamespace: LocalName: LocalBranchName: RemoteNamespace: RemoteName: RemoteBranchName: FilesystemId: InitiatorNodeId:f7613db074846280 PeerNodeId: StartingCommit: TargetCommit: Index:0 Total:0 Status:starting NanosecondsElapsed:0 Size:0 Sent:0 Message:}
2018/09/07 10:48:31 [updatePollResult] => /dotmesh.io/filesystems/transfers/892522b0-a201-4d78-7c91-b4f0e715809f, serialized: {"TransferRequestId":"892522b0-a201-4d78-7c91-b4f0e715809f","Peer":"","User":"","ApiKey":"","Direction":"push","LocalNamespace":"","LocalName":"","LocalBranchName":"","RemoteNamespace":"","RemoteName":"","RemoteBranchName":"","FilesystemId":"","InitiatorNodeId":"f7613db074846280","PeerNodeId":"","StartingCommit":"","TargetCommit":"","Index":0,"Total":0,"Status":"starting","NanosecondsElapsed":0,"Size":0,"Sent":0,"Message":""}
2018/09/07 10:48:31 [s3PushInitiatorState] path to s3 metadata: /var/lib/dotmesh/mnt/dmfs/cc115cf7-a82f-41ea-7d00-566c35ef3a6b@e6c2b8e5-f55e-457d-6ee2-a27729895c4f/dm.s3-versions/e6c2b8e5-f55e-457d-6ee2-a27729895c4f
2018/09/07 10:48:31 [updatePollResult] attempting to update poll result for 892522b0-a201-4d78-7c91-b4f0e715809f: {TransferRequestId:892522b0-a201-4d78-7c91-b4f0e715809f Peer: User: ApiKey: Direction:push LocalNamespace: LocalName: LocalBranchName: RemoteNamespace: RemoteName: RemoteBranchName: FilesystemId: InitiatorNodeId:f7613db074846280 PeerNodeId: StartingCommit: TargetCommit: Index:0 Total:0 Status:beginning upload NanosecondsElapsed:0 Size:0 Sent:0 Message:}
2018/09/07 10:48:31 [updatePollResult] => /dotmesh.io/filesystems/transfers/892522b0-a201-4d78-7c91-b4f0e715809f, serialized: {"TransferRequestId":"892522b0-a201-4d78-7c91-b4f0e715809f","Peer":"","User":"","ApiKey":"","Direction":"push","LocalNamespace":"","LocalName":"","LocalBranchName":"","RemoteNamespace":"","RemoteName":"","RemoteBranchName":"","FilesystemId":"","InitiatorNodeId":"f7613db074846280","PeerNodeId":"","StartingCommit":"","TargetCommit":"","Index":0,"Total":0,"Status":"beginning upload","NanosecondsElapsed":0,"Size":0,"Sent":0,"Message":""}
2018/09/07 10:48:31 [updateS3Files] files: map[string]os.FileInfo(nil)
2018/09/07 10:48:32 [snapshot] Attempting: zfs [snapshot -o io.dotmesh:meta-message=YWRkaW5nIHMzIG1ldGFkYXRh -o io.dotmesh:meta-type=ZG90bWVzaC5tZXRhZGF0YV9vbmx5 -o io.dotmesh:meta-timestamp=MTUzNjMxNzMxMjAzMDAwNzU1NQ== pool/dmfs/cc115cf7-a82f-41ea-7d00-566c35ef3a6b@ad349dd3-08fc-45a0-6eee-66ea840c519d]
2018/09/07 10:48:32 [snapshot] listed snapshot: '"\"NAME                                                                                  USED  AVAIL  REFER  MOUNTPOINT\\npool/dmfs/cc115cf7-a82f-41ea-7d00-566c35ef3a6b@ad349dd3-08fc-45a0-6eee-66ea840c519d      0      -  19.5K  -\\n\""'
2018/09/07 10:48:32 [snapshot] Succeeded snapshotting (out: ''), saving: &{Id:ad349dd3-08fc-45a0-6eee-66ea840c519d Metadata:0xc420096560 filesystem:<nil>}
2018/09/07 10:48:32 [updatePollResult] attempting to update poll result for 892522b0-a201-4d78-7c91-b4f0e715809f: {TransferRequestId:892522b0-a201-4d78-7c91-b4f0e715809f Peer: User: ApiKey: Direction:push LocalNamespace: LocalName: LocalBranchName: RemoteNamespace: RemoteName: RemoteBranchName: FilesystemId: InitiatorNodeId:f7613db074846280 PeerNodeId: StartingCommit: TargetCommit: Index:0 Total:0 Status:finished NanosecondsElapsed:0 Size:0 Sent:0 Message:}
2018/09/07 10:48:32 [updatePollResult] => /dotmesh.io/filesystems/transfers/892522b0-a201-4d78-7c91-b4f0e715809f, serialized: {"TransferRequestId":"892522b0-a201-4d78-7c91-b4f0e715809f","Peer":"","User":"","ApiKey":"","Direction":"push","LocalNamespace":"","LocalName":"","LocalBranchName":"","RemoteNamespace":"","RemoteName":"","RemoteBranchName":"","FilesystemId":"","InitiatorNodeId":"f7613db074846280","PeerNodeId":"","StartingCommit":"","TargetCommit":"","Index":0,"Total":0,"Status":"finished","NanosecondsElapsed":0,"Size":0,"Sent":0,"Message":""}
2018/09/07 10:48:32 [updateEtcdAboutSnapshots] going 'round the loop
2018/09/07 10:48:32 <transition> cc115cf7-a82f-41ea-7d00-566c35ef3a6b to discovering loading (from s3PushInitiatorState requesting, 0.37s ago)
2018/09/07 10:48:32 [run:cc115cf7-a82f-41ea-7d00-566c35ef3a6b] got resp: <Event s3-pushed: <nil>>
2018/09/07 10:48:32 [run:cc115cf7-a82f-41ea-7d00-566c35ef3a6b] writing to external responses
2018/09/07 10:48:32 [run:cc115cf7-a82f-41ea-7d00-566c35ef3a6b] reading from external requests
time="2018-09-07T10:48:32Z" level=info msg="Done putting it into internalResponse (cc115cf7-a82f-41ea-7d00-566c35ef3a6b, 0xc421a0cf00)"
2018/09/07 10:48:32 [updateEtcdAboutSnapshots] successfully set new snaps for cc115cf7-a82f-41ea-7d00-566c35ef3a6b on f7613db074846280, will we hear an echo?
2018/09/07 10:48:32 entering discovering state for cc115cf7-a82f-41ea-7d00-566c35ef3a6b
time="2018-09-07T10:48:32Z" level=info msg="finished transfer of  KeyID=AKIAJXBFZB2JOH5FRW6A, SecretKey=****, Prefixes=[], Endpoint=, Direction=push, LocalNamespace=admin, LocalName=mytestdot, LocalBranchName=, RemoteName=truskplopdata,, <Event s3-pushed: <nil>>"
time="2018-09-07T10:48:32Z" level=info msg="[updateSnapshots] checking cc115cf7-a82f-41ea-7d00-566c35ef3a6b master: f7613db074846280 == f7613db074846280?"
time="2018-09-07T10:48:32Z" level=info msg="[updateSnapshots] publishing latest snapshot {ad349dd3-08fc-45a0-6eee-66ea840c519d 0xc420432078 <nil>} on cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
2018/09/07 10:48:32 [updateEtcdAboutSnapshots] going 'round the loop
2018/09/07 10:48:32 <transition> cc115cf7-a82f-41ea-7d00-566c35ef3a6b to active waiting (from discovering loading, 0.05s ago)
2018/09/07 10:48:32 [updateEtcdAboutSnapshots] successfully set new snaps for cc115cf7-a82f-41ea-7d00-566c35ef3a6b on f7613db074846280, will we hear an echo?
2018/09/07 10:48:32 entering active state for cc115cf7-a82f-41ea-7d00-566c35ef3a6b
time="2018-09-07T10:48:32Z" level=info msg="[updateSnapshots] checking cc115cf7-a82f-41ea-7d00-566c35ef3a6b master: f7613db074846280 == f7613db074846280?"
time="2018-09-07T10:48:32Z" level=info msg="[updateSnapshots] publishing latest snapshot {ad349dd3-08fc-45a0-6eee-66ea840c519d 0xc42000c048 <nil>} on cc115cf7-a82f-41ea-7d00-566c35ef3a6b"
time="2018-09-07T10:48:32Z" level=info msg="[alignMountStateWithMasters] called for cc115cf7-a82f-41ea-7d00-566c35ef3a6b; masterFor=f7613db074846280, myNodeId=f7613db074846280; mounted=true"
time="2018-09-07T10:48:32Z" level=info msg="[updateSnapshots] checking cc115cf7-a82f-41ea-7d00-566c35ef3a6b master: f7613db074846280 == f7613db074846280?"
time="2018-09-07T10:48:32Z" level=info msg="[updateSnapshots] publishing latest snapshot {ad349dd3-08fc-45a0-6eee-66ea840c519d 0xc42000c080 <nil>} on cc115cf7-a82f-41ea-7d00-566c35ef3a6b"

(plane taking off)

jonathanasquier commented 6 years ago

Aaaaand I just posted my secret key in the logs.

Godley commented 6 years ago

Hummmm, so potentially there's an error I missed here which might explain why the push is failing (it can't figure out what to upload to S3 if it fails to scan the directory in the first place). I've also stuck more logging in to see what it gives us after the fact (in case that isn't the problem).
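
To illustrate what I mean, here's a rough sketch of that scan step - this is an assumption about the shape of the logic, not the actual dotmesh code, and the mount path is hypothetical. If the directory read fails and the error is dropped, the map stays empty and there's simply nothing to upload, which would match the "[updateS3Files] files: map[string]os.FileInfo(nil)" log line earlier in this thread:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

// listFilesToPush sketches the directory scan described above (assumed,
// not dotmesh's actual code). Returning the ReadDir error is the point:
// if it were swallowed instead, the push would "succeed" with zero files.
func listFilesToPush(mountPath string) (map[string]os.FileInfo, error) {
	entries, err := ioutil.ReadDir(mountPath)
	if err != nil {
		return nil, fmt.Errorf("failed to scan %s: %v", mountPath, err)
	}
	files := map[string]os.FileInfo{}
	for _, fi := range entries {
		if !fi.IsDir() {
			files[fi.Name()] = fi
		}
	}
	return files, nil
}

func main() {
	// Hypothetical mount path, for illustration only.
	files, err := listFilesToPush("/var/lib/dotmesh/mnt/dmfs/<filesystem-id>/__default__")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Printf("%d file(s) to push\n", len(files))
}
```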

Could you please update your cluster again? :)

jonathanasquier commented 6 years ago

I did

# delete all dots
# pull unstable and reinit cluster
dm init mytestdot
dm commit -m "1"
docker run -it --volume-driver dm -v mytestdot:/foo --rm --name testdotmesh testdotmesh # creates a file in the volume
dm commit -m "2"
dm push s3 mytestdot --remote-name mybucket

Uploading works on s3, but dm push loops on

Got error, trying again: Unable to connect to any of the addresses attempted: [{Scheme:http Hostname:127.0.0.1 Port:32607} {Scheme:http Hostname:127.0.0.1 Port:6969}], errs: [Post http://127.0.0.1:32607/rpc: dial tcp 127.0.0.1:32607: connect: connection refused Post http://127.0.0.1:6969/rpc: dial tcp 127.0.0.1:6969: connect: connection refused]

and I get this in dotmesh-server-inner, which crashes it:

2018/09/14 09:32:58 [attemptReceive:0a2fb51f-3c84-47d0-7dff-fe91e293188d] Error No known filesystem with id 0a2fb51f-3c84-47d0-7dff-fe91e293188d, not attempting to receive
2018/09/14 09:32:58 <transition> 0a2fb51f-3c84-47d0-7dff-fe91e293188d to inactive waiting for requests or snapshots (from inactive waiting for requests, 0.01s ago)
2018/09/14 09:32:58 [s3PushInitiatorState] path to s3 metadata: /var/lib/dotmesh/mnt/dmfs/cc816e92-d8e9-4f52-6441-3be274fb46a3@1a93f059-45fa-48e8-59f6-2e640803cff0/dm.s3-versions/1a93f059-45fa-48e8-59f6-2e640803cff0
2018/09/14 09:32:59 [getKeysForDir] Files: []os.FileInfo{(*os.fileStat)(0xc4203ca270)}
2018/09/14 09:32:59 [updatePollResult] attempting to update poll result for f839def2-41e7-4975-6a38-c3f7de66b458: {TransferRequestId:f839def2-41e7-4975-6a38-c3f7de66b458 Peer: User: ApiKey: Direction:push LocalNamespace: LocalName: LocalBranchName: RemoteNamespace: RemoteName: RemoteBranchName: FilesystemId: InitiatorNodeId:f7613db074846280 PeerNodeId: StartingCommit: TargetCommit: Index:0 Total:1 Status:beginning upload NanosecondsElapsed:0 Size:12 Sent:0 Message:}
2018/09/14 09:32:59 [updatePollResult] => /dotmesh.io/filesystems/transfers/f839def2-41e7-4975-6a38-c3f7de66b458, serialized: {"TransferRequestId":"f839def2-41e7-4975-6a38-c3f7de66b458","Peer":"","User":"","ApiKey":"","Direction":"push","LocalNamespace":"","LocalName":"","LocalBranchName":"","RemoteNamespace":"","RemoteName":"","RemoteBranchName":"","FilesystemId":"","InitiatorNodeId":"f7613db074846280","PeerNodeId":"","StartingCommit":"","TargetCommit":"","Index":0,"Total":1,"Status":"beginning upload","NanosecondsElapsed":0,"Size":12,"Sent":0,"Message":""}
2018/09/14 09:32:59 [updateS3Files] files: map[string]os.FileInfo{"plop.txt":(*os.fileStat)(0xc4203ca270)}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xc0928a]

goroutine 588 [running]:
main.uploadFileToS3(0xc420088c80, 0x78, 0xc42024c270, 0x8, 0xc421aaa1d0, 0xd, 0xc4216230e0, 0x0, 0x0, 0x0, ...)
    cmd/dotmesh-server/s3.go:329 +0x22a
main.updateS3Files(0xc42052eea0, 0xc421648420, 0xc42025a070, 0x6f, 0xc420448196, 0x24, 0xc421aaa1d0, 0xd, 0x0, 0x0, ...)
    cmd/dotmesh-server/s3.go:302 +0x4c9
main.s3PushInitiatorState(0xc42001eb40, 0xed6720)
    cmd/dotmesh-server/s3pushinitiatorstate.go:85 +0xd8f
main.(*fsMachine).run.func1(0xc42001eb40)
    cmd/dotmesh-server/statemachines.go:124 +0x3b
created by main.(*fsMachine).run
    cmd/dotmesh-server/statemachines.go:122 +0x2ac
jonathanasquier commented 6 years ago

Hello! Do you have any update on this issue please :) ? I really (really really really) want to put dotmesh in my team's workflow! thanks!

Godley commented 6 years ago

Hey! My apologies, it's been a bit of a busy couple of weeks. I'll see if we can bump this up a sprint.

I'm a little confused about how that error is happening, to be honest, as the line is return *output.VersionID, nil - that's the returned info from AWS S3. It seems odd to me that the dereference would fail, because a nil pointer there would imply an error to me, and yet the error returned from AWS S3 is nil... We'll need to do some research into how/why that's happening.
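
For what it's worth, one way the AWS SDK for Go can hand back a nil pointer alongside a nil error is when the bucket doesn't have versioning enabled: S3 then omits the x-amz-version-id response header, so the VersionId field of PutObjectOutput stays nil even though the upload itself succeeded. That's only a guess at the cause here, not something confirmed in this thread. A minimal sketch of a defensive check (not the actual dotmesh code; the helper name is made up, and in the SDK the field is spelled VersionId) would look like:

```go
package s3sketch

import (
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// uploadAndGetVersion is a hypothetical helper, not dotmesh's uploadFileToS3.
func uploadAndGetVersion(svc *s3.S3, bucket, key string, body io.ReadSeeker) (string, error) {
	output, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   body,
	})
	if err != nil {
		return "", err
	}
	// PutObjectOutput.VersionId is a *string; it is nil when the bucket
	// has versioning disabled, so guard before dereferencing it.
	if output.VersionId == nil {
		return "", fmt.Errorf("no version ID returned for %s - is bucket versioning enabled?", key)
	}
	return *output.VersionId, nil
}
```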