gilbertchen / duplicacy

A new generation cloud backup tool
https://duplicacy.com

Fully document build environment used to produce releases. #206

Open macdanny opened 7 years ago

macdanny commented 7 years ago

There is a significant difference in "duplicacy copy" performance between the releases on GitHub and builds I produce myself.

Using the GitHub release, my "duplicacy copy" command averages around 7.25 MB/s.

If I produce a build with "go get -u github.com/gilbertchen/duplicacy/duplicacy" and use the resulting binary, my "duplicacy copy" command averages only around 4 MB/s.

This is an issue because "duplicacy copy" frequently crashes and I have to start over. Code was introduced after 2.0.9 that improves the performance when starting "duplicacy copy" over again. I have to choose either good download performance or a reasonable chance of completion. I would like to have both.

I don't believe this is an issue with the duplicacy code but rather with one of the libraries it uses. On my machine I checked out v2.0.9 and built it and the performance is still poor.

go version go1.8.3 linux/amd64

These are my revisions.

./aryann/difflib
commit e206f873d14a916d3d26c40ab667bca123f365a3
./aws/aws-sdk-go
commit ac95a675bef4affca88be4cd5c11855a59015fce
./Azure/go-autorest
commit f6be1abbb5abd0517522f850dd785990d373da7e
./bkaradzic/go-lz4
commit 7224d8d8f27ef618c0a95f1ae69dbb0488abc33a
./dgrijalva/jwt-go
commit a539ee1a749a2b895533f979515ac7e6e0f5b650
./gilbertchen/azure-sdk-for-go
commit 0b09de4174ca0cadcd3abb4aa31267c6f5ccebbb
./gilbertchen/cli
commit 1de0a1836ce9c3ae1bf737a0869c4f04f28a7f98
./gilbertchen/duplicacy
commit 978212fd756a8946e203b34168ec027793260ddb
./gilbertchen/goamz
commit eada9f4e8cc2a45db775dee08a2c37597ce4760a
./gilbertchen/go.dbus
commit 9e442e6378618c083fd3b85b703ffd202721fb17
./gilbertchen/go-dropbox
commit 90711b603312b1f973f3a5da3793ac4f1e5c2f2a
./gilbertchen/gopass
commit bf9dde6d0d2c004a008c27aaee91170c786f6db8
./gilbertchen/keyring
commit 1260032d8c5455033c8aa592564d8ea0a6874935
./gilbertchen/xattr
commit 68e7a6806b0137a396d7d05601d7403ae1abac58
./golang/protobuf
commit 130e6b02ab059e7b717a096f397c5b60111cae74
./googleapis/gax-go
commit 317e0006254c44a0ac427cc52a0e083ff0b9622f
./kr/fs
commit 2788f0dbd16903de03cb8186e5c7d97b69ad387b
./minio/blake2b-simd
commit 3f5f724cb5b182a5c278d6d3d55b40e7f8c2efb4
./pkg/errors
commit 2b3a18b5f0fb6b4f9190549597d3f962c02bc5eb
./pkg/sftp
commit 98203f5a8333288eb3163b7c667d4260fe1333e9
./satori/uuid
commit 5bf94b69c6b68ee1b541973bb8e1144db23a194b
./vaughan0/go-ini
commit a98ad7ee00ec53921f08832bc06ecf7fd600e6a1
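A revision list like the one above can be regenerated on any machine, which makes it easier to compare two build environments. The following is only a sketch, assuming git is on PATH and that every dependency is a git checkout somewhere under the directory given as the first argument (for example $GOPATH/src/github.com). The helper name listRevisions is hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"os/exec"
	"path/filepath"
	"sort"
	"strings"
)

// listRevisions walks root and returns "repository path -> HEAD commit" for
// every directory that contains a .git entry. It shells out to git, so git
// must be on PATH whenever a repository is found.
func listRevisions(root string) (map[string]string, error) {
	revs := make(map[string]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			return nil
		}
		if _, statErr := os.Stat(filepath.Join(path, ".git")); statErr != nil {
			return nil // not a repository root, keep descending
		}
		out, gitErr := exec.Command("git", "-C", path, "rev-parse", "HEAD").Output()
		if gitErr != nil {
			return gitErr
		}
		rel, relErr := filepath.Rel(root, path)
		if relErr != nil {
			return relErr
		}
		revs[rel] = strings.TrimSpace(string(out))
		return filepath.SkipDir // do not descend into the repository itself
	})
	return revs, err
}

func main() {
	// The directory to scan; defaults to the current directory.
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	revs, err := listRevisions(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// Sort so that two runs over the same tree print identical output.
	repos := make([]string, 0, len(revs))
	for repo := range revs {
		repos = append(repos, repo)
	}
	sort.Strings(repos)
	for _, repo := range repos {
		fmt.Printf("./%s\ncommit %s\n", repo, revs[repo])
	}
}
```

Saving the output next to a binary records which library revisions went into that build.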

leftytennis commented 6 years ago

To build from github, you should use "go get -u github.com/gilbertchen/duplicacy/...". I read that somewhere in either the doc or an issue, can't remember which... :)

If you clone the repository locally and want to build from your local cloned repository, the way I've accomplished that is to update duplicacy_main.go and replace the import statement that reads:

"github.com/gilbertchen/duplicacy/src"

with

"../src"

Perhaps there's a better way to accomplish this, I'm new to go, but I found that if I made changes to any of the src files under src/*, then those changes were not picked up unless I changed the import statement as shown above.

macdanny commented 6 years ago

To build from github, you should use "go get -u github.com/gilbertchen/duplicacy/..."

It's what I did. I ran:

rm -rf $GOPATH/src/github.com/*
go get -u github.com/gilbertchen/duplicacy/duplicacy

You don't need to change the import. "go get" is a one-stop shop that does everything but it's not the only way you have to build the code. You can do this instead if you want to build from local sources.

cd $GOPATH/src/github.com/gilbertchen/duplicacy/duplicacy
go install

I believe you'll get the same results if you drop the -u from "go get" but I am not sure (I haven't used go much either).

The difficulty with doing that is that go doesn't seem to have any reasonable release engineering mechanisms. "go get" always gets the latest revision. So builds are time dependent. They aren't repeatable. Different developers can check out the same duplicacy code and run the same command to build it but they can have different results because the libraries used by duplicacy might have changed on GitHub between when Alice built it and when Bob did.

go introduced vendoring to avoid this. Maybe duplicacy should do that. It's hard to collaborate properly when you don't have repeatable builds.

leftytennis commented 6 years ago

Next time I have local changes to try, I'll try the steps you described above, as opposed to changing the import as I have been doing, thanks for that.

leftytennis commented 6 years ago

Also, I'm not sure I understand your comment regarding repeatable builds. If you check out or clone the git repository, then building from that repository should indeed be repeatable, correct?

The "go get" updates, if using the local duplicacy/duplicacy_main.go and src/* files should indeed be repeatable.

macdanny commented 6 years ago

If Alice runs a build, then runs the same commands later, she is repeating the same build. If Bob later repeats the commands Alice used on his own computer, he probably is not repeating the same build, because he probably has different library versions than Alice used. Neither Alice, nor Bob, nor anything in the project, explicitly identifies specific versions of library code used by the build. It just gets the latest.
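One way to make "same build or not" checkable is to reduce the set of library revisions to a single fingerprint. This is a rough sketch only: the fingerprint helper is hypothetical, Alice's revisions are copied from the list at the top of this thread, and Bob's aws-sdk-go revision is a made-up placeholder for whatever master pointed to when he ran go get.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// fingerprint hashes a dependency set (import path -> commit) in a stable
// order, so two builds with identical inputs produce identical output.
func fingerprint(deps map[string]string) string {
	lines := make([]string, 0, len(deps))
	for path, rev := range deps {
		lines = append(lines, path+"@"+rev)
	}
	sort.Strings(lines) // map iteration order is random, so sort first
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Revisions taken from the list earlier in this thread.
	alice := map[string]string{
		"github.com/gilbertchen/cli": "1de0a1836ce9c3ae1bf737a0869c4f04f28a7f98",
		"github.com/aws/aws-sdk-go":  "ac95a675bef4affca88be4cd5c11855a59015fce",
	}
	// Hypothetical: Bob ran the same command later and got a newer aws-sdk-go.
	bob := map[string]string{
		"github.com/gilbertchen/cli": "1de0a1836ce9c3ae1bf737a0869c4f04f28a7f98",
		"github.com/aws/aws-sdk-go":  "0000000000000000000000000000000000000000",
	}
	fmt.Println("alice:", fingerprint(alice))
	fmt.Println("bob:  ", fingerprint(bob))
	fmt.Println("same build inputs:", fingerprint(alice) == fingerprint(bob))
}
```

The duplicacy commit is identical for both, yet the fingerprints differ because one library moved. That is the sense in which the build is not repeatable.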

leftytennis commented 6 years ago

So how do I build from the source files that I have in my local git repository? My hack was to change the duplicacy import from github.com/gilbertchen/duplicacy to ../src. Is there a flavor of "go get" I can use that would use the git src/* files to populate $GOPATH/go/pkgs?

gilbertchen commented 6 years ago

I run cd ~/go/src/github.com/gilbertchen/duplicacy && go build duplicacy/duplicacy_main.go to build the binary. I forked most dependent libraries under github.com/gilbertchen. The only significant exception is aws/aws-sdk-go, which may be the source of discrepancies that you observed.

Go vendoring is the right way and we should do it.

leftytennis commented 6 years ago

Thanks for that. Not sure what vendoring is about but I'll read up on it.

tbain98 commented 6 years ago

I Googled vendoring to see what it is, and someone mentioned Glide as a good way to manage dependencies. A quick look through the project page makes it sound like it's a tool to simplify the management of the vendor folder that allows you to lock versions, which sounds like it would be a very good thing. So we should evaluate it as we decide what to do about moving to vendoring.

jbrodriguez commented 6 years ago

There are many vendoring solutions for go, but dep should become the official one in the not so distant future.

There are many other things going on behind dep, such as 'dropping' GOPATH in favor of a folder project approach, but that is even further ahead.

leftytennis commented 6 years ago

I had started playing with govendor, which seemed pretty popular too vs godep. Having never used go before, the programming language itself is pretty cool and easy to pick up, but there are other aspects of go that have me scratching my head... :)

leftytennis commented 6 years ago

OK, I installed godep and played around with it. In general, I think it does what we want/need, with one possible exception. After running "godep save", I see that it vendors all of the packages, including github.com/gilbertchen/duplicacy/src.

I don't see a method of excluding a package from godep vendoring, but govendor does provide this functionality.

Unless I'm completely missing the point, I want to be able to fork the git repository, make local changes, produce a build using my local changes, and eventually submit a pull request based on the changes I have committed to my local fork.

Since duplicacy_main.go has the following import statement:

github.com/gilbertchen/duplicacy/src

godep will vendor this package and it's used for builds, which does not pick up local changes I've made in my forked repository. Seems to me this package needs to be excluded from vendoring.

While vendoring may indeed be the answer to how to produce repeatable builds, I'm not understanding how to properly use this in the context of a forked repository.

Anyone care to enlighten me? Thanks.

lowne commented 6 years ago

@jt70471 Apparently the sanctioned way (according to several posts around the internet) is to clone your forked repo into the original repo's namespace, i.e. ~/go/src/github.com/gilbertchen/duplicacy - which is what I did (just renamed my original ~/go/src/github.com/lowne/duplicacy to gilbertchen/duplicacy) and can confirm works.

As for vendoring, I used the official dep, but had to manually edit the manifest to always point to master as it defaults to the latest tag (and then enforces semver) which in some cases is too old and causes breakage. In short, no point in bothering with it (vs go get) until @gilbertchen supplies (and commits) a proper manifest (and ideally tags his forks appropriately).

leftytennis commented 6 years ago

@lowne, thanks for the reply. Unless I’m mistaken, that procedure doesn’t address the issue where I now have a cloned local repository and if we use a tool that does the vendoring, the github.com/gilbertchen/duplicacy/src package will still be vendor’d by godep.

So if I update code in $HOME/go/src/github.com/jt70471/duplicacy/src, it won’t be used for the build, since go will first look for the package in the vendor subdirectory. If I change src in the vendor subdirectory, then the issue is that it’s not part of the git repository. Unless, of course, we actually made the vendor subdirectory part of the git repository, which I don’t think makes sense.

How are you getting around this? Thanks.

lowne commented 6 years ago

So if I updated code in $HOME/go/src/github.com/jt70471/duplicacy/src

No, the local clone of your forked repo (i.e. your modified code on your machine) must be in the original repo's namespace in $GOPATH. To be more clear: cd ~/go/src/github.com ; mv jt70471 gilbertchen - your updated code will then be in $HOME/go/src/github.com/gilbertchen/duplicacy/src.

As for the upstream code that ends up in .../vendor/.../gilbertchen/duplicacy because of godep, the solutions are:

  1. don't bother with vendoring for now (as I explained above) OR
  2. tell godep to ignore gilbertchen/duplicacy (idk the details about how) OR
  3. just rm -rf vendor/github.com/gilbertchen/duplicacy before go build; it'll then search $GOPATH, which has your forked code (which, again, must be in $HOME/go/src/github.com/gilbertchen/duplicacy, and NOT .../jt70471/duplicacy)
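Why option 3 works can be sketched with a small model of the lookup order. This is a simplification, assuming GOPATH mode with a single GOPATH entry and ignoring GOROOT. The paths and the candidates and resolve helpers are hypothetical and only imitate the behaviour, they are not part of the go tool.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// candidates lists, in priority order, the directories consulted for
// importPath when compiling code that lives in importerDir: the nearest
// vendor directory first, then each parent's vendor directory, and only
// then the plain $GOPATH/src location.
func candidates(gopath, importerDir, importPath string) []string {
	src := path.Join(gopath, "src")
	var out []string
	for dir := path.Clean(importerDir); strings.HasPrefix(dir, src+"/"); dir = path.Dir(dir) {
		out = append(out, path.Join(dir, "vendor", importPath))
	}
	return append(out, path.Join(src, importPath))
}

// resolve returns the first candidate that exists, or "" if none does.
func resolve(existing map[string]bool, cands []string) string {
	for _, c := range cands {
		if existing[c] {
			return c
		}
	}
	return ""
}

func main() {
	const gopath = "/home/user/go" // hypothetical GOPATH
	const pkg = "github.com/gilbertchen/duplicacy/src"
	repo := gopath + "/src/github.com/gilbertchen/duplicacy"

	vendored := repo + "/vendor/" + pkg // upstream copy saved by the vendoring tool
	local := gopath + "/src/" + pkg     // your fork, cloned into upstream's namespace
	existing := map[string]bool{vendored: true, local: true}

	cands := candidates(gopath, repo+"/duplicacy", pkg)
	fmt.Println("with vendored copy:", resolve(existing, cands))

	delete(existing, vendored) // models: rm -rf vendor/github.com/gilbertchen/duplicacy
	fmt.Println("after removing it: ", resolve(existing, cands))
}
```

With the vendored copy present the build reads the upstream snapshot. Once it is removed, the same import falls through to the files you are actually editing.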

EDIT: I find this oddity of forks under the upstream repo's namespace ... not optimal (to be polite), but apparently that's how it's done - see here, here or here.

leftytennis commented 6 years ago

Thanks for the reply... I did not see a way to tell godep to exclude a package, but I do see that govendor provides this capability... I don't know what's "normal" for packaging code using go, but if duplicacy_main.go could include all of the ../src/* files w/o having github.com/gilbertchen/duplicacy/src be a separate package named duplicacy, it would prevent that package from being vendor'd.

rawtaz commented 6 years ago

So much trouble to just build a local copy of the code. Personally I would like to know how to build this (including deps, of course) without using Go's get stuff - just cloning the repo and then compiling.

Restic has its own build.go which you go run after checking out whichever commit you want. It's really nice and useful, anyone can build any version easily. Looks like it handles dependencies differently though, I guess that's what's causing the problem here.

If anyone has instructions on how to build Duplicacy manually, without go get and without installing (i.e. just producing a binary that you can then execute), I'd be happy :)

leftytennis commented 6 years ago

You can't build it as-is from a local cloned repository without changing the import in duplicacy_main.go from "github.com/gilbertchen/duplicacy/src" to "../src".

At least that's my assessment and how I've made/tested local changes... You just have to remember to undo that edit before you check in your changes if you're planning on submitting a pull request.

The vendor stuff being discussed in this thread solves one problem, which is repeatable builds, but from what I can tell, it doesn't change the fact that duplicacy_main.go is dependent upon a package called duplicacy. When using go get, it will apparently always update with the latest github.com/gilbertchen/duplicacy/src files. If you use vendor support in go, the duplicacy package will get vendor'd, which won't use the local git cloned ../src/* stuff either. :)

gilbertchen commented 6 years ago

If you check out all dependent libraries like those under 'github.com/gilbertchen' it should work, right?

Of course this is still bad. I'll do something about it next week.

gilbertchen commented 6 years ago

Ok, I see. If you fork 'github.com/gilbertchen/duplicacy' and then clone your own fork locally you won't be able to build unless you change that import to '../src'.

Maybe that import should be '../src'.

leftytennis commented 6 years ago

I knew nothing about go when I found duplicacy, but when I tried the normal 1) clone git repository 2) make changes to the files in the local repository and 3) build, I found that my local changes were not being picked up because duplicacy_main.go imports another package named duplicacy, which lives at github.com/gilbertchen/duplicacy/src. From what I can tell, that is not ever resolved by the ../src/* files that I checked out and modified.
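The behaviour described here follows from how an import path maps to a directory in GOPATH mode. A minimal sketch, ignoring vendor directories and assuming a single GOPATH entry; the GOPATH value and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// gopathDir returns the directory a GOPATH-mode build reads for importPath.
func gopathDir(gopath, importPath string) string {
	return path.Join(gopath, "src", importPath)
}

// usedByBuild reports whether files edited in editedDir are the ones the
// compiler will read for importPath (vendor directories are ignored here).
func usedByBuild(gopath, importPath, editedDir string) bool {
	return path.Clean(editedDir) == gopathDir(gopath, importPath)
}

func main() {
	const gopath = "/home/user/go" // hypothetical GOPATH
	const pkg = "github.com/gilbertchen/duplicacy/src"

	// A fork cloned under its own name: the import path never points here.
	fmt.Println(usedByBuild(gopath, pkg, "/home/user/go/src/github.com/jt70471/duplicacy/src"))

	// The same fork cloned into the upstream namespace: this is what gets built.
	fmt.Println(usedByBuild(gopath, pkg, "/home/user/go/src/github.com/gilbertchen/duplicacy/src"))
}
```

The location of duplicacy_main.go plays no part in the lookup; only the import string does.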

It's not a huge deal, obviously I've figured out how to at least build/test using my local modifications, but it's certainly odd to me how go get works and the concept of an external pkg for files that are essentially in the same git repository... :)

leftytennis commented 6 years ago

And your last post is correct for my use case, but I suspect that may cause "go get -u github.com/gilbertchen/duplicacy/..." to break?

lowne commented 6 years ago

IIRC I read somewhere that using relative import paths is indeed problematic in Go and should be avoided. Please read the references I provided above; once again:

  1. the code should be under $GOPATH/src/github.com/gilbertchen/duplicacy
    • if you used go get, add the git remote to your fork
    • if you cloned your fork to $GOPATH/src/github.com/yourname/duplicacy, rename the parent folder/move accordingly
  2. use care with, or avoid using, go get
  3. after updating with a vendoring tool, rm -rf vendor/github.com/gilbertchen/duplicacy
  4. go build duplicacy/duplicacy_main.go works

rawtaz commented 6 years ago

I think restic's build.go is really nice. I don't remember the specifics, but the author of restic explained a number of advantages with it as well (compared to using go get). For the users, this approach is definitely the easiest (in cases where one doesn't want to use go get and doesn't even have a $GOPATH).

gilbertchen commented 6 years ago

What about moving src to under vendor and also putting all dependencies there?

├── duplicacy
│   └── duplicacy_main.go
└── vendor
    └── github.com
        ├── gilbertchen
        │   └── duplicacy
        │       └── src
        └── other libraries...

Would this work?

Of course now the github.com/gilbertchen/duplicacy directory becomes different from what is hosted by github.com, but would this ever be a problem?

leftytennis commented 6 years ago

Does it need to be a separate package? Is there no way to build duplicacy w/o packaging all of the ../src/* files in a pkg named duplicacy?

Unfortunately, I don't know enough about go to say whether your suggestion would work. If I update the local vendor/github.com/gilbertchen/duplicacy/src/* files, will go get overwrite local changes with what's on github?

gilbertchen commented 6 years ago

I think this is better:

├── duplicacy
│   └── duplicacy_main.go
└── vendor
    ├── duplicacy
    │   └── source files...
    └── github.com
        └── other libraries...

At least this should not confuse any go dep tools about what github.com/gilbertchen/duplicacy actually contains...

leftytennis commented 6 years ago

What would the import change to in duplicacy_main.go? Would it then be:

"../vendor/duplicacy/*"

gilbertchen commented 6 years ago

It will be just:

import "duplicacy"

gilbertchen commented 6 years ago

I tried using go dep on a test git repository. There are at least 2 issues:

Both issues can be solved by moving vendor/duplicacy to outside of the vendor directory before running the command and moving it back after. And this is only required if you want a reproducible build, so I think this layout is viable.

leftytennis commented 6 years ago

According to this, I don't see why we shouldn't just move all of the ../src/*.go files to the same directory as duplicacy_main.go... seems the build or run commands would then be:

go build *.go
// or
go run *.go

We keep trying to figure out ways to make the current duplicacy pkg fit in, but is the pkg even necessary?

macdanny commented 6 years ago

@gilbertchen Requiring moving directories around doesn't seem like a good idea. Contributors should be able to just check out the code and go. In my world it's critical to have a repeatable build that can be invoked with a single command. It's trivial to do this in the Java space, it's very strange to see a new language like go that is so regressive in this regard.

@jt70471 The (only?) reason you would need it separated is if your repository contained the source to build multiple binaries rather than just one. Duplicacy has just one. It seems your idea has merit -- Peter Bourgon (developer at SoundCloud) advocates for it: http://peter.bourgon.org/go-in-production/

I think it's the most pragmatic solution. It "should" work better than this but since it can very easily be made to work the other way by moving a single file it's hard to see a material downside. The great is the enemy of the good.

In terms of working in a fork, I think @lowne has it exactly right. When I forked duplicacy I did this:

cd $GOPATH/src/github.com/gilbertchen/duplicacy
git remote add macdanny https://github.com/macdanny/duplicacy.git
git checkout macdanny/issue-207-retry-download

I don't see a material downside to working this way.

macdanny commented 6 years ago

go get has a -f argument that seems intended for this use case:

The -f flag, valid only when -u is set, forces get -u not to verify that each package has been checked out from the source control repository implied by its import path. This can be useful if the source is a local fork of the original.

gilbertchen commented 6 years ago

I intentionally separated duplicacy_main.go from src/*.go because the latter was supposed to be a library that can be used by other applications. That is why we need the duplicacy package.

I was almost about to move src to vendor/duplicacy but I now realize this is a bad idea. The only problem this would solve is to make it easier to work on a cloned fork, but if cloning the fork as gilbertchen/duplicacy is the sanctioned way as said by @lowne, then there is really no need to do this.

So I will leave the src directory as it is, and supply Gopkg.toml and Gopkg.lock for vendoring next time I make a new release.

leftytennis commented 6 years ago

Agree that the vendor stuff is good because of the repeatable build aspect it provides.

The src/*.go stuff being a library that could be used by other clients makes sense too, it's just that both client and library are included in a single git repository and due to go's apparent unique build and pkg updating "features" with go get and how packages are imported, it makes it awkward.

One last question/thought and I'll move on... :)

Would it make sense to have a duplicacy git repository that is the client and a separate libduplicacy git repository?

In that context, the import from the client would include a truly separate external pkg, just like every other external pkg used by either the client or library.

I guess it boils down to what you deem most important to the duplicacy community. I would guess that most users will simply download the binary packages you create for each release and there's a much smaller population of us that want to fork and create pull requests, etc.

Either way, thanks for all of your input and consideration from those of us that are willing to contribute to duplicacy's development.

robbat2 commented 6 years ago

You have Gopkg.{toml,lock} in place now, but something is still not 100%.

D=~/gocode/src/github.com/gilbertchen/duplicacy
git clone https://github.com/gilbertchen/duplicacy $D
cd $D
dep ensure
dep status
go build duplicacy/duplicacy_main.go

Fails:

$ go build  duplicacy/duplicacy_main.go 
# command-line-arguments
duplicacy/duplicacy_main.go:110: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:208: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:320: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:331: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:343: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:353: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:465: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:545: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:604: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:663: context.App.Writer undefined (type *cli.App has no field or method Writer)
duplicacy/duplicacy_main.go:663: too many errors

dep status output:

PROJECT                                  CONSTRAINT     VERSION        REVISION  LATEST   PKGS USED
cloud.google.com/go                      ^0.16.0        v0.16.0        2d3a665   2d3a665  6   
github.com/Azure/go-autorest             *              v9.4.1         c67b24a   c67b24a  4   
github.com/aryann/difflib                branch master  branch master  e206f87   e206f87  1   
github.com/aws/aws-sdk-go                ^1.12.31       v1.12.31       a32b1dc   cd721c9  25  
github.com/bkaradzic/go-lz4              ^1.0.0         v1.0.0         74ddf82   74ddf82  1   
github.com/dgrijalva/jwt-go              *              v3.1.0         dbeaa93   dbeaa93  1   
github.com/gilbertchen/azure-sdk-for-go  ^10.2.1-beta   v10.2.1-beta   2d49bb8   2d49bb8  1   
github.com/gilbertchen/cli               ^1.2.0         v1.2.0         565493f   565493f  1   
github.com/gilbertchen/go-dropbox        branch master  branch master  90711b6   90711b6  1   
github.com/gilbertchen/go-ole            ^1.2.0         v1.2.0         0e87ea7   0e87ea7  1   
github.com/gilbertchen/go.dbus           *              branch master  9e442e6   9e442e6  1   
github.com/gilbertchen/goamz             branch master  branch master  eada9f4   eada9f4  2   
github.com/gilbertchen/gopass            branch master  branch master  bf9dde6   bf9dde6  1   
github.com/gilbertchen/keyring           branch master  branch master  8855f56   8855f56  1   
github.com/gilbertchen/xattr             branch master  branch master  68e7a68   68e7a68  1   
github.com/go-ini/ini                    *              v1.32.0        32e4c1e   32e4c1e  1   
github.com/golang/protobuf               *              branch master  1e59b77   1e59b77  6   
github.com/googleapis/gax-go             *              v2.0.0         317e000   317e000  1   
github.com/jmespath/go-jmespath          *                             0b12d6b            1   
github.com/kr/fs                         *              branch master  2788f0d   2788f0d  1   
github.com/minio/blake2b-simd            branch master  branch master  3f5f724   3f5f724  1   
github.com/pkg/errors                    *              v0.8.0         645ef00   645ef00  1   
github.com/pkg/sftp                      ^1.0.0         1.0.0          98203f5   98203f5  1   
github.com/satori/uuid                   *              v1.1.0         879c588   879c588  1   
github.com/vaughan0/go-ini               *              branch master  a98ad7e   a98ad7e  1   
golang.org/x/crypto                      branch master  branch master  9f005a0   b080dc9  7   
golang.org/x/net                         branch master  branch master  9dfe398   c708664  8   
golang.org/x/oauth2                      branch master  branch master  f95fa95   f95fa95  5   
golang.org/x/sys                         *              branch master  82aafbf   4ff8c00  2   
golang.org/x/text                        *              branch master  88f656f   88f656f  14  
google.golang.org/api                    branch master  branch master  17b5f22   92db9b5  10  
google.golang.org/appengine              *              v1.0.0         150dc57   150dc57  10  
google.golang.org/genproto               *              branch master  891aceb   7f0da29  3   
google.golang.org/grpc                   *              v1.8.0         5a9f7b4   5a9f7b4  21  
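For anyone debugging this, it may help to compare the REVISION column above with the revisions from the go get build listed at the top of this thread. The sketch below does that for four of the projects, with values copied from the two listings. The drift helper is hypothetical, and the two snapshots were taken at different times, so some differences are expected. The thread does not establish whether any of them is related to the build error above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// drift returns the projects present in both snapshots whose revisions
// differ. Two revisions match when one is a prefix of the other, so full
// and abbreviated commit hashes can be compared directly.
func drift(a, b map[string]string) []string {
	var out []string
	for name, ra := range a {
		rb, ok := b[name]
		if !ok {
			continue // only compare projects that appear in both snapshots
		}
		if !strings.HasPrefix(ra, rb) && !strings.HasPrefix(rb, ra) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Revisions from the go get build listed at the top of this thread.
	goGet := map[string]string{
		"github.com/gilbertchen/cli":     "1de0a1836ce9c3ae1bf737a0869c4f04f28a7f98",
		"github.com/gilbertchen/keyring": "1260032d8c5455033c8aa592564d8ea0a6874935",
		"github.com/gilbertchen/goamz":   "eada9f4e8cc2a45db775dee08a2c37597ce4760a",
		"github.com/pkg/sftp":            "98203f5a8333288eb3163b7c667d4260fe1333e9",
	}
	// REVISION column of the dep status output above.
	depLock := map[string]string{
		"github.com/gilbertchen/cli":     "565493f",
		"github.com/gilbertchen/keyring": "8855f56",
		"github.com/gilbertchen/goamz":   "eada9f4",
		"github.com/pkg/sftp":            "98203f5",
	}
	for _, name := range drift(goGet, depLock) {
		fmt.Printf("%s: go get %.7s, dep %s\n", name, goGet[name], depLock[name])
	}
}
```

For these four projects it reports gilbertchen/cli and gilbertchen/keyring as differing, while goamz and pkg/sftp match.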

gilbertchen commented 6 years ago

https://github.com/gilbertchen/duplicacy/commit/7a7ea3ad182a216d15fad754f330fe65c030a21b should fix this.