golang / mock

GoMock is a mocking framework for the Go programming language.
Apache License 2.0
9.25k stars 608 forks

Only rewrite output files if changes are needed. #604

Open sodul opened 2 years ago

sodul commented 2 years ago

Actual behavior We have a fairly large project and running mockgen takes several minutes. During that time all of the generated mock files get deleted and rewritten. This causes 2 issues:

Expected behavior If the output file already exists and has the same content, mockgen should leave it untouched instead of rewriting it.

To Reproduce Steps to reproduce the behavior

Just run mockgen as usual.

Additional Information

codyoss commented 2 years ago

This seems like a reasonable feature request to me. I would be happy to accept a PR for such a change.

sodul commented 2 years ago

@codyoss gentle ping. There are 2 potential PRs, small ones, that can address this issue.

FYI in our situation we have dozens, if not hundreds, of generated mock files. So this change really helps with the machines' load, since it reduces the re-indexing from IDEs and reduces the background antivirus workload as well.

sodul commented 2 years ago

@codyoss @sanposhiho gentle ping. I'm not sure who has merge permissions on this repository.

sodul commented 1 year ago

This was merged last month and should be in the final 1.7 release. @codyoss do you have a timeline for that?

amarjeetanandsingh commented 1 year ago

Currently we generate the mock and then check whether the generated content is already in the mock file. This saves only the write effort and its implications, not the generation effort. I think there is scope for improvement. I understand there will be challenges, but can we consider the 2 approaches below for optimisation?

@codyoss @sodul If this is worth considering, I’ll be happy to implement it.

sodul commented 1 year ago

@amarjeetanandsingh I'm not an official maintainer, I just contributed a PR here.

That said, something to consider about adding timestamps inside files is that it makes them mutable, which breaks build reproducibility and breaks tools such as Bazel, which expect that the same input results in identical output. Timestamps or git hashes really bring unwanted side effects. Furthermore, our own internal CI pipeline runs mockgen on every PR and fails the pipeline if we detect a diff, so we definitely would not want to introduce 'random' data (time is random) in the generated files.

While a hash of the content has the advantage of being more consistent when re-run, I'm not sure that would help much. You want a way to uniquely identify the input data, which would take a while to compute; by that point, generating the output in memory is probably cheap to do. The current code will now check if the content of the generated file is different before reading it to do an in-memory diff, so there is already an optimization on writing here.