Only rewrite output files if changes are needed.

sodul commented 2 years ago

Actual behavior We have a fairly large project and running mockgen takes several minutes. During that time all of the generated mock files get deleted and rewritten. This causes 2 issues:

if we run the mocks in parallel mockgen can try to read the files while they are being rewritten, causing mockgen to fail as it was trying to read the file. We worked around this by targeting directories in parallel but run mockgen sequentially within a directory.
our IDEs see the files being modified and re-index the codebase over and over, making the IDE mostly unusable while mockgen is running.

Expected behavior If the output file already exist and has the same content mockgen should not rewrite it and leave it untouched.

To Reproduce Steps to reproduce the behavior

Just run mockgen as usual.

Additional Information

gomock 1.6.0
golang 1.17.3

codyoss commented 2 years ago

This seems like a reasonable feature request to me. I would be happy to accept a PR for such a change.

sodul commented 2 years ago

@codyoss gentle ping. There are 2 potential PRs, small ones, that can address this issue.

FYI in our situation we have dozens, if not hundreds of generate mock files. So this change really help with the machines load since it reduce the re-indexing from IDEs, and reduces the background antivirus workload as well.

sodul commented 2 years ago

@codyoss @sanposhiho gentle ping. I'm not sure who have merge permission rights on this repository.

sodul commented 1 year ago

This was merged last month and should be in the final 1.7 release. @codyoss do you have a timeline for that?

amarjeetanandsingh commented 1 year ago

Currently we are generating the mock and then compare if the generated content is already in the mock file. We are saving only the write efforts and it’s implications and not generation effort. I think there is a scope of improvement. I understand there will be challenges but can we consider below 2 approaches for optimisation?

We can store the last modified UTC time of the actual file inside the generated file and use it to compare it on next mockgen trigger.
Another approach can be to store a fast hash of the actual file content inside the mock file and compare it on next mockgen trigger

@codyoss @sodul If considerable, I’ll be happy to implement.

sodul commented 1 year ago

@amarjeetanandsingh I'm not an official maintainer, I just contributed a PR here.

That said something to consider about adding timestamps inside files is that this makes them mutable which breaks build reproducibility and breaks tools such as Bazel which expect that same input results in identical output. Timestamps, or git hashes are really bringing unwanted side effects. Furthermore our own internal CI pipeline run mockgen on every PR and file the pipeline if we detect a diff, so we definitely would not want to introduce 'random' data (time is random) in the generated files.

While a hash of the content has the advantage of being more consistent when re-run, I'm not sure that would help much. You want a way to uniquely identify the input data, which would take a while to compute, by that point generating the output in memory is probably cheap to do. The current code will now check if the content of the generated file is different before reading it do do a in memory diff, so there is already an optimization on writing here.

golang / mock

Only rewrite output files if changes are needed. #604