erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.15k stars 2.92k forks source link

add possibility to preserve source when using deterministic builds #8607

Closed delitrem closed 5 days ago

delitrem commented 1 week ago

configure: add --enable/disable-deterministic-keep-source build option compiler: add +deterministic_keep_source option compiler: make some minor additions for +deterministic option description

github-actions[bot] commented 1 week ago

CT Test Results

    4 files    456 suites   56m 9s :stopwatch: 2 240 tests 2 191 :white_check_mark: 49 :zzz: 0 :x: 7 642 runs  7 568 :white_check_mark: 74 :zzz: 0 :x:

Results for commit 4044e9bb.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

delitrem commented 1 week ago

Related to #8602.

delitrem commented 1 week ago

Oops. Some tests are failing — I will take a look a little bit later.

Anyway is it a good or bad idea of proposed changes?

michalmuskala commented 6 days ago

I think overall a better option would be for the compiler to be able to emit relative paths in source (and -file attributes). Most of the time this should be good enough and will actually be deterministic as well. This could potentially solve your problem and another that today when using the deterministic option all included files are saved as just the basename, which might be ambiguous. The only complexity is that usually most projects will have at least 2 "roots" of paths - the project itself & OTP, so there needs to be some nice way of separating the two.

delitrem commented 6 days ago

I think overall a better option would be for the compiler to be able to emit relative paths in source (and -file attributes). Most of the time this should be good enough and will actually be deterministic as well.

Hello and many thanks for the feedback. Could you be so kind to provide a little bit more details? If I understand you correctly, your solution seems to be more simple than mine.

For example if meta data of some file (let it be ssh.erl from ssh application), which is compiled without deterministic option looks like:

1> ssh:module_info(compile).
[{version,"8.4.3"},
[...]
 {source,"/tmp/guix-build-erlang-27.0.drv-0/source/lib/ssh/src/ssh.erl"}]

How should it look when the file is compiled with deterministic option?

michalmuskala commented 6 days ago

I think there's a bit of scope to design some options. But one simple could be to have the path relative to either the "repository root" for all files, in this case lib/ssh/src/ssh.erl or "application root", in this case ssh/src/ssh.erl.

The more advanced proposal that I mentioned would allow specifying custom "replacement roots" making the paths largely logical, rather than physical. Some example of this could be compiling the following file:

% /home/micmus/projects/foo/src/test.erl
-module(test).
-include_lib("kernel/include/file.hrl").

with a command (some rough proposal, subject to proper design)

erlc --shorten-path="/path/to/otp=//otp" --shorten-path="/home/micmus/projects/foo=//app" test.erl

we'd end up with the source field as you shown above set to //app/src/test.erl and the AST of:

-file("//app/src/test.erl", 1).
-module(test).
-file("//otp/kernel-9.2/include/file.hrl", 1).
% contents of the header
-file("//app/src/test.erl", 3).

today with the +determinstic option we end up with:

-file("test.erl", 1).
-module(test).
-file("file.hrl", 1).
% contents of the header
-file("test.erl", 3).

where both test.erl and file.hrl are highly ambiguous - less problematic with .erl files since duplicates are not allowed, but headers might be duplicated in the source trees.

By keeping the option fairly generic just replacing specified paths, I think this could serve several use cases