Closed traveltissues closed 5 years ago
Note that there's a related but less-comprehensive proposal in https://github.com/bazelbuild/remote-apis/pull/54.
I'm not sure that adding the capability to support arbitrary properties is a great idea--for example, adding mtime is, I think, likely to lead to cache fragmentation from files that are content-identical but produced on different machines via a hermetic process. There's clearly a general desire for better permissions handling--at least increased clarity, and possibly more flexibility.
I'm curious to hear what others think of file/node-specific properties--what would be useful vs. overly complex, whether a general mechanism is necessary, etc.
> for example, adding mtime is, I think, likely to lead to cache fragmentation from files that are content-identical but produced on different machines via a hermetic process.
I agree that we generally shouldn't add timestamps. The cache fragmentation would only affect Directory protos, not actual file blobs, but it's still not ideal.
However, in BuildStream where we build a whole module in a single Action using a traditional build system (e.g., autotools, cmake), we have a feature to support incremental builds. This currently only works for local builds but we'd like to support it also with remote execution. The way this works is that we use the previous buildtree + the modified sources as input tree for the incremental build. Unfortunately, with traditional build systems it's crucial to properly support file mtime, otherwise too little or too much will be rebuilt.
We won't use file mtime for regular builds. For incremental builds, however, the improved user experience is worth the Directory proto cache fragmentation.
I don't expect Bazel or other REAPI clients that work with fine-grained actions to be interested in file mtimes. However, we'd still like to support this for the optional incremental build feature in BuildStream and at least one REAPI server. More clients might be interested in additional node properties, though, e.g., for permissions. That's why we thought it might be sensible to add a string-based extension mechanism that would provide implementations the flexibility for such features. If one or multiple such extensions turn out to be more generally useful, we can consider standardizing them, similar to platform properties.
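A minimal sketch of the fragmentation point made in this comment: adding an mtime changes the digest of the directory listing but not the digest of the file blob. It assumes SHA-256 and uses sorted-key JSON purely as a stand-in for canonical proto serialization; the `mtime` key and the timestamp values are made up for illustration.

```python
import hashlib
import json


def blob_digest(content: bytes) -> str:
    """Content-addressed digest of a file blob; metadata never enters it."""
    return hashlib.sha256(content).hexdigest()


def directory_digest(entries: list) -> str:
    """Digest of a directory listing.

    Sorted-key JSON is only a stand-in for canonical proto serialization.
    """
    canonical = json.dumps(sorted(entries, key=lambda e: e["name"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


content = b"int main(void) { return 0; }\n"

# The same file produced on two machines at different times (hypothetical mtimes).
machine_a = [{"name": "main.c", "digest": blob_digest(content), "mtime": "1565864760"}]
machine_b = [{"name": "main.c", "digest": blob_digest(content), "mtime": "1565868840"}]
# The same listing with no mtime recorded at all.
no_mtime = [{"name": "main.c", "digest": blob_digest(content)}]

same_blob = machine_a[0]["digest"] == machine_b[0]["digest"]
fragmented = directory_digest(machine_a) != directory_digest(machine_b)
```

Here `same_blob` is true and `fragmented` is true: the blob is shared, but each machine produces a distinct directory entry. Leaving the property unset keeps the listing identical everywhere.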
As long as we can distinguish between unset - "timestamp not provided" / "mode not provided" - and zero - "timestamp is explicitly Jan 1 1970" / "mode is 000" - properties (which this proposal definitely does), and we keep everything unset where not explicitly required, I'm in favour of adding something to the protocol.
Basically, there are use-cases where we need highly cacheable data, which is generally restricted to inputs we want to get cache hits on. For those, we definitely don't want any extraneous data mixed in, and would like the default to be that it is never specified - but even there, metadata is sometimes desired. (A previous example was Bazel itself - it has requirements about timestamps of its own files, and so cannot be passed pre-extracted as an input to an action).
Past that, today's Directory protos are only used for outputs, where it's generally less relevant, and are simply unable to represent some things. This precludes reuse for any use-case where this metadata is material. Jürg provides one example; it also came up in discussions around a generalized Fetch API https://docs.google.com/document/d/10ari9WtTTSv9bqB_UU-oe2gBtaAA7HyQgkpP-RFP80c/edit. I'd prefer to expand Directory to cover these natively, and specify in the proto where usage of a Directory proto must be more limited by banning metadata, than to push such use-cases to have a mirrored proto that has these fields.
I'm torn on representing these as general properties vs a small number of known fields. modified-time feels like it should be a standardized and cross-platform field. Permissions are platform-dependent, which argues for general. Other properties could be even more custom, though should hopefully be used rarely, and standardized if/when they are? But on net, I lean towards 'general', with well-defined known keys and their value formats.
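A rough sketch of how the 'general' option can still keep unset and zero apart: with string key/value properties, a missing key means "not provided", while a present key with value "0" or "000" is an explicit zero. The key names `mtime` and `unix_mode` and their value formats (decimal seconds, octal digits) are assumptions for illustration, not anything agreed in this thread.

```python
from typing import Optional, Sequence, Tuple

Property = Tuple[str, str]  # (name, value), both strings


def get_property(properties: Sequence[Property], name: str) -> Optional[str]:
    """Return the value for `name`, or None if the property was never set."""
    for key, value in properties:
        if key == name:
            return value
    return None


def parse_mtime(properties: Sequence[Property]) -> Optional[int]:
    """Seconds since the epoch, or None when no timestamp was provided."""
    raw = get_property(properties, "mtime")  # hypothetical key name
    return None if raw is None else int(raw)


def parse_unix_mode(properties: Sequence[Property]) -> Optional[int]:
    """Permission bits parsed from octal digits, or None when no mode was provided."""
    raw = get_property(properties, "unix_mode")  # hypothetical key name
    return None if raw is None else int(raw, 8)
```

With this shape, `parse_mtime([])` yields `None` (timestamp not provided), whereas `parse_mtime([("mtime", "0")])` yields `0` (explicitly Jan 1 1970).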
(Sent by email in reply to Jürg Billeter's comment of Thu, Aug 15, 2019, 10:26 AM.)
> I'm torn on representing these as general properties vs a small number of known fields. modified-time feels like it should be a standardized and cross-platform field. Permissions are platform-dependent, which argues for general.
I would usually lean towards a general property and let the server decide which keys it supports. I'm not sure there's added value in supporting specific fields.
The current use-cases of Directory protos are action inputs and action outputs, and I think we should outline these explicitly w.r.t. what the additional metadata should be and how it is to be interpreted (Eric's proposal https://docs.google.com/document/d/10ari9WtTTSv9bqB_UU-oe2gBtaAA7HyQgkpP-RFP80c/edit?ts=5d433a83# adds other use-cases, and when it goes through, we will need to address these similarly as well).
For example:
Some other comments:
(Sent by email in reply to Darius Makovsky's comment of Thu, Aug 15, 2019, 11:34 AM.)
Apologies, I missed your response.
> The current use-cases of Directory protos are action inputs and action outputs, and I think we should outline these explicitly w.r.t to what the additional metadata should be and how is it to be interpreted (Eric's proposal https://docs.google.com/document/d/10ari9WtTTSv9bqB_UU-oe2gBtaAA7HyQgkpP-RFP80c/edit?ts=5d433a83# adds other use-cases, and when it goes through, we will need to address these similarly as well).
That's fine, I'll read the proposal and update here.
> - The properties need to be sorted, for canonical Directory representation.
Agreed.
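A small sketch of why sorting matters for a canonical representation: if properties are serialized in whatever order the client happened to add them, two logically identical nodes can hash differently. The length-prefixed byte layout below is invented for the example and is not the proto wire format; SHA-256 is likewise an assumption.

```python
import hashlib


def canonical_bytes(properties) -> bytes:
    """Serialize (name, value) pairs sorted by name.

    Each string is length-prefixed so that adjacent fields cannot run together.
    This layout is illustrative only, not the proto wire format.
    """
    out = bytearray()
    for name, value in sorted(properties):
        for text in (name, value):
            data = text.encode("utf-8")
            out += len(data).to_bytes(4, "big") + data
    return bytes(out)


def properties_digest(properties) -> str:
    """Digest that does not depend on the order the properties were supplied in."""
    return hashlib.sha256(canonical_bytes(properties)).hexdigest()


first = [("mtime", "1565864760"), ("unix_mode", "0644")]
second = [("unix_mode", "0644"), ("mtime", "1565864760")]
order_independent = properties_digest(first) == properties_digest(second)
```

Without the `sorted` call, `first` and `second` would serialize to different bytes and therefore to different digests.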
> - Do we want to make is_executable part of metadata? Logically, it does belong there, so will be a bit cleaner, and then we can remove the field in v3. On the other hand, it will make the messages referencing executables more verbose, particularly if we want to optionally add the properties to outputs.
I have no strong feelings on that. On the one hand, if file properties are supported, then is_executable becomes redundant and could be replaced with a reference to those, although I wouldn't want to break functionality.
@ola-rozenfeld I've made some updates to #91. Does this address some of your concerns? In terms of what the output should be, I intend for properties to be returned as part of the message. In terms of RE, I'd agree that the server should set the properties of the files where possible (and there will need to be an update to #91 for this).
I've added a few comments to #91.
> - Output metadata: either we say the server will not add any output metadata, or we need to add a message to Command, e.g. OutputMetadata with repeated string property_name, that the server needs to return for every output.
This is still missing if I haven't overlooked anything.
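As a rough illustration of the second option in the quoted point, a server-side helper could take the list of requested property names and return only those for each output. Everything here is hypothetical: the function name, the `mtime` and `unix_mode` keys, their string formats, and the choice to silently skip names the server does not support.

```python
import os
import stat
import tempfile


def collect_output_properties(path: str, requested_names) -> list:
    """Return sorted (name, value) pairs for the requested properties of one output.

    Names the server does not know are skipped; an empty request yields no metadata.
    """
    st = os.stat(path)
    available = {
        "mtime": str(int(st.st_mtime)),                        # whole seconds since epoch
        "unix_mode": format(stat.S_IMODE(st.st_mode), "04o"),  # permission bits, octal
    }
    return sorted(
        (name, available[name]) for name in set(requested_names) if name in available
    )


# Usage on a throwaway file with a known mode and timestamp.
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o640)
    os.utime(tmp.name, (0, 0))  # (atime, mtime) set to the epoch
    demo = collect_output_properties(tmp.name, ["unix_mode", "mtime", "not_supported"])
    nothing_requested = collect_output_properties(tmp.name, [])
```

Returning nothing when nothing is requested matches the preference voiced earlier in the thread that metadata stays unset unless explicitly required.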
In order to preserve file properties such as timestamps and permission bits as metadata, I propose to add the concept of NodeProperties as repeated key/values to FileNodes, DirectoryNodes, and SymlinkNodes. This would allow servers to enrich directory trees and is similar to the Platform properties in #38. For example in directory proto:
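The shape described here can be sketched with Python dataclasses standing in for the proto messages. This is an illustration under assumptions, not the proposal's actual definition: the singular `NodeProperty` class, the `node_properties` field name, the string-typed digests, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class NodeProperty:
    """One key/value pair attached to a node (hypothetical message name)."""
    name: str
    value: str


@dataclass
class FileNode:
    name: str
    digest: str  # placeholder string; a real node would carry a Digest message
    is_executable: bool = False
    node_properties: List[NodeProperty] = field(default_factory=list)


@dataclass
class DirectoryNode:
    name: str
    digest: str
    node_properties: List[NodeProperty] = field(default_factory=list)


@dataclass
class SymlinkNode:
    name: str
    target: str
    node_properties: List[NodeProperty] = field(default_factory=list)


# A file that carries an mtime, next to one that carries no metadata at all.
configure = FileNode(
    name="configure",
    digest="<digest of configure>",
    is_executable=True,
    node_properties=[NodeProperty("mtime", "1565864760")],
)
plain = FileNode(name="README", digest="<digest of README>")
```

Defaulting `node_properties` to an empty list keeps nodes without metadata identical to what they are today.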