bedeho commented 5 years ago

Add your suggestion as a comment!

Background

Our Substrate code base is starting to get more complicated, and it would be a benefit to harmonise the set of major conventions we follow, so as to follow best practices and make reviews more efficient. The goal of this issue is to accumulate suggestions over time, as replies, which we can eventually turn into a convention document. This document can further be turned into rules for our CI linter.

Major questions that

How to decide what is its own module, vs. combining with existing module?
Do modules deserve their own repo? Every module?

Initial suggestions

All maps must map to Option, so that the default construction behaviour of StorageMap does not allow us to be lazy about checking ::exists on the same map before lookup.
Always try to make a module easily reusable for another runtime, by:
- If possible, always provide your own traits for your expectations on other modules, rather than relying on public traits, or traits in the module you are expecting to use.
- Don't define and implement traits for your own module, see point above.
- If possible, strive to implement the business logic of your module separately from the runtime module itself, in a substrate agnostic way.
Assert as many invariants as possible!

bedeho commented 5 years ago

Avoid containers in Structs, e.g. Vec, as they will make reading any struct property securely require that the client downloads all struct content, including the potentially large container payload. Instead, move containers to top-level storage, and place in suitable map with identifier matching original Struct.
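A minimal sketch of this layout, using std `BTreeMap` as a stand-in for Substrate storage maps. The names `Channel`, `Storage`, `channel_by_id` and `video_ids_by_channel` are hypothetical and only serve the illustration.

```rust
use std::collections::BTreeMap;

type ChannelId = u64;
type VideoId = u64;

// The struct holds only small, fixed-size fields; there is no Vec inside.
#[derive(Debug, Clone, PartialEq)]
struct Channel {
    owner: u64,
    is_public: bool,
}

// Stand-in for module storage: the container lives in its own top-level map,
// keyed by the same identifier as the struct it belongs to.
#[derive(Default)]
struct Storage {
    channel_by_id: BTreeMap<ChannelId, Channel>,
    video_ids_by_channel: BTreeMap<ChannelId, Vec<VideoId>>,
}

impl Storage {
    // Appends a video id to the channel's list without touching the Channel value.
    fn add_video(&mut self, channel: ChannelId, video: VideoId) {
        self.video_ids_by_channel.entry(channel).or_default().push(video);
    }
}
```

Reading `owner` for a channel now only touches the small `Channel` value, but the two maps have to be kept consistent by hand.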

jfinkhaeuser commented 5 years ago

Avoid containers in Structs, e.g. Vec, as they will make reading any struct property securely require that the client downloads all struct content, including the potentially large container payload. Instead, move containers to top-level storage, and place in suitable map with identifier matching original Struct.

I do agree with this. But the clear downside of this is that you have two data sinks that need to be kept in sync manually. It's IMHO important to point out here that you're buying performance at the cost of introducing more complexity developers need to manage.

bedeho commented 5 years ago

I very much agree, and I do not like it. In general, I would be in favor of the opposite actually, to sacrifice performance for reducing the invariant count.

Why don't we make it something people should at least keep in mind, and perhaps be more sensitive to if the container can grow very large? In many cases, it will in practice be very small, in which case it's an even worse tradeoff to have this rule.

bedeho commented 5 years ago

Try to avoid implementing deletion or moderation operations from state by doing actual removal from storage. Removal from the state often will require iteration over possibly unbounded state lengths and will require the introduction of extra state variables to make it practically feasible under even normal circumstances. Failure to deal with this properly leaves the platform open to DoS attacks which transaction fees are not an effective measure against.

Don't delete, only hide!

credit to @siman
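A minimal sketch of hiding instead of deleting, with std `BTreeMap` standing in for a storage map. `Forum`, `Post`, `hide_post` and `visible_post` are hypothetical names, not taken from any existing module.

```rust
use std::collections::BTreeMap;

type PostId = u64;

#[derive(Debug, Clone, PartialEq)]
struct Post {
    text: String,
    // Moderation flips this flag instead of removing the entry from storage.
    hidden: bool,
}

#[derive(Default)]
struct Forum {
    post_by_id: BTreeMap<PostId, Post>,
}

impl Forum {
    // A single lookup and write; no iteration over dependent state is needed.
    fn hide_post(&mut self, id: PostId) -> Result<(), &'static str> {
        let post = self.post_by_id.get_mut(&id).ok_or("post does not exist")?;
        post.hidden = true;
        Ok(())
    }

    // Readers filter hidden entries out.
    fn visible_post(&self, id: PostId) -> Option<&Post> {
        self.post_by_id.get(&id).filter(|p| !p.hidden)
    }
}
```

The hidden entry stays in the map, so the cost of moderation does not depend on how much state hangs off the post.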

jfinkhaeuser commented 5 years ago

Don't delete, only hide!

Luckily, deleting an ID from a Vec hides the associated entry quite well.

But this is actually not to do with deletion as such. Your situation is one where one top-level item "contains" a hierarchy of descendants that might all have to be modified. Deletion is actually the simplest modification you can make, you just "unlink" the top-level item from something like the above list of IDs.

Much more complex would be operations where each of the descendants needs to be updated, but kept visible. Or where there is no single, but multiple parent elements.

TL;DR, "don't delete, only hide" is a good start, but I think there's a whole class of issues lurking here.

bedeho commented 5 years ago

Any state that applications and end users are to read must be securely readable, and thus must be part of the state. While an indexing node could on occasion generate such state by tracking events and/or transaction inclusion, this would not yield light client proofs that the end user could verify. While it's tempting to conclude that providing a collection of simple transaction inclusion proofs could be sufficient, it is not, as it does not avoid the problem of censored transactions.

bedeho commented 5 years ago

Reply to the following prior idea on always wrapping value type of StorageMap in option.

All maps must map to Option, so that the default construction behaviour of StorageMap does not allow us to be lazy about checking ::exists on the same map before lookup.

While this does prevent the laziness described, it also now introduces the possibility that one may incorrectly introduce a None value, which should not even be a possible state. Recall that the original problem was not that the map contained a None value; it was that a sloppy developer may neglect to check whether some key actually is in the map.

The cost may not be worth the cure, and it also makes the state less self-describing in order to solve a very basic error.
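The two behaviours under discussion can be sketched as follows. This is a model built on std `BTreeMap`, not the actual StorageMap API; `DefaultingMap`, `OptionMap` and `Member` are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Default, PartialEq)]
struct Member {
    handle: String,
}

// Hypothetical stand-in for a map whose lookup returns a default-constructed
// value when the key is absent, as described for StorageMap in this thread.
struct DefaultingMap(BTreeMap<u64, Member>);

impl DefaultingMap {
    fn exists(&self, id: u64) -> bool {
        self.0.contains_key(&id)
    }

    // Silently yields Member::default() for a missing key.
    fn get(&self, id: u64) -> Member {
        self.0.get(&id).cloned().unwrap_or_default()
    }
}

// Same data, but the lookup forces the caller to handle absence.
struct OptionMap(BTreeMap<u64, Member>);

impl OptionMap {
    fn get(&self, id: u64) -> Option<Member> {
        self.0.get(&id).cloned()
    }
}
```

With `DefaultingMap`, forgetting to call `exists` first yields a plausible-looking empty `Member`; with `OptionMap`, the compiler makes the caller deal with `None`.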

bedeho commented 5 years ago

The state of a module should be constructed in such a way that it is possible for a light client to securely interrogate it in a practical way. This will almost always involve adding some additional state variables which functionally depend on the rest of the state. This does involve introducing an additional runtime invariant (which is evil), but the objective is sufficiently important to warrant it.

Example: There is a one-to-many relationship A->B for types A, B. For a light client to know that it had all instances of B which corresponded to some instance of A, it would have to get all B instances to avoid the risk of omission by a full or indexing node. The alternative is for A to include a counter field for this relationship. This way, if a light client is given the right number of B instance inclusion proofs, it will know that it knows the relationship fully.

jfinkhaeuser commented 5 years ago

All maps must map to Option, so that the default construction behaviour of StorageMap does not allow us to be lazy about checking ::exists on the same map before lookup.

While this does prevent the laziness described, it also now introduces the possibility that one may incorrectly introduce a None value, which should not even be a possible state.

But that's ok - if a None value is stored in the map, it is identical to not storing any value. In fact, that makes it possible to use storing a None value to effectively delete data (I see the discussion about not deleting, but as discussed there, it's a fairly special case).

IMHO using Option solves more problems than it introduces. That's especially true because writing state is done via extrinsics, so under the close control of runtime developers, whereas reading state is the thing that could go wrong without Option.

jfinkhaeuser commented 5 years ago

The state of a module should be constructed in such a way that it is possible for a light client to securely interrogate it in a practical way. This will almost always involve adding some additional state variables which functionally depend on the rest of the state. This does involve introducing an additional runtime invariant (which is evil), but the objective is sufficiently important to warrant it.

Example: There is a one-to-many relationship A->B for types A, B. For a light client to know that it had all instances of B which corresponded to some instance of A, it would have to get all B instances to avoid the risk of omission by a full or indexing node. The alternative is for A to include a counter field for this relationship. This way, if a light client is given the right number of B instance inclusion proofs, it will know that it knows the relationship fully.

What's wrong with this?

map A => Vec<B>
// or
map IdOfA => Vec<B>
// or, if B is already in its own collection even
map IdOfA => Vec<IdOfB>

Definitely simpler than manually maintaining extra state. Vectors already contain the counter you're looking for.

jfinkhaeuser commented 5 years ago

Somewhat related to avoiding containers in structs, and based on some discussion in https://github.com/Joystream/joystream/pull/45

As a rule, I would avoid creating multiple maps of the same ID to different data, and put the data into a struct. That is, generally prefer this:

struct Foo { A, B }
map Id => Foo // or Option<Foo>, see above

to

map Id => A // Option<A>
map Id => B // Option<B>

This should be the rule unless one of the following holds true:

There is only an indirect relationship between A and B, even if the Id is the same. An example would be that ContentId maps to DataObject and ContentMetadata, but the first can sanely exist without the second in the state (though only having both makes content discoverable).
There is a clear use-case for processing large numbers of A without having to refer to B. This is an optimization as discussed in the "Don't store containers in structs" case, and should be an optimization rather than the rule.

bedeho commented 5 years ago

The state of a module should be constructed in such a way that it is possible for a light client to securely interrogate it in a practical way. This will almost always involve adding some additional state variables which functionally depend on the rest of the state. This does involve introducing an additional runtime invariant (which is evil), but the objective is sufficiently important to warrant it. Example: There is a one-to-many relationship A->B for types A, B. For a light client to know that it had all instances of B which corresponded to some instance of A, it would have to get all B instances to avoid the risk of omission by a full or indexing node. The alternative is for A to include a counter field for this relationship. This way, if a light client is given the right number of B instance inclusion proofs, it will know that it knows the relationship fully.

What's wrong with this?
map A => Vec<B>
// or
map IdOfA => Vec<B>
// or, if B is already in its own collection even
map IdOfA => Vec<IdOfB>
Definitely simpler than manually maintaining extra state. Vectors already contain the counter you're looking for.

The main problem here is that now I cannot get light client proof of a single entry in the map, I have to download entire map to be sure a single entry exists, that's not viable.

jfinkhaeuser commented 5 years ago

The main problem here is that now I cannot get light client proof of a single entry in the map, I have to download entire map to be sure a single entry exists, that's not viable.

Maybe you have to expand on this problem, because as I read it, this approach actually solves your issue.

jfinkhaeuser commented 5 years ago

Ah, I think I understand. You want to know a specific A:B relationship exists?

jfinkhaeuser commented 5 years ago

So... if that's the case, then you're right - there's a simple enough solution, though, and that'd be:

map (IdOfA, IdOfB) => bool

I honestly think, though, that this is something of a premature optimization for most cases, and I wouldn't want to raise this to a general rule. If you need to prove something like this, sure, it makes sense.

But consider that every state access from a light client is effectively a WebSocket request, potentially through SSL. That is, you have computational and I/O overhead per access. The moment you want to prove that both A:B1 and A:B2 exist (or more precisely at N such proofs), your savings go out of the window because the per-request overhead is higher than the few hundred bytes you need to transfer.

bedeho commented 5 years ago

Ah, I think I understand. You want to know a specific A:B relationship exists?

Securely, yes.

bedeho commented 5 years ago

So... if that's the case, then you're right - there's a simple enough solution, though, and that'd be:
map (IdOfA, IdOfB) => bool
I honestly think, though, that this is something of a premature optimization for most cases, and I wouldn't want to raise this to a general rule. If you need to prove something like this, sure, it makes sense.

But consider that every state access from a light client is effectively a WebSocket request, potentially through SSL. That is, you have computational and I/O overhead per access. The moment you want to prove that both A:B1 and A:B2 exist (or more precisely at N such proofs), your savings go out of the window because the per-request overhead is higher than the few hundred bytes you need to transfer.

We don't need to introduce this sort of state explicitly. The point of storage maps/lists/values ... is that they get encoded in a storage trie in a way that makes generating proofs of membership a direct property of the encoding. The solution that I implicitly had in mind, but perhaps should have written out, is

// Types

struct A {
    id: AIdType,
    number_of_Bs: u32,
}

struct B {
    id: BIdType,
    id_of_A: AIdType,
}

// Storage
allAs: map AIdType => A
allBs: map BIdType => B

Now, if a light client knows an a:A, and gets proofs p_1,...p_{a.number_of_Bs}, it can be certain that it knows about all Bs in the relationship. Now, obviously, a.number_of_Bs is redundant in the on-chain state, since you can derive it from allBs, but for a light client it's required, otherwise it cannot detect omission. The only other alternative is to download all of allBs.

Finally, if you choose the representation BsRelatedtoAinRelationshipFoo: map A => Vec<B>, then you lose this property of downloading a single B securely, because the encoding just makes Vec<B> into a blob. Likewise if you do map A|AIdType=>Vec<BIdType>, then you cannot be securely convinced that a single B on its own relates to A, unless you also add this to B itself, like in the sketch in this reply... which sort of makes this map redundant.

I hope it's clear that this is not an optimization.
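The completeness check a light client could run under this scheme can be sketched as below. It assumes every `B` handed to it has already been verified against the state root; proof verification itself is out of scope here, and `knows_all_bs` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

type AId = u32;
type BId = u32;

struct A {
    id: AId,
    number_of_bs: u32,
}

struct B {
    id: BId,
    id_of_a: AId,
}

// Returns true when the supplied Bs are exactly the complete set related to `a`.
// Assumption: each `B` in `proven_bs` came with a valid inclusion proof.
fn knows_all_bs(a: &A, proven_bs: &[B]) -> bool {
    // Keep only Bs that point back at `a`, and collapse duplicates.
    let distinct: BTreeSet<BId> = proven_bs
        .iter()
        .filter(|b| b.id_of_a == a.id)
        .map(|b| b.id)
        .collect();
    // Every supplied B must be related and unique, and the count must match the counter in A.
    distinct.len() == proven_bs.len() && distinct.len() as u64 == u64::from(a.number_of_bs)
}
```

Without the counter in `A`, the client has nothing to compare the number of received entries against, so an omitted `B` would go unnoticed.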

jfinkhaeuser commented 5 years ago

I do understand your reasoning, though I think you're basing it on light client APIs that I haven't seen yet - maybe I missed them, of course.

The APIs I've dealt with have a couple of hundreds of Bytes of HTTP headers in overhead for every state query, and there does not seem to be a query for proofs of state. Which would mean your scheme would require N * (HTTP Overhead + sizeof(B)), while mine might use HTTP Overhead + (N * sizeof(B)).

There is no guarantee that my scheme ends up more efficient, mind you. I'm arguing that your scheme is effectively an optimization for when sizeof(B) outweighs N * HTTP Overhead. Which brings me to my point, it's hard to gauge that your scheme is more efficient without looking at each concrete case, whereas it's easy to gauge that it introduces more state to manage.
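To make the comparison concrete, the two cost formulas above can be written out as a small sketch. The function names are hypothetical, and the byte figures in the note that follows are made-up inputs, not measurements.

```rust
// Bytes transferred when each of the `n` entries is fetched by its own request:
// N * (HTTP Overhead + sizeof(B)).
fn per_entry_cost(n: u64, http_overhead: u64, size_of_b: u64) -> u64 {
    n * (http_overhead + size_of_b)
}

// Bytes transferred when all `n` entries arrive in a single request:
// HTTP Overhead + (N * sizeof(B)).
fn single_request_cost(n: u64, http_overhead: u64, size_of_b: u64) -> u64 {
    http_overhead + n * size_of_b
}
```

With an assumed 300 bytes of overhead and 100-byte entries, fetching all 10 entries costs 4000 bytes per entry versus 1300 bytes in one request, while fetching just one entry individually costs 400 bytes. Which scheme wins therefore depends on how many entries a client actually needs.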

All of this goes away, of course, if we find a light client API that lets you batch query proofs only. I just didn't see it (yet).

bedeho commented 5 years ago

This has nothing to do with the full node API; it's a conceptual point, and it is not related to query overhead. It has to do with what one can and cannot know securely under different choices of state encoding, all under the assumption that light client proofs are available. We currently do not use them in our testnets, as everyone can trust our hosted full node, so that may explain why you have not seen any proofs in the queries you have looked at.

I think we have reached the limit of what can be resolved in a Github comment thread anyway, so let's resolve out of band :D

bedeho commented 5 years ago

All storage fields should be configurable, either explicitly, or built, for the following two reasons:

In line with the goal of having modules be standalone, a third party developer should conveniently be able to launch a new chain where the initial state of this module is configured in some arbitrary way.
Makes it easier to write test cases which cover a broad set of initial states.

bedeho commented 5 years ago

We should avoid using usize, e.g. as a type field, parameter etc., as the actual size is machine-dependent, which means it could have disastrous consequences for consensus if different validators build from source on different target types.

https://doc.rust-lang.org/std/primitive.usize.html

The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.

Some Rust types require us to deal with it, such as Vec::len; however, we should make an effort to cast to/from it as little as possible.
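Where a length does have to cross into stored state, one option is a checked conversion to a fixed-width type at the boundary. A minimal sketch, with `stored_len` as a hypothetical helper name:

```rust
use std::convert::TryFrom;

// Converts a collection length to a fixed-width integer at the boundary.
// Returns None rather than truncating if the length does not fit in u32.
fn stored_len<T>(items: &[T]) -> Option<u32> {
    u32::try_from(items.len()).ok()
}
```

The result has the same width on every target, and an oversized length surfaces as `None` instead of being silently truncated by an `as` cast.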

bedeho commented 5 years ago

We should stick to the "one event per extrinsic" rule, where the purpose of an event is to signal a high-level state transition (which may be complex), rather than whether a specific part of the state is impacted or not.

Example

Multiple parts of the state may be changed as part of a single extrinsic, e.g. adding a new thread may involve also introducing a new initial first post, where adding a post normally would be its own separate event associated with an extrinsic explicitly for this purpose. In

Rationale

Runtime code will eventually get very complex, and it will be very easy to forget to synch the set of events associated with a given extrinsic as the state gets more complex, and the extrinsic itself becomes more complex. It also will require more blockspace, taking away throughput and history storage space.

It's better to offload the burden of listening to the right high-level events, and understanding the implications about what part of the state is changed, onto off-chain client applications, which then have to make sure to subscribe to the appropriate set of relevant events. So in the example, a client application would have to know that a new post may be introduced into the forum also as a consequence of a new thread, not only by directly adding a post on its own.
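The thread-and-post example can be sketched as follows. This is a plain Rust model, not Substrate's event machinery; `Forum`, `Event::ThreadCreated` and `Event::PostAdded` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Event {
    // Implies that the thread's initial post was created as well.
    ThreadCreated { thread_id: u64, first_post_id: u64 },
    // Emitted only by the call that adds a post to an existing thread.
    PostAdded { thread_id: u64, post_id: u64 },
}

#[derive(Default)]
struct Forum {
    next_thread_id: u64,
    next_post_id: u64,
    events: Vec<Event>,
}

impl Forum {
    fn create_thread(&mut self) -> u64 {
        let thread_id = self.next_thread_id;
        let first_post_id = self.next_post_id;
        self.next_thread_id += 1;
        self.next_post_id += 1;
        // One high-level event, even though two parts of the state changed.
        self.events.push(Event::ThreadCreated { thread_id, first_post_id });
        thread_id
    }

    fn add_post(&mut self, thread_id: u64) -> u64 {
        let post_id = self.next_post_id;
        self.next_post_id += 1;
        self.events.push(Event::PostAdded { thread_id, post_id });
        post_id
    }
}
```

A client that wants to track all posts has to subscribe to both variants and read `first_post_id` from `ThreadCreated`.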

jfinkhaeuser commented 5 years ago

I don't think the "one event per extrinsic" rule should be a rule, more of a guideline.

In general, I agree with the rationale. But the rationale only works as long as extrinsics do one thing only. That's largely a good idea to follow, too, so most of the time this rationale would probably apply.

The downside is UX. We're already at a point in uploading files where pioneer has to call several extrinsics for an upload, and if it does so in sequence - as the design of the extrinsics sometimes requires - this will take several block generation periods to finish. Since each period takes 6s currently, that'll leave a simple upload doing pretty much nothing for 12+ seconds - terrible UX, in other words.

We've partially circumvented this issue by having an extrinsic do more than one thing, but that's not the best of design... at least not until you divide extrinsics into higher- and lower-level operations. Low-level operations should definitely do one thing only, and generate at most one event.

But there is a strong UX argument for introducing higher-level extrinsics that effectively wrap lower-level ones (in practice, use traits to execute the same code as several lower-level ones rather than invoking the extrinsics directly, but that's sort of an implementation detail).

The example here with uploading files would be to create a DataObject for uploading the actual file data to a Liaison, and simultaneously adding unpublished/draft metadata to the state referencing that DataObject. Saving a 6+s wait period is very much better for the UX, and almost certainly worth having an extrinsic emit two separate events for the two separate sub-steps it performs.

TL;DR distinguish between high-level and low-level extrinsics; the former are for UX and may emit multiple events, the latter are for module APIs and should emit at most one.

bedeho commented 5 years ago

Use linked_map which has an iterable key set, rather than map. While this map type also allows on chain iteration, we primarily should use it for off-chain iteration.

The only penalty is storing the extra links, which is probably O(#keys), so if the use case for the map may have a lot of entries, it could be worth reconsidering, however, lookup times are unaffected.

We are in some modules currently actively storing redundant vectors of keysets in the state, the need for this would also disappear. While some sort of indexing service could possibly evade the need for this, simple apps driven directly by talking to a full node would need this extra state to function if there is no linked map.

bedeho commented 5 years ago

When writing tests, set up the initial state of interest by explicitly configuring it, not through a sequence of simulated transaction calls.

bedeho commented 5 years ago

Library code, tests, mocks and fixtures, should live in separate files.

bedeho commented 5 years ago

All modules should have their own folder, or be in a folder that colocates a set of related modules.

bedeho commented 5 years ago

Cargo.lock should not be tracked in library repos, but should in app/binary repos.

https://github.com/rust-lang/cargo/issues/315

bedeho commented 5 years ago

My hunch is that for modules without any extrinsics, there should also be no events. Any client module should have its own specific events. This seems cleaner, and also avoids duplication of events, which would otherwise frequently be the case.

This also opens up another question, what about the possible need for events in response to on_finalized processing. I think the symmetric treatment there is instead to have appropriate callback functions that are called to signal something of interest, and then a user can layer events on top, which again may be more complex. The callback could be a global one configured through the module Trait configuration, or it could be supplied by a user call and stored.

bedeho commented 5 years ago

Structs should not have identifier information, keep it in the container (e.g. map). This redundant state can really get you into trouble if it gets out of synch with the container key.

bedeho commented 5 years ago

If a module does not plausibly constitute a candidate for third party runtimes, then all trait dependencies configured should be using third party traits that are introduced in scope through a direct dependency on the relevant crate/package.

bedeho commented 5 years ago

`mutate` over `insert` for updates

If you are updating a value that exists, either in a map or a top-level storage variable, then try to use the mutate method rather than get/insert|write, it is semantically clearer for the reader. Parity confirms there is no performance benefit.
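The difference in readability can be sketched with a small stand-in type. `BalanceMap` and its `mutate` method are hypothetical and only mirror the idea of the Substrate method, not its actual signature.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a storage map over std BTreeMap.
#[derive(Default)]
struct BalanceMap(BTreeMap<u64, u32>);

impl BalanceMap {
    fn get(&self, key: u64) -> u32 {
        self.0.get(&key).copied().unwrap_or_default()
    }

    fn insert(&mut self, key: u64, value: u32) {
        self.0.insert(key, value);
    }

    // Applies `f` to the stored value in place.
    fn mutate<F: FnOnce(&mut u32)>(&mut self, key: u64, f: F) {
        f(self.0.entry(key).or_default());
    }
}

// Read, modify, write back: the update is spread over three steps.
fn credit_with_insert(map: &mut BalanceMap, key: u64, amount: u32) {
    let mut balance = map.get(key);
    balance += amount;
    map.insert(key, balance);
}

// The same update expressed as one in-place mutation.
fn credit_with_mutate(map: &mut BalanceMap, key: u64, amount: u32) {
    map.mutate(key, |balance| *balance += amount);
}
```

Both functions leave the map in the same state; the second makes it obvious at a glance that an existing value is being updated.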

bedeho commented 5 years ago

Granular errors for extrinsic and public methods:

Attempt to use as fine-grained a resolution as possible in the error type returned for any public method, be it dispatchable or not.

For strings, this means not collapsing lots of different issues into one literal case, e.g. we prefer CannotAddWithLowBalance, CannotAddWithoutSignature, CannotAddOnOddBlockNumber over a single CannotAddNow.

For type-safe responses, this means covering all possible cases as different genuine states, e.g. by fully loading enums with parameters etc.

This all contributes to the writing of a deeper unit test coverage of error cases, as it is clear that certain execution flows are not being covered.
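A sketch of the type-safe variant, reusing the three case names from above. The `AddError` enum, the `ensure_can_add` function and its parameters are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AddError {
    CannotAddWithLowBalance,
    CannotAddWithoutSignature,
    CannotAddOnOddBlockNumber,
}

// Each failing condition maps to its own variant instead of one catch-all error.
fn ensure_can_add(
    balance: u32,
    min_balance: u32,
    signed: bool,
    block_number: u32,
) -> Result<(), AddError> {
    if balance < min_balance {
        return Err(AddError::CannotAddWithLowBalance);
    }
    if !signed {
        return Err(AddError::CannotAddWithoutSignature);
    }
    if block_number % 2 == 1 {
        return Err(AddError::CannotAddOnOddBlockNumber);
    }
    Ok(())
}
```

Each variant corresponds to one test case, so a missing test for a given failure path is easy to spot.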

JoshOrndorff commented 5 years ago

I don't think the "one event per extrinsic" rule

I agree

But there is a strong UX argument for introducing higher-level extrinsics

In some cases it's more than just UX. In my tictactoe game I previously had take_turn and claim_win as two separate extrinsics, each of which emitted a single event. The problem is that taking a final move and claiming a win should happen atomically. If they don't, I can make a winning move, but my opponent can take an additional move and claim the win before me by front-running.

The solution in my case was to have a single higher-level extrinsic take_winning_turn which does both atomically. It's the same solution described above, but I wanted to point out that it's more than just UX.

JoshOrndorff commented 5 years ago

My hunch is that for modules without any extrinsics, there should also be no events.

I see what you're saying, but I disagree. Check out my simple feedback module. It has no dispatchable calls (e.g. extrinsics), but it does have a call for rating that can be made from other runtime modules that compose with it. The reputation system doesn't make sense to use on its own, only in conjunction with other modules. But I still emit an event when a rating is placed.

I'm open to suggestions if you think there's a better design here.

Also hope you don't mind me jumping in the middle of your convo. Just thought there were lots of good ideas floating around here.

bedeho commented 5 years ago

Hi @JoshOrndorff , thank you for weighing in, we appreciate it.

The solution in my case was to have a single higher-level extrinsic

Why does the need for such a single high level extrinsic, which I agree with, imply that you cannot follow the suggested one event rule? The original post suggests that in this context, there would be a new event, e.g. named WinningTurnTaken, and that any observer of this module should understand that semantically this involves both a turn and claiming a win.

I guess I am not seeing the problem? If you take the opposite approach, of now having this extrinsic emit two events, e.g. called TurnTaken and Won, then you are moving in the direction of the problem described. Now, in this particular module it may end up not being very severe, in particular, if this is the only case, but the problem described becomes increasingly real when the module grows in complexity, and one keeps long term maintainability by a diverse set of participants in mind.

I'm open to suggestions if you think there's a better design here.

I think it's no big problem to emit an event like that, but in practice I think it's going to be duplicating effort here, because as you say, this action only makes sense in the context of a broader runtime. So in a particular runtime, there may be all sorts of extra information you may want to combine with such a rating notification, and there is no way of injecting that into the lowest level event you have coupled to your rating system. This means one will end up with two events for the same semantic concept, which has downsides in terms of resources and clarity to developers working on off-chain applications consuming these events.

What do you think? Do you see these as real downsides, if so, are there counterbalancing upsides?

bedeho commented 5 years ago

Don't use `Default` trait if you are forced to impl

Due to various artifactual requirements induced by Substrate, we can at times be required to implement the Default trait for various types, even when there is no well-defined meaning to default construction.

Specifically, a common reason is that Substrate map storage encoding is overly helpful in trying to give back default constructed values when the lookup key does not live in the map. In my mind, this is not a reasonable design decision, but it seems deliberate, so it may be a permanent feature.

In those cases, do not use this trait in our code, as it gives the misleading indication that we need this property for our own purposes. One day we will hopefully drop them. Instead, mark the implementation clearly as not being permanent, e.g. here

/// OpeningStage must be default constructible because it indirectly is a value in a storage map.
/// ***SHOULD NEVER ACTUALLY GET CALLED, IS REQUIRED DUE TO BAD STORAGE MODEL IN SUBSTRATE***
impl<BlockNumber> Default for ApplicationStage<BlockNumber> {
    fn default() -> Self {
        ApplicationStage::Active
    }
}

mnaamani commented 5 years ago

Don't use Default trait if you are forced to impl

So I don't disagree with the suggestion, and I'm certainly aware of this annoying side effect. Of course, I have been forced to implement the Default trait for types that I know have no real meaning when default constructed, but I failed to make a point about it in the code comments.

So I think your suggestion is appropriate for the value type in storage maps (because we are choosing to avoid wrapping it in an Option just to get around this artifact)

I'll just add that when we do inevitably add a Default implementation, whether explicitly as above or by deriving it, the same issue spreads to the fields of the type.

For fields where the default value again isn't meaningful, the best approach is to make them Options, while at the same time trying to avoid allowing possible "invalid" states when there is inter-dependence between fields. In the following example, Type is a value in a map and so must have a default value:

If it doesn't make sense for a to be Some and b to be None simultaneously, (an invalid state) in this scenario:

struct Type {
     a:  Option<A>,
     b:  Option<B>,
}

We want to avoid introducing a default trait for A which would happen if we turn it into:

struct Type {
     a:  A,
     b:  Option<B>,
}

So instead do:

struct Type {
     c:  Option<(A, B)>
}

bedeho commented 5 years ago

Do not overload return types

NB: See updated idea below.

Let's say you have a public method in your module (in the future this would also apply to extrinsics, when success results can be any type):

pub fn my_public_module_method(..) -> Result<MY_SUCCESS_TYPE, MY_ERROR_TYPE> {
...
let good_outcome = Self::ensure_something(...)?;

}
...
fn ensure_something(...) -> Result<SOME_OTHER_SUCCESS_TYPE, MY_ERROR_TYPE> {
.... 
}

Unless it is genuinely the case that ensure_something can return all errors represented by MY_ERROR_TYPE, this should be avoided. This is tempting, as it's easy and fast, and it is in effect what we are currently doing for extrinsics, since the global error type is always &'static str. However, this is not safe, as ensure_something can at some point mistakenly return an invalid value.

Instead, if the error domain of ensure_something is different, even trivially, then introduce a new explicit representation ENSURE_SOMETHING_ERROR_TYPE, and use the std::convert::From trait to allow seamless conversion on the fly.

It's slightly more work, but compile time safety in the runtime is worth almost any level of one time pain.
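A minimal sketch of this pattern follows. The concrete enums and their variants are hypothetical placeholders for ENSURE_SOMETHING_ERROR_TYPE and MY_ERROR_TYPE, and the function bodies are invented for illustration.

```rust
// Error domain of the helper: only what it can genuinely return.
#[derive(Debug, PartialEq)]
enum EnsureSomethingError {
    SomethingMissing,
}

// Error domain of the public method: a superset.
#[derive(Debug, PartialEq)]
enum MyError {
    SomethingMissing,
    NotAuthorized,
}

impl From<EnsureSomethingError> for MyError {
    fn from(e: EnsureSomethingError) -> Self {
        match e {
            EnsureSomethingError::SomethingMissing => MyError::SomethingMissing,
        }
    }
}

fn ensure_something(value: Option<u32>) -> Result<u32, EnsureSomethingError> {
    value.ok_or(EnsureSomethingError::SomethingMissing)
}

fn my_public_module_method(value: Option<u32>, authorized: bool) -> Result<u32, MyError> {
    if !authorized {
        return Err(MyError::NotAuthorized);
    }
    // `?` converts EnsureSomethingError into MyError through the From impl.
    let good_outcome = ensure_something(value)?;
    Ok(good_outcome + 1)
}
```

Because `ensure_something` cannot even name `MyError::NotAuthorized`, the compiler rules out the helper returning an error outside its own domain.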

bedeho commented 4 years ago

`Option<T>` over `T` when `Zero`/`0` is given special semantics

In some cases, the value 0 can be overloaded to mean the absence of something or some event. For example, if something is supposed to happen in zero blocks, or zero amount of tokens are staked. In such cases, opt for modeling the type as Option<T> for clarity.
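A small sketch of the idea, using a hypothetical staking policy. `StakingPolicy`, `Unstaking` and `begin_unstaking` are illustrative names only.

```rust
#[derive(Debug, Clone, PartialEq)]
struct StakingPolicy {
    // None: no unstaking period applies. Some(n): unstaking takes n blocks.
    unstaking_period: Option<u32>,
}

#[derive(Debug, PartialEq)]
enum Unstaking {
    // Funds are released in the current block; nothing is scheduled.
    Immediate,
    // Funds are released at the given block number.
    ScheduledAt(u32),
}

fn begin_unstaking(policy: &StakingPolicy, now: u32) -> Unstaking {
    match policy.unstaking_period {
        None => Unstaking::Immediate,
        Some(blocks) => Unstaking::ScheduledAt(now.saturating_add(blocks)),
    }
}
```

The `match` makes the "no period" case explicit, where a bare `u32` would leave readers guessing whether `0` is a real duration or a sentinel.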

bedeho commented 4 years ago

Do not overload return types II

A perhaps even cleaner way of doing this is to make ensure_something into a macro, rather than a function. This way, one can avoid having to introduce lots of private types as in the first suggestion, and instead directly say

ensure_something!(..., ERROR_VALUE1, ...);

Example


macro_rules! ensure_application_exists {

    ($runtime_trait:tt, $application_id:expr, $err_value:expr) => {{
        if !<ApplicationById<$runtime_trait>>::exists($application_id) {
            Err($err_value)
        } else {
            Ok(<ApplicationById<$runtime_trait>>::get($application_id))
        }
    }}
}

// Invocation
let my_application = ensure_application_exists!(T, application_id, DeactivateApplicationError::ApplicationDoesNotExist)?;

bedeho commented 4 years ago

Explicit types for variants with parameters

There is the temptation to just have an anonymous inlined parameter list, in particular if you just have a single parameter. So doing

enum Foo {
...
MyCase(T)
...
}

or often also the inlined anonymous struct version

enum Foo {
  ...
  MyCase {
    field_1: T_1,
    field_2: T_2,
    ...
  },
  ...
}

This, however, makes it harder to have clean signatures for functions that compute new values for these cases. This pattern is something we explicitly want to encourage, as it allows us to write code with fewer side effects on mutable variables (i.e. not storage).

Another big downside is that one cannot use the very convenient struct field modification syntax when building new structs, which is also critical for conveniently writing side effect free code.

MyStructType {
  field_name: NEW_VALUE,
  ..old_instance
}

Example

So for example, you may want to have a function new_my_case(...) which gives you a new set of values for the MyCase variant value. Now you have to either do

new_my_case(...) -> (T_1, T_2, ...)

or the truly horrible

new_my_case(...) -> Foo

where all safety is lost. Instead, we do


struct MyCaseType {
  field_1: T_1,
  field_2: T_2,
  ...
}

enum Foo {
  ...
  MyCase(MyCaseType),
  ...
}

fn new_my_case(...) -> MyCaseType;
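A small compilable sketch of the same idea, with hypothetical names (Stage, ActiveStage, with_added_applicant) standing in for Foo, MyCaseType and new_my_case. It shows the struct update syntax producing a new value without mutating the old one:

```rust
/// Named payload type for the `Active` variant (hypothetical example type).
#[derive(Debug, Clone, PartialEq)]
pub struct ActiveStage {
    pub started_at: u32,
    pub applicant_count: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    Waiting,
    Active(ActiveStage),
}

/// Side-effect-free: builds a new payload from the old one.
/// The return type says exactly which variant payload is produced,
/// which `-> Stage` or a bare tuple could not express.
pub fn with_added_applicant(old: &ActiveStage) -> ActiveStage {
    ActiveStage {
        applicant_count: old.applicant_count + 1,
        // Struct update syntax: every other field is copied from `old`.
        ..old.clone()
    }
}
```

The caller decides when, and whether, to wrap the result in Stage::Active and write it back.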

bedeho commented 4 years ago

Asserting invariants in production runtime code

Something clever here... tough issue.

bedeho commented 4 years ago

Make new `ensure` abstraction if you need to return a value

If you just want to test a condition, then it's fine to inline the condition using the native ensure!, e.g.

ensure!(x > y, ...)

, so long as the condition is not too complex, and you do not end up repeating yourself too badly.

Otherwise, or if you also need to return a value, then you must introduce a new ensure abstraction, e.g.

let outcome = ensure_something!(...)?;

mnaamani commented 4 years ago

Always update storage state before making external calls

To avoid potential re-entrancy bugs (and potentially catastrophic failed invariant assertions), always update the state of storage before making calls that might call back into the module. These would be calls to other modules directly, or to "EventHandler" types configured on the module.
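A simplified sketch of the ordering, using plain in-memory state instead of Substrate storage. The Module, EventHandler and Recorder types are hypothetical; in a real runtime the handler could reach the module's storage directly, which is why the write has to come first.

```rust
/// Hypothetical hook configured on the module; an implementation might
/// read the module's state or call back into it.
pub trait EventHandler {
    fn on_payout(&mut self, balance_after_payout: u64);
}

/// Stand-in for a module's storage.
pub struct Module {
    pub balance: u64,
}

impl Module {
    pub fn pay_out<H: EventHandler>(
        &mut self,
        amount: u64,
        handler: &mut H,
    ) -> Result<(), &'static str> {
        if amount > self.balance {
            return Err("insufficient balance");
        }
        // 1. Update state first, so anything that runs afterwards
        //    observes a consistent, already-debited balance.
        self.balance -= amount;
        // 2. Only then make the external call.
        handler.on_payout(self.balance);
        Ok(())
    }
}

/// Test handler that records what it observed.
pub struct Recorder {
    pub seen: Vec<u64>,
}

impl EventHandler for Recorder {
    fn on_payout(&mut self, balance_after_payout: u64) {
        self.seen.push(balance_after_payout);
    }
}
```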

bedeho commented 4 years ago

WIP: Side effect-free helper/utility routines

When updating storage in the mutation-safe portion of the code path for a method, it's very common to factor out reusable methods which do some part of the mutation that is shared across methods....

NB: I just saw this pattern in my code, the proposal is quite invasive, so approach with caution.

NB2: I just now saw that the lack of clarity about which code paths actually do the updating led to a bug in the hiring module, where there were multiple places trying to do this. This sounds easy to avoid, but as the role of a helper changes over time, it is very easy to forget the implicit contract it has with different callers from before, hence the problem appears genuine.

NB3: needed to build virtualization of effects for very general operations in a way where there is better code reuse, e.g. for transaction concept in the version store.

Problem Case

Let's say you had a type


struct Person {
  id: u64,
  ..
  child_person_count: u32
}

and extrinsics


add_person_as_child(

Principle

So, you are implementing an extrinsic, and you make a method Foo which tries to update the state in the correct way in some scenario that you imagine may be triggered from other extrinsics in the future. So you make it into a routine which does everything in one go.

Then later, you realize that in some other extrinsic, you need to, for example, call this method N times, and the ideal approach would actually be to accumulate some of the changes that each call to Foo would individually make, and do one single update to the relevant state. A typical example could be a counter that is incremented.

The alternative is just to rewrite Foo such that it has no side effects, and returns all the information you would need to very easily make the update to the storage if you wanted to. Now, you can call Foo many times, accumulate the returned information, and do one single update to the relevant state at the end.
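A rough sketch of that principle, loosely based on the Person type from the problem case. The helper child_count_delta, the routine add_children, and the validation rule are all hypothetical; the point is only that the helper returns information and the caller performs a single write.

```rust
pub struct Person {
    pub id: u64,
    pub child_person_count: u32,
}

/// Side-effect-free helper: validates one child and returns by how much
/// the parent's counter should grow. It never touches any state.
pub fn child_count_delta(child_id: u64, parent: &Person) -> Result<u32, &'static str> {
    if child_id == parent.id {
        Err("a person cannot be their own child")
    } else {
        Ok(1)
    }
}

/// Caller: accumulates the results of many helper calls and then
/// performs one single update.
pub fn add_children(parent: &mut Person, child_ids: &[u64]) -> Result<(), &'static str> {
    let mut total = 0u32;
    for child_id in child_ids {
        // Any failure returns before state has been modified.
        total += child_count_delta(*child_id, parent)?;
    }
    // The only mutation in the whole code path.
    parent.child_person_count += total;
    Ok(())
}
```

A side benefit is that a failure halfway through the batch leaves the state untouched.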

Solution

...

bedeho commented 4 years ago

Avoid matching Christmas trees

When we have state composed over multiple levels, we may easily end up with something like this

match .. {
  .. => {
    match .. {
      .. => {
        match .. {
          .. => { ... }
          _ => {..}
        }
      }
      _ => {..}
    }
  }
  _ => {..}
}

This is hard to read for multiple reasons, one being that you have to make long jumps to see how all cases in each match are being processed. It also prevents making each matching step into its own reusable computation.

An alternative that remains without side-effects and is much easier to read, as well as make reusable, would be


let x_1 = foo_1(...);

let x_2 = foo_2(..., x_1, ...);

...

let x_N = foo_N(....);

where foo_i typically will be some sort of ensure statement, if you are in a public method or extrinsic, or it can be an application of Iterator::filter_map, if you are processing a collection through a sequence of steps, etc. Not all of them need to be methods; they can be inline computations. The main point still stands as long as they do not take the form of the matching Christmas tree above.
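For instance, a three-level lookup can be flattened into a sequence of ensure-style steps chained with `?`. The domain types below (Opening, Error) and the ensure_* functions are hypothetical and use plain slices rather than storage:

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    OpeningNotFound,
    OpeningNotActive,
    ApplicationNotFound,
}

pub struct Opening {
    pub active: bool,
    pub applications: Vec<u64>,
}

/// Step 1: the opening must exist.
pub fn ensure_opening(openings: &[Opening], index: usize) -> Result<&Opening, Error> {
    openings.get(index).ok_or(Error::OpeningNotFound)
}

/// Step 2: the opening must be active.
pub fn ensure_active(opening: &Opening) -> Result<&Opening, Error> {
    if opening.active {
        Ok(opening)
    } else {
        Err(Error::OpeningNotActive)
    }
}

/// Step 3: the application must belong to the opening.
pub fn ensure_application(opening: &Opening, application_id: u64) -> Result<u64, Error> {
    opening
        .applications
        .iter()
        .copied()
        .find(|id| *id == application_id)
        .ok_or(Error::ApplicationNotFound)
}

/// Flat pipeline: each step is readable and reusable on its own,
/// and every failure case sits next to the check that produces it.
pub fn lookup(openings: &[Opening], index: usize, application_id: u64) -> Result<u64, Error> {
    let opening = ensure_opening(openings, index)?;
    let active_opening = ensure_active(opening)?;
    let application = ensure_application(active_opening, application_id)?;
    Ok(application)
}
```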

bedeho commented 4 years ago

`mutate` over `insert` for updates II

There was initially this idea to use mutate when you meant mutate.

However, this invites writing code with side effects inside the mutator. The alternative is to reconstruct the full object you are updating, and then write it back with an insert statement.
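To make the reconstruct-then-insert alternative concrete, here is a sketch that uses a std HashMap as a stand-in for a storage map. The Account type and both functions are hypothetical; the real StorageMap API is deliberately not used here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Account {
    pub balance: u64,
    pub nonce: u32,
}

/// Pure: computes the full new value from the old one.
pub fn with_deposit(old: &Account, amount: u64) -> Account {
    Account {
        balance: old.balance + amount,
        ..old.clone()
    }
}

/// Reads the old value, rebuilds the whole object without side effects,
/// then writes it back with a single `insert`.
pub fn deposit(
    store: &mut HashMap<u64, Account>,
    id: u64,
    amount: u64,
) -> Result<(), &'static str> {
    let old = store.get(&id).ok_or("account does not exist")?;
    let new = with_deposit(old, amount);
    store.insert(id, new);
    Ok(())
}
```

The cost is an extra clone and a full write; whether that is acceptable is part of the open question above.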

Less sure what the rule should actually be here.

bedeho commented 4 years ago

Result::Err is for genuine bad inputs, not negative answers

Suppose there was a can_do_foo(id: T::PersonId) -> Result<T, E> to check whether a person, identified by id, could do something. The fact that they could not do it should not be signalled through E, but rather through T. E is for genuine errors due to bad inputs, such as id not being a valid identifier for any person.
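A sketch of that split, with T chosen as bool and a HashMap standing in for storage. The Person fields and the CanDoFooError type are assumptions made for the example:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum CanDoFooError {
    /// Genuine bad input: the id does not identify any person.
    PersonDoesNotExist,
}

pub struct Person {
    pub is_banned: bool,
}

/// `Ok(false)` is a perfectly valid answer ("no, they cannot"),
/// while `Err` is reserved for inputs that make the question meaningless.
pub fn can_do_foo(people: &HashMap<u64, Person>, id: u64) -> Result<bool, CanDoFooError> {
    let person = people.get(&id).ok_or(CanDoFooError::PersonDoesNotExist)?;
    Ok(!person.is_banned)
}
```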

bedeho commented 4 years ago

Assertion must never have side-effects

At times, one may be tempted to do this


assert!(... some code with side effects, e.g. a function call ... );

But this is not great, because the reader typically will tune out contents of assertions, assuming they do nothing that needs to be tracked. Instead, do

let val = ... some code with side effects, e.g. a function call ...;
assert!(... proposition with val ... );
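A concrete version of the recommended form, using a hypothetical Counter type. Besides readability there is a second reason to keep the effect outside: debug_assert! conditions are not evaluated at all in builds without debug assertions, so a side effect placed inside one would silently disappear.

```rust
pub struct Counter {
    pub value: u32,
}

impl Counter {
    /// Has a side effect: mutates the counter and returns the new value.
    pub fn increment(&mut self) -> u32 {
        self.value += 1;
        self.value
    }
}

pub fn bump_checked(counter: &mut Counter) -> u32 {
    // The side effect happens on its own line, where a reader will see it.
    let new_value = counter.increment();
    // The assertion only inspects an already computed value.
    assert!(new_value > 0, "counter must be positive after increment");
    new_value
}
```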

Joystream / joystream

Substrate coding conventions I #404
