refactor: improve types package to use forks as generics

nazarhussain commented 1 month ago

Motivation

Simplify the usage of Fork specific types around the repo.

Description

Convert namespace oriented types to generics e.g. instead allForks.phase0.BeaconBlock to BeaconBlock<ForkName.phase0>
This will allow to extract types for multiple forks e.g. BeaconBlock<ForkName.phase0 | ForkName.altair>
We use union collection to group the forks e.g. ForkExecution, ForkLightclient

Closes #4656

Steps to test or reproduce

Run all tests

github-actions[bot] commented 1 month ago

Performance Report

✔️ no performance regression detected

Full benchmark results

by benchmarkbot/action

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 81.70895% with 137 lines in your changes missing coverage. Please review.

Project coverage is 62.56%. Comparing base (a2c389f) to head (536d8c9).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## unstable #6825 +/- ## ============================================ - Coverage 62.67% 62.56% -0.11% ============================================ Files 578 575 -3 Lines 61220 61066 -154 Branches 2113 2116 +3 ============================================ - Hits 38367 38207 -160 - Misses 22815 22820 +5 - Partials 38 39 +1 ```

nazarhussain commented 3 weeks ago

Converted to draft to resolve huge conflicts after #6749 merged.

nazarhussain commented 2 weeks ago

The Motviation of this PR is not obvious to me

It was much clearly explained in the presentation.

Looking at the diff, I don't see a single line where it simplified things or removed unsafe type casts (it rather added more). The semantics are mostly the same, we still use "all forks" for types that do not exist on all forks. I don't think that's a big issue though, I always though of "all forks" as "all supported forks" anyways.

We had the usage of allForks contextually wrong on most of the places. e.g. If we refer allForks which are defined under ForkName enum, then allForks.LightClientUpdate is a wrong type as phase0 does not have this type. What we intended here is the LightClientUpdate only for forks which had light-client support. So instead of referring allForks.LightClientUpdate it's much clear to specify it as LightClientUpdate<ForkLightClient>.

we still use "all forks" for types that do not exist on all forks.

It is not possible because there is no name-space allForks anymore. Either you can use directly from fork specific type phase0, altair or you use utility types exposed directly form root of package. If you are referring to the SSZ Types, that will be cleanup up as well in upcoming PR as explained in presentation.

I don't see a single line where it simplified things or removed unsafe type casts

If you observe where we are passing fork collection to generics are all unsafe types referred via allForks. We didn't encountered type error before because of the structural typing of TS and boilerplate code in the root export where we filter every type explicitly for allForks namespace.

The extra type machinery required looks scary to maintain, hoping this just works and does not require type debugging in the future.

I don't see it scary, it's simple generic types we used everywhere. During this PR I faced some issues which were mostly assigning BeaconBlock<ForkAll> to where BeaconBlock<ForkExecution> were expected. So I feel this helps me identify the problems much better.

Would be great to get more examples on type simplification we plan with this in the future to get a better picture of the whole solution, as I understood from the presentation there are 3 stages.

Once you start using these generics types, you will find these more flexible and useful. Anyhow have plans to add docs guiding steps to add more generic types, or adding new forks.

Is this in any way also related to https://github.com/ChainSafe/lodestar/issues/6799?

Yes this refactoring will facilitate to have more typesafe code for upcoming forks.

nflaig commented 2 weeks ago

It was much clearly explained in the presentation.

We should document the main points in the motivation of the PR. I tried to rewatch the recording but it seems to be no longer accessible. Or at least share the slides via link, and reference in PR.

It is not possible because there is no name-space allForks anymore

this is what I don't quite get, we are still doing this, but instead of allForks namespace we now pass ForkAll but isn't this the same?

e.g. previous

allForks.SignedBlindedBeaconBlock;

and now

SignedBeaconBlock<ForkAll, "blinded">;

what's the difference between those two? we still claim that blinded beacon block exists on all forks but that's not the case

wemeetagain commented 2 weeks ago

I think the main benefit we get is more expressiveness wrt features that exist in subsets of all forks. If we're blindly transliterating from what we have, that means a lot of Type<ForkAll>. But I think this will be helpful as the # of forks keeps proliferating.

SignedBeaconBlock<ForkAll, "blinded">;

Maybe worth considering just having a single generic param for all generic types here.

so, BeaconBlock, BlindedBeaconBlock, and FullOrBlindedBeaconBlock can be all separate types.

This may be easier devex, since we won't have to remember which generic types have 2 or more params, and which possibilities exist. Rather, these generic types will have a single generic param for the fork.

nazarhussain commented 2 weeks ago

what's the difference between those two? we still claim that blinded beacon block exists on all forks but that's not the case

The difference of allForks.SignedBlindedBeaconBlock vs SignedBeaconBlock<ForkAll, "blinded"> is very clear by itself and explained with the PR title, use forks as generics.

Now if question you are referring why generics, that I explained in my earlier comment as well. The usage of allForks is contextually wrong in many places as some types does not belongs to all forks. Also when comes to generics, it's obvious we can pass those as parameters.

As we add more forks it will be unmanageable to refer those via allForks, rather we intended to refer types with respective to different categories of forks by feature e.g. ForkExecution or ForkBlobs. Having this flexibility is not possible via allForks namespace usage.

nflaig commented 2 weeks ago

The difference of allForks.SignedBlindedBeaconBlock vs SignedBeaconBlock<ForkAll, "blinded"> is very clear by itself and explained with the PR title, use forks as generics. Now if question you are referring why generics,

I am not asking about why generics vs. namespaces, besides the additional type complexity making them work, the overall ergonomics of the usage looks good to me.

The usage of allForks is contextually wrong in many places as some types does not belongs to all forks

I agree with this but I am confused why we still do

SignedBeaconBlock<ForkAll, "blinded">; // <-- this shouldn't even be allowed, fork must be subset of ForkExecution

instead of

SignedBeaconBlock<ForkExecution, "blinded">;

The code contradicts with what you are saying

nflaig commented 2 weeks ago

This may be easier devex, since we won't have to remember which generic types have 2 or more params

What was the reason for combining blinded and full block? Do we have some examples where this improves type handling?

so, BeaconBlock, BlindedBeaconBlock, and FullOrBlindedBeaconBlock can be all separate types.

I like it more, it's more expressive and would be more inline with current types

wemeetagain commented 2 weeks ago

What was the reason for combining blinded and full block? Do we have some examples where this improves type handling?

I guess the reason is that these variants blow up the # of types and that generally the variants can be used interchangeably (and if not, the type will be narrowed to show that). If the default is 'all variants', then we will either automatically support all variants, or the type will be explicitly narrowed to suit. For example, if SignedBeaconBlock is both FullOrBlinded by default, then more functions would automatically support both full and blinded signed blocks. The use of generics just made it more concise and highlighted the point that "there is just one type, but with several variants".

so, BeaconBlock, BlindedBeaconBlock, and FullOrBlindedBeaconBlock can be all separate types.

I like it more, it's more expressive and would be more inline with current types

Yea I don't really have a strong opinion about this point in particular. I think just using the generics for specifying the fork will already be a big win long term.

nflaig commented 2 weeks ago

I guess the reason is that these variants blow up the # of types and that generally the variants can be used interchangeably (and if not, the type will be narrowed to show that). If the default is 'all variants', then we will either automatically support all variants, or the type will be explicitly narrowed to suit. For example, if SignedBeaconBlock is both FullOrBlinded by default, then more functions would automatically support both full and blinded signed blocks. The use of generics just made it more concise and highlighted the point that "there is just one type, but with several variants".

They can be used interchangeably in a lot of places but not all, e.g. consider produceBlockV3, it either returns just a blinded block, or a full block as part of block contents. Especially since deneb, a full block has additional implications in some places due to attached sidecars like blobs / kzg proofs while a blinded block won't have that.

Either way, we could focus this PR on just the fork type as generics part and apply other changes later.

I think just using the generics for specifying the fork will already be a big win long term.

yeah, I like the generics, the fact that the type guards in packages/types/src/utils/typeguards.ts can also keep context of the fork is a good indicator that the design makes sense.

nazarhussain commented 2 weeks ago

But can we please not approach the type refactor with the mindset that type casts don't reduce type safety unless you use as unknown..#6825 (comment)

That's not a mindset that's how TS works. See the code example below.

function mul(val: number | string, factor: number): number | string {
  if(Number(val) === val) {
    return val * factor;
  }

  return repeat(val as string, factor);
}

function repeat(val: string, factor: number): string {
   // some code
}

We have to use as string here because typescript could not infer the correct type here. Such situations could be avoided by having type-guards everywhere, but that's not what is a best practice either. And type-guards does not work well on the nested properties, so in such situations you have to infer the correct type yourself.

There are some coding patterns to avoid such situations but that's out of scope of this PR.

nazarhussain commented 2 weeks ago

I agree with this but I am confused why we still do
SignedBeaconBlock<ForkAll, "blinded">; // <-- this shouldn't even be allowed, fork must be subset of ForkExecution
instead of
SignedBeaconBlock<ForkExecution, "blinded">;
The code contradicts with what you are saying

Seems you commented without actually experience the change. Now branch status is changed a lot but even earlier above SignedBeaconBlock<ForkAll, "blinded"> was not allowed will give you type error.

nflaig commented 2 weeks ago

Seems you commented without actually experience the change.

yes, looks good now. This comment was done on an older state of the branch.

That's not a mindset that's how TS works. See the code example below.

How is this example relevant? Of course there are situations where type-casts are fine but claiming those do not reduce type safety unless you use as unknown is just a false claim. Please see my earlier example https://github.com/ChainSafe/lodestar/pull/6825#discussion_r1638468234.

nflaig commented 2 weeks ago

Curious if you have explored type inference for the cases were we use the object itself to determine the fork.

e.g. in packages/state-transition/src/signatureSets/index.ts

 // fork based validations
  const fork = state.config.getForkSeq(signedBlock.message.slot);

  // Only after altair fork, validate tSyncCommitteeSignature
  if (fork >= ForkSeq.altair) {
    const syncCommitteeSignatureSet = getSyncCommitteeSignatureSet(
      state as CachedBeaconStateAltair,
      (signedBlock as altair.SignedBeaconBlock).message // <-- would be nice if it could infer altair here as we check fork above
    );
    // There may be no participants in this syncCommitteeSignature, so it must not be validated
    if (syncCommitteeSignatureSet) {
      signatureSets.push(syncCommitteeSignatureSet);
    }
  }

  // only after capella fork
  if (fork >= ForkSeq.capella) {
    const blsToExecutionChangeSignatureSets = getBlsToExecutionChangeSignatureSets(
      state.config,
      signedBlock as capella.SignedBeaconBlock // <-- same here
    );
    if (blsToExecutionChangeSignatureSets.length > 0) {
      signatureSets.push(...blsToExecutionChangeSignatureSets);
    }
  }

We do a lot of those type casts in different places, while the type casts are safe here would be nice to have. This was definitely impossible before (without generics) could work now...assuming typescript is smart enough but we would likely need a wrapped object like this {fork: F, signedBlock: SignedBeaconBlock<F>}

ChainSafe / lodestar

refactor: improve types package to use forks as generics #6825

Performance Report

Codecov Report