ethereumjs / organization

A repo for discussions and other non-code organizing stuff
https://ethereumjs.readthedocs.io
12 stars 17 forks source link

Design problems and a proposal to fix them #56

Closed alcuadrado closed 5 days ago

alcuadrado commented 5 years ago

I'm opening this issue to summarise what IMO are the problems with the current design of some libraries and to propose an alternative.

Problems

From my perspective, there are three interconnected problems in the libraries that represent protocol objects (i.e. Account, Block and Transaction):

  1. They are based on ehereumjs-util's defineProperties. This function is based on super dynamic aspects of javascript, making the libs hard to adapt to typscript, hard to understand, and hard to maintain. This has been discussed during multiple typescript migrations, most notably during ethereumjs-account's one.

  2. They are designed around the RLP representation of each object. While this isn't necessarily a bad thing, these libraries are also used by smart contracts and dapps developers, that don't have enough understanding of the protocol and have trouble understanding some things. A clear example of this is that ethereumjs-tx periodically receives false bug reports about leading zeros being stripped. This is the expected behavior when working with the RLP representation of some of the transactions fields, but totally unexpected for someone using it for another purpose.

  3. All of them are mutable. This is an issue, as there are complex requirements/dynamics between them, and being able to mutate each object individually opens the door to getting objects out of sync. An example of this is parsing a block, which will construct a Block object and a series of Transaction objects, and then changing the block number to 0. This would/should render any EIP155-signed transaction invalid, but it's not clear how this can be implemented with the current design. What will probably happen now is that the Block and the Transactions will have incompatible Common instances.

Proposal

I think (1) can be solved without breaking changes by inlining part of defineProperties logic in each library. I implemented a prototype of a similar idea on ethereumjs-tx in this PR. While I consider this an improvement, it will mostly increase the maintainability of the libraries. This change wouldn't make any difference with (2) and (3), which are the points that can lead to confusion and bugs in the users' code.

In my opinion, to effectively fix (3) without an intricate design that may not be enough in the future, we should prohibit mutating the domain objects. This means making Account, Transaction and Block immutable. And the same should be done for Common. If they are immutable, everything can be validated on construction, and objects can't get out of sync.

Fixing (2) needs a deeper redesign of the libraries. RLP should be treated just as a serialization strategy, and everything should be modeled with their actual data types instead of everything being a Buffer.

I think a good approach to design them would be to aknowledge that there are multiple representations of these objects, implement everything as close to the actual domain as possible, and make it easy to go from one representation to the others. At the moment, I think these should be:

Example implementation

I sketched how Transaction would look if reimplemented with this design

// I just use this class to indicate that the Common object should be immutable
class ImmutableCommon {}

// These functions and types should go to a shared library, -util?

function bigIntToRlp(bn: bigint): Buffer {
  return Buffer.from([]);
}

class Address {
  public serialize(): Buffer {
    return Buffer.from([]);
  }
}

export interface TransformableToBuffer {
  toBuffer(): Buffer;
}

export type BufferLike = Buffer | TransformableToBuffer | string | number;
export type AddressLike = string | Address;
export type BigIntLike = string | bigint | number;

export function addresLikeToAddress(addr: AddressLike): Address {
  return new Address();
}

export function bufferLikeToBuffer(bl: BufferLike): Buffer {
  return Buffer.from([]);
}

export function bigIntLikeToBigInt(bil: BigIntLike): bigint {
  return 1n;
}

export function bigIntToHex(bi: bigint): string {
  return "0x" + bi.toString(16);
}

// The -tx proposal starts here.

export interface TxData {
  gasLimit?: BigIntLike;
  gasPrice?: BigIntLike;
  to?: AddressLike;
  nonce?: BigIntLike;
  data?: BufferLike;
  v?: BigIntLike;
  r?: BigIntLike;
  s?: BigIntLike;
  value?: BigIntLike;
}

export interface JsonTx {
  gasLimit?: string;
  gasPrice?: string;
  to?: string;
  nonce?: string;
  data?: string;
  v?: string;
  r?: string;
  s?: string;
  value?: string;
}

class Transaction {
  // **This is the "entry point" for people working with objects at the protocol
  // level.** It takes a serialized tx as seen in the protocol, and returns an
  // immutable domain object.
  public static deserialize(
    rlpSerializedTx: Buffer,
    common: ImmutableCommon
  ): Transaction {
    // parse the rlp buffer
    return new Transaction(/* ... */);
  }

  // **This is the "entry point" for people working with the RPC or other higher
  // level applications.** While the domain object is already pretty high level,
  // having a way to go from bare javascript objects into a domain object is
  // super useful.
  //
  // This function would probably be the most commonly used.
  //
  // This function handles default values.
  public static fromTxData(txData: TxData, common: ImmutableCommon) {
    return new Transaction(
      common,
      txData.nonce !== undefined ? bigIntLikeToBigInt(txData.nonce) : 0n,
      txData.gasLimit !== undefined
        ? bigIntLikeToBigInt(txData.gasLimit)
        : 9000n,
      txData.gasPrice !== undefined
        ? bigIntLikeToBigInt(txData.gasPrice)
        : 8000000000n,
      txData.to !== undefined ? addresLikeToAddress(txData.to) : undefined,
      txData.value !== undefined ? bigIntLikeToBigInt(txData.value) : 0n,
      txData.data !== undefined
        ? bufferLikeToBuffer(txData.data)
        : Buffer.from([]),
      txData.v !== undefined ? bigIntLikeToBigInt(txData.v) : undefined,
      txData.r !== undefined ? bigIntLikeToBigInt(txData.r) : undefined,
      txData.s !== undefined ? bigIntLikeToBigInt(txData.s) : undefined
    );
  }

  // Everything is readonly.
  public readonly common: ImmutableCommon;
  public readonly nonce: bigint;
  public readonly gasLimit: bigint;
  public readonly gasPrice: bigint;
  public readonly to?: Address;
  public readonly value: bigint;
  public readonly data: Buffer;
  public readonly v?: bigint;
  public readonly r?: bigint;
  public readonly s?: bigint;

  // This constructor just takes the values, validates them, assigns them and
  // freezes the object.
  constructor(
    common: ImmutableCommon,
    nonce: bigint,
    gasLimit: bigint,
    gasPrice: bigint,
    to: Address | undefined,
    value: bigint,
    data: Buffer,
    v?: bigint,
    r?: bigint,
    s?: bigint
  ) {
    // TODO: Validate everything

    this.common = common;
    this.nonce = nonce;
    this.gasLimit = gasLimit;
    this.gasPrice = gasPrice;
    this.to = to;
    this.value = value;
    this.data = data;
    this.v = v;
    this.r = r;
    this.s = s;

    Object.freeze(this);
  }

  // This function is the way to go from a domain object to the protocol
  // representation of a transaction.
  //
  // Things that were previously handled by `defineProperties` are handled here.
  // For example, stripping the leading zeros of numeric values.
  serialize(): Buffer {
    return rlpencode([
      bigIntToRlp(this.nonce),
      bigIntToRlp(this.gasLimit),
      bigIntToRlp(this.gasPrice),
      this.to !== undefined ? this.to.serialize() : Buffer.from([]),
      bigIntToRlp(this.value),
      this.data,
      this.v !== undefined ? bigIntToRlp(this.v) : Buffer.from([]),
      this.r !== undefined ? bigIntToRlp(this.r) : Buffer.from([]),
      this.s !== undefined ? bigIntToRlp(this.s) : Buffer.from([])
    ]);
  }

  // Methods that previously altered the object, now return a new one.
  sign(privateKey: Buffer): Transaction {
    const data = this.getDataToSign();
    const signature = sign(data, privateKey);

    return new Transaction(
      this.common,
      this.nonce,
      this.gasLimit,
      this.gasPrice,
      this.to,
      this.value,
      this.data,
      signature.v,
      signature.r,
      signature.s
    );
  }

  toJson(): JsonTx {
    return {
      nonce: bigIntToHex(this.nonce)
      // ...
    };
  }
}

// Sample usage:

const common = new ImmutableCommon("mainnet");

const tx = Transaction.fromTxData(
  {
    to: "0x123123123123213123123123123",
    value: 123123
  },
  common
);

// Modifying a tx

const txNonce123 = Transaction.fromTxData(
  {
    ...tx,
    nonce: 123
  },
  tx.common // Having to pass common here seems error-prone
);

// Signing a tx

const pk = Buffer.from([]);
const signedTx = tx.sign(pk);

Unknowns

There are two things that are hard to answer about this proposal:

  1. How do we go from the current design to this one incrementally? It'd be a major implementation and coordination effort.

  2. How would implementing everything with types more abstract than Buffer impact on the project's performance? I have a strong belief that it wouldn't be much of an impact, as we are copying buffers and transforming them from/to strings super often now.

Previous discussions

Finally, here's a list of interesting links where these things have been discussed:

Previous discussions about the limitations of the design:

https://github.com/ethereumjs/ethereumjs-block/pull/69#issuecomment-515452213 https://github.com/ethereumjs/ethereumjs-account/issues/29 https://github.com/ethereumjs/ethereumjs-tx/issues/151 https://github.com/ethereumjs/ethereumjs-tx/pull/154

Common "bugs" that users report because of the libraries using RLP-based representation.

https://github.com/ethereumjs/ethereumjs-tx/issues/74#issuecomment-318066638 https://github.com/ethereumjs/ethereumjs-tx/issues/112 https://github.com/ethereumjs/ethereumjs-tx/issues/140 https://github.com/ethereumjs/ethereumjs-tx/issues/164 https://github.com/ethereumjs/ethereumjs-util/issues/141

holgerd77 commented 5 years ago

Hi Patricio, thanks for this, I'm not completely through with reading but just to let you know that this won't get lost, generally super valuable to have such broadly-ontaking and deep reaching strategic analysis.