ApeWorX / ape

The smart contract development tool for Pythonistas, Data Scientists, and Security Professionals
https://apeworx.io
Apache License 2.0

feat: eject #1912

Open mikeshultz opened 5 months ago

mikeshultz commented 5 months ago

Overview

Loosely inspired by create-react-app's eject feature.

Running a command like ape eject would export your project into an ape-less structure. I think at a minimum it would only include your contracts and all of their imported dependencies. It's essentially a mechanism to "step out" of the ape framework.

This might be useful for a handful of specific reasons:

The final bullet point is what brought me here (see #1335). The contracts/.cache folder makes things tricky, and moving it has unexpected behavior, like altering the compiler's output bytecode due to the metadata hash changing. This would be less of a concern if we just recreated the contract structure including the needed dependencies in a temporary directory as a step in the build process. In this way, the directory structure the compiler sees never changes.

The underlying mechanism of ape eject would be a required (behind-the-scenes) step before the ape compile build process.

Specification

Command

ape eject BUILD_DIR

Where BUILD_DIR is the optional path to a directory to eject the project to.
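As a rough sketch of what the entry point might look like (nothing here is implemented; only the eject name and the optional BUILD_DIR argument come from this proposal, the command body is illustrative), assuming it were wired up as a click command like ape's other CLI entry points:

```python
# Hypothetical sketch only -- not ape's actual CLI code.
from pathlib import Path

import click


@click.command(short_help="Export the project into an ape-less source layout")
@click.argument("build_dir", required=False, type=click.Path(file_okay=False, path_type=Path))
def eject(build_dir):
    """Copy the project's contracts plus every imported dependency source into
    BUILD_DIR so they can be compiled without ape."""
    build_dir = build_dir or Path.cwd() / "build"
    click.echo(f"Ejecting project sources to {build_dir}")
    # ...walk the import graph and copy sources here (see the Mechanism section)...
```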

Mechanism

Consider a project that looks something like this:

ape-config.yaml
contracts/
contracts/ape.sol
scripts/
setup.py

ape-config.yaml has configuration for OpenZeppelin dependencies and ape.sol is a simple ERC20 token contract. Running ape eject /tmp/build would create something that looks like this:

ape.sol
@openzeppelin/contracts/token/ERC20/ERC20.sol
@openzeppelin/contracts/token/ERC20/IERC20.sol
@openzeppelin/contracts/token/ERC20/extensions/IERC20Metadata.sol
@openzeppelin/contracts/utils/Context.sol

In a normal build process, a CompilerAPI plugin would be handed /tmp/build as the base_path and ape.sol as the contract path. I don't expect many changes to compiler plugins.
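As a minimal sketch of that mechanism, assuming an import analysis already exists that maps import strings (e.g. @openzeppelin/contracts/token/ERC20/ERC20.sol) to files on disk; the import_map parameter and the function itself are illustrative, not ape's API:

```python
# Illustrative sketch, not ape's implementation.
import shutil
from pathlib import Path


def eject(project_dir: Path, build_dir: Path, import_map: dict[str, Path]) -> None:
    """Recreate the project's contracts plus their dependency sources under
    `build_dir`, keyed by import path, so the compiler always sees the same
    directory structure (e.g. /tmp/build/@openzeppelin/contracts/...)."""
    build_dir.mkdir(parents=True, exist_ok=True)
    for contract in (project_dir / "contracts").rglob("*.sol"):
        shutil.copy2(contract, build_dir / contract.name)
    for import_path, source_file in import_map.items():
        target = build_dir / import_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)
```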

Caveats

This may or may not make import remapping unnecessary. Or it may require patching contracts, which probably wouldn't be ideal. It may also require updates to compiler plugins, if remapping behavior needs to change, though it might simplify them for the better.

Would need to investigate a bit more and maybe test the eject output with raw compilers to see how they react.
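One way to sanity-check the ejected output (an assumption of this proposal, not an existing ape workflow) would be to point a standalone solc at the eject directory so the @openzeppelin/... imports resolve against the recreated layout; this assumes a reasonably recent solc on PATH and uses the example paths above:

```python
# Hypothetical check of the ejected layout with a raw compiler.
import subprocess

subprocess.run(
    ["solc", "--base-path", "/tmp/build", "--bin", "/tmp/build/ape.sol"],
    check=True,
)
```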

linear[bot] commented 5 months ago

APE-1667 feat: eject

antazoey commented 5 months ago

The EthPM Types PackageManifest is supposed to be like an "ejected" project in that we are pushing for adoption across tools, and the manifest should contain all the compiler info and dependency info necessary to compile a project; no structure is needed. The sources are included in the manifest. You can compile a manifest by using the source data to compile and populate the contract types.

Right now I am working on a refactor that is going to make this part, compiling a manifest (turning source data into contract type data and populating different fields in the manifest), even easier. Doing so may indeed render the .cache folder unnecessary!
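For readers unfamiliar with the format, here is a rough, hand-written illustration of that idea: the manifest inlines the sources and carries the compiler settings, so no on-disk layout is needed. Field names loosely follow EthPM v3 (EIP-2678) and are not exact ethpm-types attribute names:

```python
# Rough illustration only -- not generated by ape or ethpm-types.
manifest = {
    "manifest": "ethpm/3",
    "name": "my-token",  # hypothetical package name
    "version": "0.1.0",
    "sources": {
        "ape.sol": {"content": "// full Solidity source inlined here ..."},
    },
    "compilers": [
        {"name": "solc", "version": "0.8.17", "settings": {"optimizer": {"enabled": True}}},
    ],
}
```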

mikeshultz commented 5 months ago

The EthPM Types PackageManifest is supposed to be like an "ejected" project in that we are pushing for adoption across tools, and the manifest should contain all the compiler info and dependency info necessary to compile a project; no structure is needed. The sources are included in the manifest. You can compile a manifest by using the source data to compile and populate the contract types.

The manifest is a JSON file yeah? Do you expect the compilers to directly ingest the manifest or would it still need to be laid out like with the standard input JSON files?

Right now I am working on a refactor that is going to make this part, compiling a manifest (turning source data into contract type data and populating different fields in the manifest), even easier. Doing so may indeed render the .cache folder unnecessary!

Look forward to seeing this :+1:

fubuloubu commented 5 months ago

I sometimes get a little confused as well. I don't think this is really talking about the .build/ folder (which contains the manifest). But one thing ethPM can be configured to do is let the dependencies also serve as the way to specify solc remappings, and we can leverage that to give devs more control over where the "unbundled" sources from a dependency land when the sources are removed from the manifest (which is what the .cache folder does).

Right now, what's happening is that a "package" is built for a dependency by downloading some source of a project (github, npm, etc.), bundling it into a manifest, and building it all together, then loading it under ~/.ape/packages/* somewhere according to the user-specific name that the dev has given it. This is a little inefficient, performs unnecessary compilation work, and bundles sources together that ultimately get unbundled into .cache/*.

What we could do instead is basically better utilize the definition of a dependency to deduplicate a lot of this work:

  1. Create the dependency directory based on the dependency type and its "real identifier", e.g. ~/.ape/packages/github/OpenZeppelin-contracts/v4.7.0/* instead of ~/.ape/packages/openzeppelin/v4.7.0/*
  2. Download and install on disk the sources in an "unbundled" structure into ~/.ape/packages/<deptype>/<depname>/<depversion>/* (corresponding to that directory being the base contracts/ folder)
  3. Create the manifest (similar to how we do it now for a local project) as a set of checksums and local file links to those files, as well as source mappings to their dependencies' "real identifiers" (as relative file paths, which is not consistent with EIP-2678, but we should change that in the next revision)
  4. Don't compile the dependencies, unless you ask for a contract type from them (and the checksums tell you to compile again, or --force)
  5. When calling a compiler, assemble the source mapping to this folder as stored under the user's global ~/.ape/packages directory so that the .cache folder is no longer even necessary
  6. Add a way to use a git-submodule dependency type, so that foundry dependencies can be configured to work with config overrides to install the submodules and reference them as local dependencies
  7. Allow specifying the root folder of a dependency, so that packages like @openzeppelin/contracts, where the contracts follow a ./contracts/* layout instead of ./*, can be specified correctly

Using this, we can remove the need for the .cache folder to "unbundle" sources from a dependency, create more deterministic builds that eliminate storing extra copies of dependency files under ~/.ape/packages just because they have different names, and reduce/eliminate the need to always set up the solidity plugin's remappings to match how the dependencies get specified, since they can automatically be driven by dependency names.
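To make steps 1 and 2 concrete, here is a sketch of the deduplicated layout, keying packages by their "real identifier" (type/name/version) rather than the user-chosen name; the helper and constants are illustrative, not ape's API:

```python
# Illustrative only -- names and layout are assumptions based on steps 1-2 above.
from pathlib import Path

APE_PACKAGES = Path.home() / ".ape" / "packages"


def dependency_root(dep_type: str, real_name: str, version: str) -> Path:
    """E.g. ("github", "OpenZeppelin-contracts", "v4.7.0") ->
    ~/.ape/packages/github/OpenZeppelin-contracts/v4.7.0/
    The unbundled sources installed here act as that dependency's base
    contracts/ folder, shared by every project that depends on it."""
    return APE_PACKAGES / dep_type / real_name / version
```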

mikeshultz commented 5 months ago
  5. When calling a compiler, assemble the source mapping to this folder as stored under the user's global ~/.ape/packages directory so that the .cache folder is no longer even necessary

So you're saying copy user sources into the global package dir? And the package dir is the compile base_path?

Couple things that jump to mind:

  • Having your project source copied to a user-level file structure may be unexpected. If we did this, I think at most it should be temporary during compilation. Maybe not a big deal but it makes me a little uneasy for some reason. Though I guess my IDE also copies my working sources to a global user dir between saves as well...
  • It doesn't create as clean of a source compilation structure as a build stage might. Being able to build a file structure that includes only the source files necessary for compilation has some other benefits (listed in the OP). Should also be noted that the current .cache folder does not do this and includes the full dependency package source files.

Could work though.

fubuloubu commented 5 months ago
  • Having your project source copied to a user-level file structure may be unexpected. If we did this, I think at most it should be temporary during compilation. Maybe not a big deal but it makes me a little uneasy for some reason. Though I guess my IDE also copies my working sources to a global user dir between saves as well...

Just dependency sources, not user projects. Calling the compiler for the user project would reference the paths of these dependencies via the source remapping feature of solc/vyper using the dependency's name (which is currently happening twice)
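As an illustration of that remapping (the prefix and path below are examples, not what ape generates today):

```python
# Example only: a solc-style remapping pointing a dependency name at its
# location under the global packages directory.
from pathlib import Path

dep_root = Path.home() / ".ape" / "packages" / "github" / "OpenZeppelin-contracts" / "v4.7.0"
remapping = f"@openzeppelin/contracts={dep_root / 'contracts'}"
# passed to the compiler as a <prefix>=<target> remapping string
```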

  • It doesn't create as clean of a source compilation structure as a build stage might. Being able to build a file structure that includes only the source files necessary for compilation has some other benefits (listed in the OP). Should also be noted that the current .cache folder does not do this and includes the full dependency package source files.

Yes, it should "clean" itself only down to the file set of source files, and not the whole directory. Essentially, glob it by all the registered compiler extensions, which may actually resolve another issue where it doesn't recompile when you first forget to install a compiler plugin that a dependency might need (say .sol for a vyper-only project). We can also detect, in reverse fashion, if a previously-downloaded and cleaned dependency sources directory contains extensions for compilers that are not installed in the plugin set.
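A tiny sketch of the "glob it by all the registered compiler extensions" idea; the extension set is an example and would really come from the installed compiler plugins:

```python
# Illustrative filtering of a downloaded dependency by registered extensions.
from pathlib import Path

registered_extensions = {".sol", ".vy"}  # example; actually derived from installed plugins


def source_files(dep_dir: Path) -> list[Path]:
    return [p for p in dep_dir.rglob("*") if p.suffix in registered_extensions]
```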

mikeshultz commented 5 months ago
  • Having your project source copied to a user-level file structure may be unexpected. If we did this, I think at most it should be temporary during compilation. Maybe not a big deal but it makes me a little uneasy for some reason. Though I guess my IDE also copies my working sources to a global user dir between saves as well...

Just dependency sources, not user projects. Calling the compiler for the user project would reference the paths of these dependencies via the source remapping feature of solc/vyper using the dependency's name (which is currently happening twice)

Yeah, using absolute paths in remappings would not be portable, since the input JSON would have to contain the full FS path if it's not relative to the base path. So compiling on one machine is likely going to produce different output than on another. That's where this idea came from: that all relevant sources are assembled into one reproducible file structure with a common base path.

  • It doesn't create as clean of a source compilation structure as a build stage might. Being able to build a file structure that includes only the source files necessary for compilation has some other benefits (listed in the OP). Should also be noted that the current .cache folder does not do this and includes the full dependency package source files.

Yes, it should "clean" itself only down to the file set of source files, and not the whole directory. Essentially, glob it by all the registered compiler extensions, which may actually resolve another issue where it doesn't recompile when you first forget to install a compiler plugin that a dependency might need (say .sol for a vyper-only project). We can also detect, in reverse fashion, if a previously-downloaded and cleaned dependency sources directory contains extensions for compilers that are not installed in the plugin set.

By clean I mean only the necessary files would be in the resulting file structure, not just filtering by lang. So, .cache/OpenZeppelin/v4.5.0/token/ERC721/ would not be included in the build stage if I'm not building anything with an ERC-721 interface. So if I need the OZ ERC-20 interface, it will only include the ~4 files necessary to compile the contract and not the entire OZ package.
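A sketch of that pruning: keep only the transitive import closure of the project's own contracts. imports_of is a hypothetical hook into the import analysis mentioned later in the thread, not an existing ape function:

```python
# Illustrative only: compute the minimal source set needed for compilation.
def needed_sources(roots, imports_of):
    """roots: the project's own contract files; imports_of(src) yields the
    sources that src imports. Returns the transitive closure."""
    keep, stack = set(), list(roots)
    while stack:
        src = stack.pop()
        if src not in keep:
            keep.add(src)
            stack.extend(imports_of(src))
    return keep
```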

fubuloubu commented 5 months ago
  • Having your project source copied to a user-level file structure may be unexpected. If we did this, I think at most it should be temporary during compilation. Maybe not a big deal but it makes me a little uneasy for some reason. Though I guess my IDE also copies my working sources to a global user dir between saves as well...

Just dependency sources, not user projects. Calling the compiler for the user project would reference the paths of these dependencies via the source remapping feature of solc/vyper using the dependency's name (which is currently happening twice)

Yeah, using absolute paths in remappings would not be portable, since the input JSON would have to contain the full FS path if it's not relative to the base path. So compiling on one machine is likely going to produce different output than on another. That's where this idea came from: that all relevant sources are assembled into one reproducible file structure with a common base path.

Could it be not under contracts/ though? Under .build/ instead, as a symlink?

  • It doesn't create as clean of a source compilation structure as a build stage might. Being able to build a file structure that includes only the source files necessary for compilation has some other benefits (listed in the OP). Should also be noted that the current .cache folder does not do this and includes the full dependency package source files.

Yes, it should "clean" itself only down to the file set of source files, and not the whole directory. Essentially, glob it by all the registered compiler extensions, which may actually resolve another issue where it doesn't recompile when you first forget to install a compiler plugin that a dependency might need (say .sol for a vyper-only project). We can also detect, in reverse fashion, if a previously-downloaded and cleaned dependency sources directory contains extensions for compilers that are not installed in the plugin set.

By clean I mean only the necessary files would be in the resulting file structure, not just filtering by lang. So, .cache/OpenZeppelin/v4.5.0/token/ERC721/ would not be included in the build stage if I'm not building anything with an ERC-721 interface. So if I need the OZ ERC-20 interface, it will only include the ~4 files necessary to compile the contract and not the entire OZ package.

Oh, I think that's entirely too low level, but in theory we are pre-fetching the local project's imports and can do some level of analysis to reduce the fileset saved on disk, though this would likely be very complicated.

mikeshultz commented 5 months ago
  • Having your project source copied to a user-level file structure may be unexpected. If we did this, I think at most it should be temporary during compilation. Maybe not a big deal but it makes me a little uneasy for some reason. Though I guess my IDE also copies my working sources to a global user dir between saves as well...

Just dependency sources, not user projects. Calling the compiler for the user project would reference the paths of these dependencies via the source remapping feature of solc/vyper using the dependency's name (which is currently happening twice)

Yeah, using absolute paths in remappings would not be portable, since the input JSON would have to contain the full FS path if it's not relative to the base path. So compiling on one machine is likely going to produce different output than on another. That's where this idea came from: that all relevant sources are assembled into one reproducible file structure with a common base path.

Could it be not under contracts/ though? Under .build/ instead, as a symlink?

.build/ is manifest stuff? A symlink would have all the same issues (and maybe more) as just copying the files to the contracts dir (like .cache/) anyway.

  • It doesn't create as clean of a source compilation structure as a build stage might. Being able to build a file structure that includes only the source files necessary for compilation has some other benefits (listed in the OP). Should also be noted that the current .cache folder does not do this and includes the full dependency package source files.

Yes, it should "clean" itself only down to the file set of source files, and not the whole directory. Essentially, glob it by all the registered compiler extensions, which may actually resolve another issue where it doesn't recompile when you first forget to install a compiler plugin that a dependency might need (say .sol for a vyper-only project). We can also detect, in reverse fashion, if a previously-downloaded and cleaned dependency sources directory contains extensions for compilers that are not installed in the plugin set.

By clean I mean only the necessary files would be in the resulting file structure, not just filtering by lang. So, .cache/OpenZeppelin/v4.5.0/token/ERC721/ would not be included in the build stage if I'm not building anything with an ERC-721 interface. So if I need the OZ ERC-20 interface, it will only include the ~4 files necessary to compile the contract and not the entire OZ package.

Oh, I think that's entirely too low level, but in theory we are pre-fetching the local project's imports and can do some level of analysis to reduce the fileset saved on disk, though this would likely be very complicated.

We're already doing import analysis for remappings.