Closed geelen closed 2 years ago
Fully agreed, @geelen. I figure there is another reason to have the dir
configurable vs. only using a path to main
, but if not we should change this. Maybe @xortive could clarify?
dir
and main
are required because wrangler doesn't actually traverse the module graph from the entrypoint to figure out what to upload. Writing a parser for ES6 import/export syntax and re-implementing the node.js module resolution algorithm would have been a decent sized project on top of modules + custom builds support that I'd have expected to have subtle bugs until it got more mature -- if wrangler was written in JS where I could have used existing code, I'd have gone this route instead.
The dir
+ main
solution was much simpler to implement and reason about, as it directly maps to what the runtime expects: a list of modules with names, and the name of the main module. The runtime has no concept of paths, but module names tend to be derived from paths.
@geelen I agree this is a bit of a footgun, I think making dir
not optional would be great, but not sure how to introduce that without breaking folks. Until we have a way to parse the module graph, we're stuck with requiring dir
in some form, whether as an implicit default or always required.
dir and main are required because wrangler doesn't actually traverse the module graph from the entrypoint to figure out what to upload.
Until we have a way to parse the module graph, we're stuck with requiring dir in some form, whether as an implicit default or always required.
I'm not sure I follow why this would be required? I'm talking about changing how the main
path is interpreted: ./dist/worker.mjs
being relative to wrangler.toml
rather than ./worker.mjs
being relative to the implicit dir
variable. It shouldn't require parsing the module graph, at least to my understanding?
it directly maps to what the runtime expects: a list of modules with names, and the name of the main module. The runtime has no concept of paths, but module names tend to be derived from paths.
So, if I'm reading this correctly... dir = "dist"
effectively becomes a glob of include = ["dist/**/*"]
, where each "file" becomes a module, indexed with its relative path to dist
, i.e. dist/index.mjs
becomes keyed with ./index.mjs
? Without the dir
argument, wrangler doesn't know which files to upload, and without the main
argument matching exactly the relative-paths-from-dir format, the runtime doesn't know which "module" is the entry point?
As an aside, because main
isn't ever actually interpreted by Wrangler as a path (it's just a string key to an implicit glob), it's never actually checked whether it exists before the upload takes place? Hence why #1952 reports an error from the API, not from the CLI before the upload.
That all makes sense to me now, but I'd class this as a leaking the runtime's model into the CLI tooling, rather than working from the user's perspective. If my understanding is correct (we need a list of files to upload and a named entry point), then I think the easiest change is the following:
[build.upload]
format = "modules"
main = "./dist/index.mjs"
include = ["./dist/**/*"]
The only optional entry is include
, which defaults to []
.
Semantics would be as follows:
main
is relative to wrangler.toml
main
and verify the file exists there, and always upload it as the entry point of the moduleinclude
follows normal glob syntax, i.e. {}
for multiple matches, !
for exclusions etcinclude
are deduped with whatever is in main
, to handle the case when they're in the same directoryinclude
are uploaded with a key of their relative path to main
. So, in the case where index.mjs
references a file file.html
in the same directory, import "./file.html"
will work as expected.I think this matches a use case where a user would start with a single file module, then look to "include" extra assets in the upload as a separate step. That separate step may not require the same build process as what generates the main
module at all, for example:
[build.upload]
format = "modules"
main = "./dist/worker.mjs"
include = ["./public/**/*"]
In this case, the worker would need to reference files using a relative path (import "../public/index.html"
) but I think that's pretty understandable for a user (at least, it makes more sense than paths being relative to wrangler.toml, imo)
I think making dir not optional would be great, but not sure how to introduce that without breaking folks
Given this only affects module syntax workers, which are in prerelease, does that not give us a bit of scope to make a breaking change? I don't know if we have any infrastructure to support release notes on a wrangler upgrade, or a codemod of their config file, that sort of thing.
I believe wrangler.toml
was going to get a date stamp for runtime versioning in future, we could introduce that here: anything without a datestamp but that is using module uploads gets an error that dir/main
has changed semantics, and get the user to add the version in manually once they've fixed their config?
At the very least, we can throw an error if we see someone pass dir
with a message to the release notes, and since we'll be verifying the path of main
the user will get an early error instructing them to change ./worker.mjs
to ./dist/worker.mjs
. Not great, but at least it's explicit. Again, the error message can reference the release notes.
Thoughts?
I'm not sure I follow why this would be required? I'm talking about changing how the main path is interpreted: ./dist/worker.mjs being relative to wrangler.toml rather than ./worker.mjs being relative to the implicit dir variable. It shouldn't require parsing the module graph, at least to my understanding?
Right now wrangler's behavior is to upload everything in dir
, using the relative slash path (so ./dist/index.mjs
and .\dist\index.mjs
become ./index.mjs
so projects work across unix/windows) of every file in dir as the module name. This implies all import
statements must be relative to dir
, as there's only a single path that will work for importing a module.
Once we get into the business of allowing folks to use relative paths from a given module instead of from dir
, wrangler must actually parse modules to read their import statements and include those modules in the upload, and since the runtime has no concept of paths in module names, if two files at different levels (therefore with a different relative path) import the same module, we have to replace the module name in the import statements with an absolute path before uploading (or some other scheme) for the import to work, or make the runtime treat module names as paths.
In this case, the worker would need to reference files using a relative path (import "../public/index.html") but I think that's pretty understandable for a user (at least, it makes more sense than paths being relative to wrangler.toml, imo)
I would 100% prefer this behavior, but implementing a way to parse es6 modules to read and modify their import statements from scratch is a tall order, and I don't think increasing wrangler's split in code between Rust and JS (in order to leverage parsers written in JS) is a good idea as wranglerjs
has been a recurring source of pain due to the difficulty of installing it at runtime consistently.
Given this only affects module syntax workers, which are in prerelease, does that not give us a bit of scope to make a breaking change? I don't know if we have any infrastructure to support release notes on a wrangler upgrade, or a codemod of their config file, that sort of thing.
I believe wrangler.toml was going to get a date stamp for runtime versioning in future, we could introduce that here: anything without a datestamp but that is using module uploads gets an error that dir/main has changed semantics, and get the user to add the version in manually once they've fixed their config?
I don't believe module format workers are considered prerelease anymore but @Electroid could confirm. We could use the new runtime versioning scheme to change the behavior in the runtime to interpret module names as paths and to allow relative imports, which is likely easier than re-writing them in wrangler.
Hmm, I'm not sure I see the problem. I'm not suggesting going all the way to parsing import statements, just using a different base for the paths-as-names approach.
if two files at different levels (therefore with a different relative path) import the same module
I think I can just see this happening in a rare edge case:
dist/A.mjs
dist
directory, that import each other, but aren't bundled// A.mjs
import B from '../public/B.mjs'
// B.js
import C from './C.mjs'
In the second case, the runtime won't be able to find ./C.mjs
since it's in the bundle with a key relative to A. Is that the edge case you're talking about? Seems to me the same thing is already possible if you had a subdirectory inside dist
with two files referencing each other, or if any of the imports left off the .mjs
extension (fine in a bundler, but not an exact string path match).
But in any case, a JS bundler will sort that out, even spanning directories. Are we expecting to be uploading module format workers in a deconstructured manner like this? esbuild
is lightning fast and doesn't require a config file, as long as the assets are all JS. Are there non-JS assets we're expecting that will also import other files?
I would 100% prefer this behavior, but implementing a way to parse es6 modules to read and modify their import statements from scratch is a tall order
I don't think these are linked. Am I missing something?
change the behavior in the runtime to interpret module names as paths and to allow relative imports, which is likely easier than re-writing them in wrangler.
This might be a good idea for other reasons, though again I'm not proposing that we need to rewrite anything in Wrangler. Keep paths-as-names for now, just change the config and replace dir
with include
.
In this case, the worker would need to reference files using a relative path (import "../public/index.html") but I think that's pretty understandable for a user (at least, it makes more sense than paths being relative to wrangler.toml, imo)
I'm referring specifically to this line as impossible to implement without parsing modules -- In order to avoid parsing import statements, all paths need to be relative to the same point, either wrangler.toml
(which is the only way that makes sense using the include
glob config) or dir
(how it is now). If you allow different paths to refer to the same module, we need to parse paths in import statements and convert them to use the canonical path that's actually used as the module name.
The rest of your proposal is possible (using globs instead of dir
), but I don't think allowing folks to import from multiple directories and using wrangler.toml
as the base for relative paths as module names is worth the effort over what we have now -- the usability problem here is dir
being implicit, and the poor error message when the configured main module isn't present in the uploaded set of modules, both of which we could fix without breaking users (by adding dir
to the default generated wrangler.toml for new projects, and improving the error message either in the runtime validator, or in wrangler itself).
Solving the paths-as-names abstraction being leaky would be nice, but bundlers handle that for us (as you pointed out) for the vast majority of use cases.
But in any case, a JS bundler will sort that out, even spanning directories. Are we expecting to be uploading module format workers in a deconstructured manner like this? esbuild is lightning fast and doesn't require a config file, as long as the assets are all JS. Are there non-JS assets we're expecting that will also import other files?
This is another reason we chose the current design -- the vast majority of modules will end up in the main bundle, since bundling is required for any non-trivial use case, and bundlers handle module resolution for us. In the majority of cases, only wasm modules, and the mjs shim used for allowing commonjs modules to be used will ever be uploaded separately from the bundle.
I'm referring specifically to this line as impossible to implement without parsing modules
Why do you think it's impossible? If the ./dist/worker.mjs
is the entry point, and that's used as the reference point in calculating paths, then include: ["./public/**/*"]
would add each file found, but with a path-as-name relative to the entry point (i.e. "../public/index.html").
In the standard case, if the entry point and all related files are together in dist
, this ends up producing the exact same result as what we have right now. It just generalises better. And, for a single file upload, you only ever specify a single file: the exact path from wrangler.toml
to the file to be uploaded.
with a path-as-name relative to the entry point (i.e. "../public/index.html").
AHH I see what you mean now, "../public/index.html" is actually the path-as-name -- not sure why that didn't click for me before.
And, for a single file upload, you only ever specify a single file: the exact path from wrangler.toml to the file to be uploaded.
I do quite like this simplicity for the common case, I just worry about the feasibility of introducing it now is all. This does seem like an improvement to me, and if we can make it happen without too much toil it'd be nice.
Do we really need a separate include
from the globs used for determining module types, you think?
not sure why that didn't click for me before.
No worries! Glad I wasn't missing something major that would have scuttled the idea.
Do we really need a separate include from the globs used for determining module types, you think?
Oh the rules
section? Actually, reusing that might make a lot more sense. But again, those paths are currently dir
-relative and I'm proposing making them wrangler.toml
-relative. But I suppose if we can find a way to make the change, we could change both...
Quick aside: can the entry point ever be anything other than an ESM file? And I'm not sure what fallthrough
is used for?
First step I'd say is figuring out whether changes like this are possible to roll out without major disruption. Let's assume we're just going to do the simplest thing (i.e. just make dir
required), see how much work that would be, and figure out if it's much worse to make further changes at the same time?
It still raises a couple of questions:
wrangler
command if there's a [build.upload]
section but no dir
argument, but still allow 'dist'
through as a default value?dir = 'dist'
?dir = 'dist'
into wrangler.toml
for the user? How long do we leave this code in for?Personally, I think it'd be great to be able to auto-update wrangler.toml
for the user, though probably not without an initial prompt. I'd also love to only do so if we detect the wrangler.toml
as being "old". To that end, I think we could do the following:
wrangler_version = "1.18.0" # auto-inserted on project creation
type = "javascript"
name = "my project"
# ...
Since changes in config would only be rolled out with a new Wrangler version, if we track the version of a wrangler.toml
file in this way, we can start to do deprecation warnings. For any file without a version, we can assume it was on 1.17.0
.
With versioned config we don't need to upgrade the user immediately, and our notification messages get better:
> wrangler publish
⚠️ Note: You are running Wrangler v1.18.0 but wrangler.toml has version 1.17.0
📝 Release notes: https://github.com/cloudflare/wrangler/blob/master/CHANGELOG.md#v1180
👉 To auto-upgrade, run: wrangler upgrade
� Running yarn typecheck && yarn build
...
This has the benefit of not touching a user's file without their explicit action (running wrangler upgrade
or whatever we call it, maybe wrangler config update
?). It also never prevents a user from deploying an existing project, for example, on a CI machine that's always installing the latest version of wrangler during a build.
At a certain point we can drop support for older versions with a message like this:
> wrangler publish
❌ Error: Wrangler v2+ only supports wrangler.toml of versions >= 1.18.0
👉 Read the upgrade guide: https://developers.cloudflare.com/workers/cli-wrangler/upgrade-guide
👉 To install Wrangler v1 instead, see: https://developers.cloudflare.com/workers/cli-wrangler/install-v1
If we put something like this in place I think we'll get a pretty smooth experience. It basically makes anything code-moddable feel "backwards-compatible"—since for older wrangler.toml files we know to run any transform on the config after we load it. But, when running wrangler config update
we load, run the transform, then write back to the toml file, along with the updated version marker. No idea how to write that in Rust, but I'm game to try to learn :)
the wrangler upgrade
explicit action to modify the toml is nice, I definitely prefer that (keeping old semantics working while notifying interactive terminals you can upgrade) so we don't break CI.
Note modifying wrangler.toml
in particular is tricky without messing with formatting -- we have toml-edit
in wrangler now, but it only supports toml 0.4 syntax while the main parser toml-rs
supports the full toml 1.0 syntax (namely, dotted keys). There's a good chance some folks are using toml 1.0 syntax, and we'd fail to edit their config as a result.
Ha! Faced the exact same problem with FABs, which is why I went with JSON5 since I found that jju handles editing the file but preserving formatting and comments.
I see toml-rs
as only supporting version 0.5, is there much difference between 0.4 and 0.5? Will toml-rs
not preserve formatting at all?
Ah sorry I misremembered, 0.5 added dotted keys, while 1.0.0 I think is 0.5 plus only minor changes. Dotted keys are pretty nice, I’d expect some folks to be using them and some of our old examples used them before I noticed the discrepancy.
toml-rs doesn’t preserve formatting or comments at all.
This issue has been automatically marked as stale because it has not had recent activity in the last 180 days. It will be closed if no further activity occurs in the next week. Please feel free to comment if you'd like it to remain open, and thank you for your contributions.
I suppose this is less relevant with the Wrangler2 beta, which actually crawls the module graph to figure out what to upload: https://github.com/cloudflare/wrangler2
just commenting to say i wasted 2 hours on this issue.
Just a note on how we handle this in wrangler2 - we're deprecating build.upload.main
in favour of a top level main
field (and there isn't a dir
field). We also log a warning with the exact line to copy paste into wrangler.toml
(see https://github.com/cloudflare/wrangler2/pull/528) The module system now applies to both modules and service worker format workers (we've also deprecated build.upload.format
since we can infer it automatically)
We're working on docs and migration guides before we roll this out to GA.
In the documentation:
In particular:
This tripped me up, and I think it'll trip up others. E.g. in the example immediately above:
It's really not clear that this will actually be uploading a file called
./dist/worker.mjs
. So, if someone copies a template, they're likely to make the same mistake I did, which is assume the example is building into the same directory aswrangler.toml
, and then write./dist/worker.mjs
, which actually means./dist/dist/worker.mjs
, which won't work.Worse still, the error message is misleading, see #1952
Suggested fix
Easiest option, make
"dir"
required, and makemain
not start with./
(so it doesn't look relative to thewrangler.toml
file):Now at least there's some indication that
main
might be relative todir
, though I'd argue this is still confusing.To suggest better solutions I think I'd need a bit more context on why some of these settings need to be there, however. I don't understand why
"main"
would be relative to "dir" at all, or why"dir"
is even necessary? Surely this can be simplified? If there's a bunch of other files in"dir"
that need to be included, can't they be specified in something like aninclude = [...]
syntax? Orimport './x.html'
syntax in the entry point?If we get rid of
dir
entirely, and just makemain
relative towrangler.toml
, we'd get this:This is basically what we have in the example right now, but without the invisible footgun of a missing
dir
argument.