cli/compiler's ModuleMetaData is also closely related, should that be pulled out too?
I propose this be called core/cache.rs. Happy to have a stab at this.
@hayd I think core/cache.rs is a good name. ModuleMetaData is used in several situations where it shouldn't be (example), and ModuleMetaData has some bits to support TypeScript whereas //core is only JS - so I don't think it should be brought over.
Glad you want to take this on, I'd like to spec this out in a bit more detail before you dive in.
I'm not totally sure about this part "Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts."
> Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts.

IMHO gen/ is fine as is. I'd hate to see my project dir explode with generated files after running deno; it'd be a nightmare to add .gitignore entries for each simple project.

I want to suggest a way to purge the gen/ dir - it's been mentioned a few times already, especially when upgrading between versions that change the compilation process. Although I'm not sure if --reload handles that already.
I wasn’t suggesting that the generated files would be in the project directory. They’d still live in the cache directory. For example if you had /home/ry/main.ts that would map to $DENO_DIR/file/home/ry/main.js
I didn't catch that... This approach is super clean :+1:
For consistency, would you also cp main.ts to $DENO_DIR/file/home/ry/main.ts? That way "generated javascript and source maps should live along side their typescript counterparts" in the same way as they do for http/https.
@hayd hmm seems superfluous...
That's true, I agree it's a little weird, but otherwise, among http/https/etc., file is going to be a special case in not having a .ts file alongside the .js and .js.map...
I've started working on progress bars for downloads and imports; since downloads and compilation are async, it would be helpful to store the number of downloaded and compiled files on DenoDir so progress bars can be updated as new files are discovered.
@bartlomieju I'd rather there be a "ProgressBar" object (which should wrap whichever progress bar crate we're using) which can be added to ThreadSafeState. Then you'd have something like ProgressBar::start(action: String) and ProgressBar::end(action: String)? Not sure about that exact API, but some functionality like that.
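A minimal sketch of what such a wrapper could look like. Everything here beyond the start/end names is my own assumption (the `ProgressBarInner` type, the `in_flight` helper), and the actual terminal rendering via a progress bar crate is deliberately elided - this only shows the clonable, thread-safe counting core that could live in ThreadSafeState:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical sketch: a clonable handle tracking in-flight actions.
// A real implementation would wrap a progress bar crate and redraw on update.
#[derive(Clone, Default)]
pub struct ProgressBar {
    inner: Arc<Mutex<ProgressBarInner>>,
}

#[derive(Default)]
struct ProgressBarInner {
    started: usize,
    finished: usize,
    current_action: Option<String>,
}

impl ProgressBar {
    // Called when a download or compilation begins.
    pub fn start(&self, action: &str) {
        let mut inner = self.inner.lock().unwrap();
        inner.started += 1;
        inner.current_action = Some(action.to_string());
        // redraw the bar here
    }

    // Called when the corresponding download or compilation ends.
    pub fn end(&self, _action: &str) {
        let mut inner = self.inner.lock().unwrap();
        inner.finished += 1;
    }

    // Number of actions started but not yet finished.
    pub fn in_flight(&self) -> usize {
        let inner = self.inner.lock().unwrap();
        inner.started - inner.finished
    }
}

fn main() {
    let pb = ProgressBar::default();
    pb.start("Downloading https://deno.land/std/http/server.ts");
    pb.start("Compiling main.ts");
    pb.end("Downloading https://deno.land/std/http/server.ts");
    println!("{}", pb.in_flight()); // 1
}
```

Because the handle is `Clone` and internally synchronized, each async fetch/compile task can carry its own copy and report progress independently.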
One consideration, with CheckJS enabled (see #2114) the TypeScript compiler will transform JS->JS. While we could discard the output, it makes more sense to cache both the input and output. This would allow support of transforming many CommonJS modules and AMD modules into ES modules that can be run by Deno transparently. In that case we would have an input and output JS to cache.
> This would allow support of transforming many CommonJS modules and AMD modules into ES modules that can be run by Deno transparently
@kitsonk Although I'm not against this in the limit, I wouldn't want to tie this requirement to refactoring DenoDir. Does CheckJS output something different than the input if the input is ESM?
To summarize my position (using suggestions we've discussed here): //cli/deno_dir.rs needs to be re-written as it's very messy. The rewrite should be called //core/cache.rs.

Goals of the new //core/cache.rs:

- It should only depend on //core. In particular, it should not import //core/isolate.rs and ideally it would not even import //core/modules.rs (but this can be discussed - maybe it's necessary).
- Cached files live next to their metadata and compiled output, e.g.:
  - $DENO_DIR/https/example.com/a/b/c.ts
  - $DENO_DIR/https/example.com/a/b/c.ts.headers.json
  - $DENO_DIR/https/example.com/a/b/c.js

  (Yes, this means getting rid of the $DENO_DIR/gen directory)
- Dependencies go in $DENO_DIR/https/example.com/a/b/c.ts.d (not to be confused with d.ts)
- Local files are cached under their file:/// url. Example: $DENO_DIR/file/Users/rld/src/deno/tests/002_hello.js would be the cached output of /Users/rld/src/deno/tests/002_hello.ts
- //core is JS-only.

If there are no other requirements, I welcome pseudo-code which starts specifying the interface.
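A rough sketch of the URL-to-disk mapping implied by the layout above. The `cache_path` helper is hypothetical, and the naive `://` string split stands in for real URL parsing - it only illustrates how both `https://` and `file://` modules land under $DENO_DIR without a separate gen/ directory:

```rust
use std::path::PathBuf;

// Hypothetical helper: map a module URL to its on-disk cache location.
// Assumes a well-formed "scheme://..." URL; a real version would use a URL crate.
fn cache_path(deno_dir: &str, url: &str) -> PathBuf {
    let (scheme, rest) = url.split_once("://").expect("expected scheme://...");
    let mut path = PathBuf::from(deno_dir);
    if scheme == "file" {
        // file:///Users/rld/a.ts -> $DENO_DIR/file/Users/rld/a.ts
        path.push("file");
        // trim the leading '/' so push() joins instead of replacing
        path.push(rest.trim_start_matches('/'));
    } else {
        // https://example.com/a/b/c.ts -> $DENO_DIR/https/example.com/a/b/c.ts
        path.push(scheme);
        path.push(rest);
    }
    path
}

fn main() {
    println!("{}", cache_path("/deno_dir", "https://example.com/a/b/c.ts").display());
    println!(
        "{}",
        cache_path("/deno_dir", "file:///Users/rld/src/deno/tests/002_hello.ts").display()
    );
}
```

Auxiliary files (headers, compiled output) would then be derived from this path by appending or swapping extensions.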
> Does CheckJS output something different than the input if the input is ESM?

Not with our targets as they are. If we weren't supporting esnext as the target, syntactical constructs would be down-emitted (e.g. async iterables), but since we are not doing any down-level support it should be like for like (though I think we have things like comment stripping, so comments would be erased from the output file).
I guess my main point is that there is always an input file and usually an output file (for example, JSON imports should be transformed into an embedded .js file), so we should ensure that the cache structure will always work: e.g. not be opinionated that input files are always .ts - they could be .js, .mjs, .esm, extension-less, etc. - and that we can process them, conforming them into an output file which will always be .js, even if it is just a copy.
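A tiny illustration of that conforming step, under the assumption above that the output extension is always .js regardless of input. `output_path` is a hypothetical helper name, and real code would operate on cache URLs rather than bare paths:

```rust
use std::path::{Path, PathBuf};

// Hypothetical: whatever the input extension (.ts, .mjs, or none),
// the cached compiler output is always the sibling .js file.
fn output_path(input: &Path) -> PathBuf {
    let mut out = input.to_path_buf();
    out.set_extension("js"); // replaces the extension, or adds one if missing
    out
}

fn main() {
    for input in ["a/b/c.ts", "a/b/c.mjs", "a/b/c"] {
        println!("{}", output_path(Path::new(input)).display());
    }
}
```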
It would be cool to support --outFile some day and magically give people the ability to bundle their app. And if they bundle their app, they probably want to control the target. (This seems like it wouldn't be too difficult to support these days?)
--outFile would only work if we were outputting something like AMD, where you can simply concatenate modules. With ESM, you can't simply concatenate modules; you have to rewrite them. One of the big "issues" I have with ESM, but it is the standard.
@hayd are you still working on this? If not, I would like to.
I'd like to add my 2 cents:
- Module resolution should be done using exactly this API: https://html.spec.whatwg.org/multipage/webappapis.html#resolve-a-module-specifier. To add to that, we should use specialized types for passing specifiers between functions - using &str is cumbersome and you never know if a specifier is already resolved or not, which leads to resolving already-resolved specifiers multiple times; see this comment for reference.
- The same file shouldn't be downloaded multiple times in the same run - #2442, #2477.
- To fully support import maps, we should have the ability to try multiple URLs for a single module and reuse the first successful URL for subsequent calls (reference) - this can be incorporated into the previous point.
- I'd opt to separate the code paths for the --reload and --no-fetch flags - otherwise we'll end up with big conditionals like in get_source_code_async:
  https://github.com/denoland/deno/blob/e2468d8971b9c9054448a7a6bd95be6f99bf28da/cli/deno_dir.rs#L405-L409
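The "specialized types" point can be sketched as a newtype that can only be constructed by resolving, so any function taking it is guaranteed an already-resolved URL and never re-resolves. The `ModuleSpecifier` name matches the discussion, but this resolution logic is a deliberately simplified stand-in for the WHATWG "resolve a module specifier" algorithm (absolute URLs and `./` paths only), not the real thing:

```rust
// Hypothetical newtype: holding one of these means resolution already happened.
#[derive(Debug, Clone, PartialEq)]
pub struct ModuleSpecifier(String);

impl ModuleSpecifier {
    // Simplified resolution: absolute URLs pass through; "./x" resolves
    // against the referrer's directory; everything else is an error.
    pub fn resolve(specifier: &str, referrer: &str) -> Result<ModuleSpecifier, String> {
        if specifier.contains("://") {
            return Ok(ModuleSpecifier(specifier.to_string()));
        }
        if let Some(stripped) = specifier.strip_prefix("./") {
            let base = referrer.rsplit_once('/').map(|(b, _)| b).unwrap_or(referrer);
            return Ok(ModuleSpecifier(format!("{}/{}", base, stripped)));
        }
        Err(format!("cannot resolve {:?} from {:?}", specifier, referrer))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    let s = ModuleSpecifier::resolve("./b.ts", "https://deno.land/std/a.ts").unwrap();
    println!("{}", s.as_str()); // https://deno.land/std/b.ts
}
```

Functions downstream (fetch, cache lookup) would then take `&ModuleSpecifier` instead of `&str`, making "is this resolved yet?" a compile-time question.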
@kitsonk please take it.
@ry and I discussed; this should rest until bundling and import maps settle down a bit.
Here's a link to a high-level diagram I created with my take on rewriting deno_dir:
https://drive.google.com/file/d/1xk8e1nCWSugvbz1yOGlQQJmTDEmNJQh_/view?usp=sharing
A write-up explaining my choices, plus pseudo-code, will follow tomorrow - I need to polish it a bit. There are still some missing bits, like:

- metadata.json, in which we can store the hash of the compilation (eg. the result of the current cache_path is a good value we could use)
- use of DenoDir by multiple processes; @ry suggested a lock file for that, I'm not yet sure how to incorporate it

Anyway, feel free to take a look and comment.
The main goal of the rewrite is to make DenoDir more maintainable, allow for further improvements in the future, and limit the number of IO operations (file reads and net fetches).

The rewrite requires significant changes to the following files:

- //cli/deno_dir.rs
- //cli/state.rs
- //cli/compiler.rs
- //js/compiler.ts

ModuleMetaData fields:

- module_specifier
- module_redirect_source_name
- filename
- media_type
- source_code

Module - struct passed directly to V8.

SourceCodeInfo - already existing struct in //core/modules.rs, with fields:

- module_specifier
- source_code
An important goal of the rewrite is to make DenoDir agnostic of TypeScript - it should operate as a cache/fetcher for local and remote modules. This goal is achieved via the following public API:

fetch_module_meta_data(module_specifier) -> Future<ModuleMetaData, DenoError>

The main method for obtaining source files for Deno. It's similar to the current implementation, but completely agnostic of the type of file fetched - it can literally be any filetype: JS, JSON, TS, source map.

fetch_cached_module_meta_data(module_specifier) -> Option<ModuleMetaData>

Returns the already loaded ModuleMetaData for the given module (cached in-process for fast subsequent access) if it was previously loaded, or None.

save_source_code(module_specifier, code)

Saves the module's source code to disk; does the conversion from module_specifier to a local filepath (using url_to_deps_path). Example usage:

```rust
save_source_code(
    "https://deno.land/std/http/file_server.ts",
    "<snip>"
);
// saves file to `$DENO_DIR/https/deno.land/std/http/file_server.ts`
```

save_auxiliary_file(module_specifier, filetype, contents)

Saves a file to disk that is somehow related to a module - eg. a headers file for a remote module, or a source map for a compiled module. Example usage:

```rust
save_auxiliary_file(
    "https://deno.land/std/http/file_server.ts",
    "headers.json",
    "<snip>"
);
// saves file to `$DENO_DIR/https/deno.land/std/http/file_server.ts.headers.json`

save_auxiliary_file(
    "file:///dev/file_server.js",
    "map",
    "<snip>"
);
// saves file to `$DENO_DIR/file/dev/file_server.js.map`
```
An important part of the new DenoDir is ModuleMetaDataCache:

```rust
struct ModuleMetaDataCache(Mutex<HashMap<ModuleSpecifier, ModuleMetaData>>);

impl ModuleMetaDataCache {
    pub fn get(&self, module_specifier: &ModuleSpecifier) -> Option<ModuleMetaData>;
    pub fn cache(&self, module_meta_data: ModuleMetaData);
}
```
It is a simple hash map that stores metadata for modules that have already been requested and fetched, ensuring that no file is read from disk or fetched from the net more than once. This is an in-process cache.
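For illustration, a self-contained, runnable version of that in-process cache. ModuleMetaData is reduced here to a two-field stub, and plain Strings stand in for ModuleSpecifier - both are simplifications, not the proposed structs:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Stub: the real struct also carries filename, media type, redirect info, etc.
#[derive(Clone, Debug, PartialEq)]
struct ModuleMetaData {
    module_specifier: String,
    source_code: String,
}

#[derive(Default)]
struct ModuleMetaDataCache(Mutex<HashMap<String, ModuleMetaData>>);

impl ModuleMetaDataCache {
    // Returns a clone of the cached entry, if any, without touching disk or net.
    fn get(&self, module_specifier: &str) -> Option<ModuleMetaData> {
        self.0.lock().unwrap().get(module_specifier).cloned()
    }

    // Stores metadata so subsequent requests for the same module are in-memory hits.
    fn cache(&self, meta: ModuleMetaData) {
        self.0
            .lock()
            .unwrap()
            .insert(meta.module_specifier.clone(), meta);
    }
}

fn main() {
    let cache = ModuleMetaDataCache::default();
    assert!(cache.get("https://deno.land/x/a.ts").is_none());
    cache.cache(ModuleMetaData {
        module_specifier: "https://deno.land/x/a.ts".to_string(),
        source_code: "export {};".to_string(),
    });
    println!("{}", cache.get("https://deno.land/x/a.ts").is_some()); // true
}
```

The Mutex keeps it usable from multiple threads; fetch paths would consult `get` first and only fall through to disk/network on a miss, then `cache` the result.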
MISSING BITS:

- $DENO_DIR
- In resolveModuleNames we're issuing a number of op_fetch_module_meta_data ops. The compiler requests only the resolved URLs of imports, but we actually download them. This can be solved in two ways:
  - replace op_fetch_module_meta_data with op_resolve_modules, which simply resolves all specifiers and returns them to the TS compiler
  - replace op_fetch_module_meta_data with op_fetch_modules_meta_data and download all requested modules concurrently (as of now, modules are downloaded one after another)
- op_cache should be replaced with op_cache_compiler_output, which saves the compiled output code and its source map (using DenoDir::save_source_code and DenoDir::save_auxiliary_file).

Currently, fetch_module_meta_data_and_maybe_compile_async is the main function called to load module code into V8. I propose the following algorithm for this function:
1. try to get the cached `ModuleMetaData` for the module:
   a) if the requested module is a TS file, resolve the URL of the compiled module (ie. replace the .ts extension with .js)
   b) call `DenoDir::fetch_cached_module_meta_data`
   c) if `DenoDir::fetch_cached_module_meta_data` returned something, go to 10.
2. if the requested module is not a TS file, call `DenoDir::fetch_module_meta_data` and go to 10.
3. can we `use_cache`?
   a) try to get the compiled module's `ModuleMetaData` from the on-disk cache by issuing `DenoDir::fetch_module_meta_data` (ie. with the .js extension)
   b) if it returned None, go to 4.
   c) if the cached version is stale (this can be established using an approach similar to the current `cache_path`, but storing that hash in an auxiliary file), go to 4.
   d) otherwise return and go to 10.
4. set up the TS compiler worker and runtime
5. post a message to the TS compiler
6. call `ts.createProgram`, and for the requested module and its imports:
   a) get source files by calling `op_fetch_module_meta_data`
   b) resolve imports by calling `op_resolve_modules`
   c) write compiled output to disk by calling `op_cache_compiler_output`
7. get the message from the TS compiler worker
8. handle errors and diagnostics from the TS worker and exit, or
9. get the compiled module's `ModuleMetaData` from the on-disk cache by issuing `DenoDir::fetch_module_meta_data` (ie. with the .js extension) - this call can't fail!
10. return `SourceCodeInfo(module_url, source_code)` constructed from `ModuleMetaData`
With the described approach it's possible to encapsulate the TS compilation logic in a small struct called TSCompiler:

```rust
impl TSCompiler {
    pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
        let compiled_module_specifier = ...; // ie. module_specifier with a .js extension
        if use_cache {
            let compiled_module_meta_data =
                await self.deno_dir.fetch_module_meta_data(compiled_module_specifier);
            if !self.is_stale(compiled_module_meta_data) {
                return compiled_module_meta_data.into_module();
            }
        }
        let ts_compiler = setup_compiler_worker();
        let compiler_result = ts_compiler.post_message(module_specifier);
        if compiler_result.error || compiler_result.diagnostics {
            return DenoError::from(compiler_result);
        }
        // this call can't fail - the compiler has just written the output to disk
        let compiled_module_meta_data =
            await self.deno_dir.fetch_module_meta_data(compiled_module_specifier);
        return compiled_module_meta_data.into_module();
    }
}
```
The great thing is that it can be abstracted away to support other file types:

```rust
impl JSONCompiler {
    pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
        let module_meta_data =
            await self.deno_dir.fetch_module_meta_data(module_specifier);
        // wrap the JSON source in `export default` and return `SourceCodeInfo`
        return SourceCodeInfo {
            source_code: format!(
                "export default {};",
                str::from_utf8(&module_meta_data.source_code).unwrap()
            ),
            module_specifier,
        };
    }
}
```
Want to import a Rust file in your JavaScript like in Parcel? There you go:

```rust
impl RustCompiler {
    pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
        let module_meta_data =
            await self.deno_dir.fetch_module_meta_data(module_specifier);
        // compile the Rust file to WebAssembly:
        // https://github.com/parcel-bundler/parcel/blob/master/packages/core/parcel-bundler/src/assets/RustAsset.js
        // then wrap the output WASM code similar to Parcel:
        // https://github.com/parcel-bundler/parcel/blob/master/packages/core/parcel-bundler/src/builtins/loaders/browser/wasm-loader.js
    }
}
```
That means we can create something called a CompilationManager that allows compilers to register themselves for different extensions/media types.
There are probably a few details that I forgot to describe, I'll update this comment when I remember them.
Please comment and ask questions.
CC @ry @kitsonk @hayd @piscisaureus
EDIT: Also CC @kevinkassimo
It's very nice to have such a comprehensive analysis of the problem! I think the abstracted compiler sounds good.
I'd like to have a lower-level interface to DENO_DIR which doesn't know anything about downloading/compiling or even modules. I'm not sure exactly what this would look like, but it might be something as simple as
```rust
struct DenoDir2 {
    location: PathBuf,
}

impl DenoDir2 {
    fn set(url: Url, data: &[u8]) -> Result<()> { unimplemented!() }
    fn get(url: Url) -> Result<Vec<u8>> { unimplemented!() }
}
```
I think this could work fine with your design?
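For illustration, a runnable sketch of such a blob store: it knows nothing about modules or compilation, just URL in, bytes out. The `key_to_path` helper and its string-based URL flattening are my own simplifications (a real version would parse the URL properly):

```rust
use std::fs;
use std::path::PathBuf;

// A dumb on-disk blob store keyed by URL - no knowledge of modules/compiling.
struct DenoDir2 {
    location: PathBuf,
}

impl DenoDir2 {
    // Naively flatten "https://host/a.ts" into "<location>/https/host/a.ts".
    fn key_to_path(&self, url: &str) -> PathBuf {
        self.location.join(url.replace("://", "/"))
    }

    fn set(&self, url: &str, data: &[u8]) -> std::io::Result<()> {
        let path = self.key_to_path(url);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(path, data)
    }

    fn get(&self, url: &str) -> std::io::Result<Vec<u8>> {
        fs::read(self.key_to_path(url))
    }
}

fn main() -> std::io::Result<()> {
    let dir = DenoDir2 {
        location: std::env::temp_dir().join("deno_dir2_sketch"),
    };
    dir.set("https://example.com/a/b/c.ts", b"export {};")?;
    let bytes = dir.get("https://example.com/a/b/c.ts")?;
    println!("{}", String::from_utf8_lossy(&bytes)); // export {};
    Ok(())
}
```

The higher-level fetch/compile layers from the proposal would then sit on top of this, composing it rather than reimplementing disk IO.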
> It's very nice to have such a comprehensive analysis of the problem! I think the abstracted compiler sounds good.
Thanks!
> I'd like to have a lower-level interface to DENO_DIR which doesn't know anything about downloading/compiling or even modules. I'm not sure exactly what this would look like, but it might be something as simple as
>
> struct DenoDir2 { location: PathBuf, } impl DenoDir2 { fn set(url: Url, data: &[u8]) -> Result<()> { unimplemented!() } fn get(url: Url) -> Result<Vec<u8>> { unimplemented!() } }
>
> I think this could work fine with your design?
Yes, that should be possible :+1: It looks like it corresponds to the read_file/write_file part in the diagram, which is the smallest building block of DenoDir.

Actually, I believe it'd be possible to structure the code in such a way that the compiler step can be skipped completely - then DenoDir would handle only JS files, so deno crate users can build upon that.
EDIT: FYI I'm working on a prototype of the proposed implementation. I'll get back at the end of the week.
@bartlomieju Looks good at first glance, will check more carefully tonight. Also would it be better if we could actually put this proposal in a Google Doc so others could add their suggestions on certain parts of the proposal (and easier to make comment threads)? (You could create a shareable link for suggestion only)
@kevinkassimo thanks for the suggestion, here's a link to the doc: https://docs.google.com/document/d/1mGoIQzXe8hr2117I-_xmVuJ4dzqSbnE0IoU-2gXZG_k/edit?usp=sharing
@ry do you think we can close this issue now?
@bartlomieju Thank you very much for the critical work you've done in this refactor: 2e1ab8232156a23afd22834c1e707fb3403c0db6, 421cbd39b4f0fdbdfc2eeed6da8dd3410246a044, ac269beabe1b16294118e64e69bf487639086941, 70de8dd51d465ea2016d6bb7c0728111f4493668, 8214b686cea3f6ad57d7da49a44d33185fdeb098, 72d9045528ad69ec32d7de9707cea65fab9f405e.
I still think the deno dir needs some work, but you've addressed most of the issues outlined here. I will create a new issue if/when we need to do more.
deno_dir is starting to show its age; it's time for it to be redesigned from scratch. Here are some of the issues I want to address:

- It should be part of //core and be independent of other modules in core. deno_dir.rs is currently quite tangled up with other modules, which makes it difficult to refactor.

anything else?