denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License
96.6k stars 5.33k forks source link

Rewrite deno_dir.rs #2057

Closed ry closed 5 years ago

ry commented 5 years ago

deno_dir is starting to show its age, it's time for it to be redesigned from scratch. Here are some of the issues I want to address

anything else?

hayd commented 5 years ago

cli/compiler's ModuleMetaData is also closely related, should that be pulled out too?

I propose this be called core/cache.rs. Happy to have a stab at this.

ry commented 5 years ago

@hayd I think core/cache.rs is a good name. ModuleMetaData is used in several situations where it shouldn't be (example) and ModuleMetaData has some bits to support TypeScript whereas //core is only JS - so I don't think it should be brought over. Glad you want to take this on, I'd like to spec this out in a bit more detail before you dive in.

I'm not totally sure about this part "Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts."

bartlomieju commented 5 years ago

I'm not totally sure about this part "Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts."

IMHO gen/ is fine as is, I'd hate to see my project dir explode with generated files after running deno, it'd be nightmare to add entries to .gitignore for each simple project.

I want to suggest a way to purge gen/ dir, it's been mentioned a few times already, especially when upgrading between versions that change compilation process. Although I'm not sure if --reload handles that already.

ry commented 5 years ago

I wasn’t suggesting that the generated files would be in the project directory. They’d still live in the cache directory. For example if you had /home/ry/main.ts that would map to $DENO_DIR/file/home/ry/main.js

bartlomieju commented 5 years ago

They’d still live in the cache directory. For example if you had /home/ry/main.ts that would map to $DENO_DIR/file/home/ry/main.js

I didn't catch that... This approach is super clean :+1:

hayd commented 5 years ago

For consistency would you also cp main.ts to $DENO_DIR/file/home/ry/main.ts ? That way "generated javascript and source maps should live along side their typescript counterparts" in the same way as they do for http/https.

ry commented 5 years ago

@hayd hmm seems superfluous...

hayd commented 5 years ago

That's true, I agree it's a little weird, but otherwise among http/https/etc. file is going to be a special case in not having a .ts file alongside .js and .js.map...

bartlomieju commented 5 years ago

I've started working on progress bars for downloads and import, since downloads and compilation is async it would be helpful to store number of downloaded and compiled files on DenoDir so progress bars can be updated as new files are discovered.

ry commented 5 years ago

@bartlomieju I'd rather there be a "ProgressBar" object (which should wrap whichever progress bar crate we're using) which can be added to ThreadSafeState. Then you'd have something like ProgressBar::start(action: String) and ProgressBar::end(action: String) ? Not sure about that exact API, but some functionality like that.

kitsonk commented 5 years ago

One consideration, with CheckJS enabled (see #2114) the TypeScript compiler will transform JS->JS. While we could discard the output, it makes more sense to cache both the input and output. This would allow support of transforming many CommonJS modules and AMD modules into ES modules that can be run by Deno transparently. In that case we would have an input and output JS to cache.

ry commented 5 years ago

This would allow support of transforming many CommonJS modules and AMD modules into ES modules that can be run by Deno transparently

@kitsonk Although I'm not against this in the limit, I wouldn't want to tie this requirement to refactoring DenoDir. Does CheckJS output something different than the input if the input is ESM?

ry commented 5 years ago

To summarize my position (using suggest we've discussed here):

//cli/deno_dir.rs needs to be re-written as it's very messy. The rewrite should be called //core/cache.rs.

Goals of the new //core/cache.rs:

If there are no other requirements, I welcome pseudo-code which starts specifying the interface.

kitsonk commented 5 years ago

Does CheckJS output something different than the input if the input is ESM?

Not with our targets as they are. If we weren't supporting esnext as the target, syntactical constructs would be down-emitted (e.g. async iterables), but if we are not doing any down level support it should be like for like (though I think we have things like strip comments, so that would be erased from the output file).

I guess my main point is there is always an input file and usually an output file (for example JSON imports should be transformed to an embedded .js file), so we could ensure that the cache structure will always work (e.g. not be opionated that input files are always .ts, the could be .js, .mjs. .esm, extension-less, etc. and that we can/do process them, conform them as an output file which will always be .js even if it is just a copy paste).

ry commented 5 years ago

It would be cool to support --outFile some day and magically give people the ability to bundle their app. And if they bundle their app, they probably want to control the target.

(This seems like it wouldn't be too difficult to support these days?)

kitsonk commented 5 years ago

--outFile would only work with something like if we were outputting AMD where you can simply concatenate modules. With ESM, you can't simply concatenate modules, you have to rewrite them. One of the big "issues" I have with ESM, but it is the standard.

kitsonk commented 5 years ago

At @hayd are you still working on this? If not, I would like to.

bartlomieju commented 5 years ago

I'd like to add my 2 cents:

hayd commented 5 years ago

@kitsonk please take it.

kitsonk commented 5 years ago

@ry and I discussed, thus should rest a little bit until bundle and import maps settle down a bit.

bartlomieju commented 5 years ago

Here's link to high level diagram that I created with my take on rewriting deno_dir:

https://drive.google.com/file/d/1xk8e1nCWSugvbz1yOGlQQJmTDEmNJQh_/view?usp=sharing

Write up with explanation for my choices and pseudo code will follow tomorrow - I need to polish it a bit. There are still some missing bits like:

Anyway feel free to take a look and comment

bartlomieju commented 5 years ago

Proposed implementation

Overview

Main goal of rewrite is to make DenoDir more maintainable, allow for further improvements in future and limit number of IO operations (file and net fetches).

Rewrite requires significant changes to following files:

Module - struct passed directly to V8 SourceCodeInfo - already existing struct in //core/modules.rs

save_source_code(
  "https://deno.land/std/http/file_server.ts",
  "<snip>"
);
// saves file to `$DENO_DIR/https/deno.land/std/http/file_server.ts
save_auxiliary_file(
  "https://deno.land/std/http/file_server.ts",
  "headers.json",
  "<snip>"
);
// saves file to `$DENO_DIR/https/deno.land/std/http/file_server.ts.headers.json

save_auxiliary_file(
  "file:///dev/file_server.js",
  "map",
  "<snip>"
);
// saves file to `$DENO_DIR/file/dev/file_server.js.map

Important part of new DenoDir is ModuleMetaDataCache:

struct ModuleMetaDataCache(Mutex<HashMap<ModuleSpecifier, ModuleMetaData>>);

impl ModuleMetaDataCache {
  pub fn get(self, module_specifier) -> Option<ModuleMetaData>

  pub fn cache(self, module_meta_data)
}

It is a simple hash map that stores metadata for modules that have already been requested and fetched - ensuring that no file is read from disk/fetched from net more than once. This is in-process cache.

MISSING BITS:

TS compiler

State

Currently fetch_module_meta_data_and_maybe_compile_async is main function that is called to load module code into V8. I propose following algorithm for this function:

1. try to get cached `ModuleMetaData` for module:
  a) if requested module is TS file resolve URL for compiled module (ie. replace .ts with .js extension)
  b) call `DenoDir::fetch_cached_module_meta_data` 
  c) if `DenoDir::fetch_cached_module_meta_data` returned something go to 10.
2. if requested module is not TS file call `DenoDir::fetch_module_meta_data` and go to 10.
3. can `use_cache`?
  a) try to get compiled module `ModuleMetaData` from on-disk cache by issuing `DenoDir::fetch_module_meta_data` (ie. with .js extension)
  b) if returned None go to 4.
  c) if cached version is stale (this can be established using similar approach to current `cache_path`, but storing that hash in auxiliary file), go to 4.
  d) return and go to 10.
4. setup TS compiler worker and runtime
5. post message to TS compiler
6. call `ts.createProgram`, and for requested module and its imports:
  a) get source file by calling `op_fetch_module_meta_data`
  b) resolve imports by calling `op_resolve_modules`
  c) write compiled output to disk by calling `op_cache_compiler_output`
7. get message from TS compiler worker
8. handle errors and diagnostic from TS worker and exit or
9. get compiled module `ModuleMetaData` from on-disk cache by issuing `DenoDir::fetch_module_meta_data` (ie. with .js extension) - this call can't fail!
10. return `SourceCodeInfo(module_url, source_code)` constructed from `ModuleMetaData`

With described approach it's possible to encapsulate TS compilation logic in small struct called TSCompiler:

impl TSCompiler {
  pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
    let compile_module_specifier = ...;
    if use_cache {
      let compiled_module_meta_data = await self.deno_dir.fetch_module_meta_data(compiled_module_specifier);

      if !self.is_stale(compiled_module_meta_data) {
        compiled_module_meta_data.into_module();
      }
    }

    let ts_compiler = setup_compiler_worker();
    let compiler_result = ts_compiler.post_message(module_specifier);
    if compiler_result.error || compiler_result.diagnostics {
      return DenoError::from(compiler_result)
    }

    let compiled_module_meta_data = await self.deno_dir.fetch_module_meta_data(compiled_module_specifier);

    compiled_module_meta_data.into_module();
  }
}

The great thing is it can be abstracted away to supported other file types:

impl JSONCompiler {
  pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
    let module_meta_data = await self.deno_dir.fetch_module_meta_data(module_specifier);

    // wrap JSON source in `export default` and return `SourceCodeInfo`
    {
      source_dode: format!(
        "export default {};",
        str::from_utf8(&self.source_code).unwrap()
      ),
      module_specifier
    }
  }
}

Want to import Rust file in your JavaScript like in Parcel? There you go:

impl RustCompiler {
  pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
    let compile_module_specifier = ...;
    let module_meta_data = await self.deno_dir.fetch_module_meta_data(module_specifier);
    // compile Rust file to WebAssembly
    // https://github.com/parcel-bundler/parcel/blob/master/packages/core/parcel-bundler/src/assets/RustAsset.js

    // wrap output WASM code similar to Parcel:
    // https://github.com/parcel-bundler/parcel/blob/master/packages/core/parcel-bundler/src/builtins/loaders/browser/wasm-loader.js
  }
}

That means we can create something called CompilationManager that allows compilers to register themselves for different extensions/media types.

There are probably a few details that I forgot to describe, I'll update this comment when I remember them.

Please comment and ask questions.

CC @ry @kitsonk @hayd @piscisaureus

EDIT: Also CC @kevinkassimo

ry commented 5 years ago

It's very nice to have such a comprehensive analysis of the problem! I think the abstracted compiler sounds good.

I'd like to have a lower-level interface to DENO_DIR which doesn't know anything about downloading/compiling or even modules. I'm not sure exactly what this would look like, but it might be something as simple as

struct DenoDir2 {
   location: PathBuf,
}

impl DenoDir2 {
  fn set(url: Url, data: &[u8]) -> Result<()> { unimplemented!() }
  fn get(url: Url) -> Result<Vec<u8>> { unimplemented!() }
}

I think this could work fine with your design?

bartlomieju commented 5 years ago

It's very nice to have such a comprehensive analysis of the problem! I think the abstracted compiler sounds good.

Thanks!

I'd like to have a lower-level interface to DENO_DIR which doesn't know anything about downloading/compiling or even modules. I'm not sure exactly what this would look like, but it might be something as simple as

struct DenoDir2 {
   location: PathBuf,
}

impl DenoDir2 {
  fn set(url: Url, data: &[u8]) -> Result<()> { unimplemented!() }
  fn get(url: Url) -> Result<Vec<u8>> { unimplemented!() }
}

I think this could work fine with your design?

Yes, that should be possible :+1: looks like it corresponds to read_file/write_file part in the diagram which is the smallest building block of DenoDir

Actually I believe it'd be possible to structure code in such a way, that allows to skip compiler step completely - then, DenoDir would be able to handle only JS files so deno crate users can build upon that.

EDIT: FYI I'm working on prototype of proposed implementation. I'll get back at the end of week.

kevinkassimo commented 5 years ago

@bartlomieju Looks good at first glance, will check more carefully tonight. Also would it be better if we could actually put this proposal in a Google Doc so others could add their suggestions on certain parts of the proposal (and easier to make comment threads)? (You could create a shareable link for suggestion only)

bartlomieju commented 5 years ago

@kevinkassimo thanks for suggestion, here's link to doc: https://docs.google.com/document/d/1mGoIQzXe8hr2117I-_xmVuJ4dzqSbnE0IoU-2gXZG_k/edit?usp=sharing

bartlomieju commented 5 years ago

@ry do you think we can close this issue now?

ry commented 5 years ago

@bartlomieju Thank you very much for the critical work you've done in this refactor: 2e1ab8232156a23afd22834c1e707fb3403c0db6, 421cbd39b4f0fdbdfc2eeed6da8dd3410246a044, ac269beabe1b16294118e64e69bf487639086941, 70de8dd51d465ea2016d6bb7c0728111f4493668, 8214b686cea3f6ad57d7da49a44d33185fdeb098, 72d9045528ad69ec32d7de9707cea65fab9f405e.

I still think the deno dir needs some work, but you've addressed most of the issues outlined here. I will create a new issue if/when we need to do more.