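Babel in Depth

Preface

Understanding Babel in one (very long) breath

Babel architecture diagram

@babel/core is the core: the "kernel" of the "microkernel" architecture that article describes. For Babel, this kernel is mainly responsible for:

Loading and processing the configuration (config)
Loading plugins
Calling the Parser to parse the source and produce an AST
Calling the Traverser to walk the AST and applying plugins to transform it via the visitor pattern
Generating code, including source map handling and source code output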

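Core supporting packages

Parser (@babel/parser): responsible for parsing source code into an AST. It has built-in support for many syntaxes, such as JSX, TypeScript, Flow, and the latest ECMAScript specification. For performance reasons the parser is currently not extensible and is maintained by the Babel team. If you need custom syntax you can fork it, but that is rarely necessary.
Traverser (@babel/traverse): implements the visitor pattern and walks the AST; transform plugins use it to reach the AST nodes they care about and operate on them. Transformers almost always work on the AST through the visitor pattern. The traverser ① performs the traversal in one place (depth-first, that is, recursively over the AST), ② provides methods for manipulating nodes, and ③ keeps the relationships between nodes up to date as they change. A plugin (a "concrete visitor" in design-pattern terms) only declares the node types it is interested in; when the traverser reaches such a node, it calls the plugin's visit method.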

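Plugins

Syntax plugins (@babel/plugin-syntax-*): as noted above, @babel/parser already supports many JavaScript syntax features and cannot be extended, so a plugin-syntax-* package only switches on or configures a feature the parser already has. Most users do not need to care about these, because the transform plugins already include the corresponding plugin-syntax-* plugins. Users can also configure the parser directly through the parserOpts option.
Transform plugins: transform the AST, for purposes such as compiling to ES5, minification, or adding functionality. The Babel repository splits transform plugins into two groups (the difference is only in naming):
  @babel/plugin-transform-*: ordinary transform plugins
  @babel/plugin-proposal-*: language features still at the proposal stage (not yet standardized)
Presets (@babel/preset-*): collections or groups of plugins that make plugins easier to manage and use. For example, preset-env covers all the latest standard features, and preset-react covers all the React-related plugins.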

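Plugin development helpers

@babel/template: manipulating the AST directly is sometimes too tedious, much like manipulating the DOM directly, so Babel provides this simple template engine that turns a string of code into an AST. It is used, for example, when generating helper code.
@babel/types: AST node builders and assertions. Used very frequently in plugin development.
@babel/helper-*: helper utilities for plugin development, for example to simplify AST operations.
@babel/helpers: runtime helper code. A purely syntactic transform may not be enough to make code run: older browsers do not recognize the class keyword, for instance, so helper code has to be added to emulate class.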

Tools

@babel/node: a Node.js CLI that directly runs JavaScript files that need Babel processing
@babel/register: patches Node.js's require so that modules needing Babel processing can be imported
@babel/cli: the command-line tool

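Abstract Syntax Tree (AST)

To understand how Babel works, you first need to understand the abstract syntax tree, because Babel plugins operate on it. The code we write is first parsed into an abstract syntax tree (AST), which then goes through a series of traversals and transformations, and finally the transformed tree is turned back into ordinary JS code. The diagram below (source) illustrates Babel's workflow:

Let's start with the AST. Code is parsed into an AST so that a program can work with its structure rather than with raw text. Here is a piece of code:

function add(x, y) {
    return x + y;
}

add(1, 2);

Parsing this code into an abstract syntax tree (with an online tool) and printing it as JSON gives: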

{
  "type": "Program",
  "start": 0,
  "end": 52,
  "body": [
    {
      "type": "FunctionDeclaration",
      "start": 0,
      "end": 40,
      "id": {
        "type": "Identifier",
        "start": 9,
        "end": 12,
        "name": "add"
      },
      "expression": false,
      "generator": false,
      "params": [
        {
          "type": "Identifier",
          "start": 13,
          "end": 14,
          "name": "x"
        },
        {
          "type": "Identifier",
          "start": 16,
          "end": 17,
          "name": "y"
        }
      ],
      "body": {
        "type": "BlockStatement",
        "start": 19,
        "end": 40,
        "body": [
          {
            "type": "ReturnStatement",
            "start": 25,
            "end": 38,
            "argument": {
              "type": "BinaryExpression",
              "start": 32,
              "end": 37,
              "left": {
                "type": "Identifier",
                "start": 32,
                "end": 33,
                "name": "x"
              },
              "operator": "+",
              "right": {
                "type": "Identifier",
                "start": 36,
                "end": 37,
                "name": "y"
              }
            }
          }
        ]
      }
    },
    {
      "type": "ExpressionStatement",
      "start": 42,
      "end": 52,
      "expression": {
        "type": "CallExpression",
        "start": 42,
        "end": 51,
        "callee": {
          "type": "Identifier",
          "start": 42,
          "end": 45,
          "name": "add"
        },
        "arguments": [
          {
            "type": "Literal",
            "start": 46,
            "end": 47,
            "value": 1,
            "raw": "1"
          },
          {
            "type": "Literal",
            "start": 49,
            "end": 50,
            "value": 2,
            "raw": "2"
          }
        ]
      }
    }
  ],
  "sourceType": "module"
}

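Notice that different levels of the tree share a similar structure, for example:

{
    "type": "Program",
    "start": 0,
    "end": 52,
    "body": [...]
}

{
    "type": "FunctionDeclaration",
    "start": 0,
    "end": 40,
    "id": {...},
    "body": {...}
}

{
    "type": "BlockStatement",
    "start": 19,
    "end": 40,
    "body": [...]
}

A structure like this is called a node. An AST consists of one or more such nodes, and each node can contain further child nodes; together they form a syntax tree that describes the program's syntax for static analysis.

The type field gives the node's type, such as "Program", "FunctionDeclaration", or "ExpressionStatement" in the AST above. Each node type also has additional properties that describe it further.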

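Babel's Workflow

The diagram above already outlines Babel's workflow; here it is in more detail. Babel has three main stages: parse, transform, and generate.

Parse (two steps: lexical analysis and syntactic analysis)

This stage parses the code into an abstract syntax tree (AST). Every JS engine (for example V8 in Chrome) has its own AST parser; Babel's is Babylon (since renamed @babel/parser). Parsing has two phases. Lexical analysis turns the code string into a stream of tokens; tokens resemble AST nodes but form a flat list rather than a tree. Syntactic analysis then turns the token stream into an AST, restructuring the information in the tokens into the tree representation.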
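To make the lexical analysis phase concrete, here is a minimal sketch of a tokenizer for a tiny subset of JS. It is not how Babylon is implemented; the `tokenize` function, the `KEYWORDS` set, and the token type names are all assumptions made up for this illustration.

```javascript
// Hypothetical tokenizer for illustration only; a real parser tracks
// positions, strings, comments, multi-character operators, and much more.
const KEYWORDS = new Set(['return', 'function']);

function tokenize(code) {
  const tokens = [];
  // Alternatives: a run of digits | an identifier-like word | any other non-space character
  const pattern = /(\d+)|([A-Za-z_$][\w$]*)|([^\s\w$])/g;
  for (const match of code.matchAll(pattern)) {
    if (match[1] !== undefined) {
      tokens.push({ type: 'num', value: match[1] });
    } else if (match[2] !== undefined) {
      const type = KEYWORDS.has(match[2]) ? 'keyword' : 'name';
      tokens.push({ type, value: match[2] });
    } else {
      tokens.push({ type: 'punc', value: match[3] });
    }
  }
  return tokens;
}

const tokens = tokenize('return x + y;');
console.log(tokens.map((token) => `${token.type}:${token.value}`).join(' '));
```

The result is a flat list; it is the syntactic analysis phase that would group these five tokens into a ReturnStatement containing a BinaryExpression.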
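Transform

In this stage Babel takes the AST and traverses it depth-first with babel-traverse, adding, updating, and removing nodes along the way. This is where Babel plugins do their work; if no plugin is used in this stage, Babel outputs the code essentially unchanged. Plugins run before presets. Plugins run in order from first to last, while presets run in reverse order, from last to first.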
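The ordering rule can be sketched as a small function. This is only a model of the rule as stated here, not Babel's actual config resolution; `resolveExecutionOrder` and the plugin and preset names are hypothetical.

```javascript
// Hypothetical helper: models "plugins first (in order), then presets (in reverse)".
function resolveExecutionOrder(config) {
  const plugins = config.plugins || [];
  const presets = config.presets || [];
  // Presets are applied last to first; each contributes its own plugins in order.
  const fromPresets = [...presets].reverse().flatMap((preset) => preset.plugins);
  return [...plugins, ...fromPresets];
}

const order = resolveExecutionOrder({
  plugins: ['plugin-a', 'plugin-b'],
  presets: [
    { name: 'preset-x', plugins: ['x1', 'x2'] },
    { name: 'preset-y', plugins: ['y1'] },
  ],
});
console.log(order.join(', '));
```

Here `plugin-a` and `plugin-b` come first, then the plugins of `preset-y` (listed last), then those of `preset-x`.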
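Generate

babel-generator turns the transformed AST back into JS code: it walks the whole AST depth-first and builds a string representing the transformed code. Source maps are also produced here.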
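As a rough illustration of the generate stage, the sketch below walks a hand-written AST depth-first and builds the output string. The `generate` function is a hypothetical stand-in, not babel-generator: it supports only a handful of node types and ignores operator precedence, formatting options, and source maps.

```javascript
// Hypothetical mini generator: each node type knows how to print itself
// from the strings produced by its children.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'ExpressionStatement':
      return generate(node.expression) + ';';
    case 'CallExpression':
      return `${generate(node.callee)}(${node.arguments.map(generate).join(', ')})`;
    case 'BinaryExpression':
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case 'Identifier':
      return node.name;
    case 'Literal':
      return node.raw;
    default:
      throw new Error(`Unsupported node type "${node.type}"`);
  }
}

// The `add(1, 2);` statement from the AST shown earlier, positions omitted.
const ast = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'add' },
      arguments: [
        { type: 'Literal', value: 1, raw: '1' },
        { type: 'Literal', value: 2, raw: '2' },
      ],
    },
  }],
};

const output = generate(ast);
console.log(output);
```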

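See the Babel handbook for more detail. Note that Babel has two kinds of plugins. Syntax plugins assist the parser (Babylon) during the parse stage; transform plugins take part in transpiling the code during the transform stage, which is the most common and most fundamental reason to use Babel. This article focuses on transform plugins.

To understand exactly how Babel handles the AST during traversal, a few more key concepts are needed.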

Visitor

Babel processes a node in the manner of a visitor: it obtains the node's information and acts on it through a visitor object. The visitor object defines functions for the various node types, so different nodes can be handled differently. A Babel plugin modifies code in just this way, by defining a visitor object that handles a set of AST nodes. For example:

Suppose we want to process the import statements that load modules:

import { Ajax } from '../lib/utils';

Then our Babel plugin needs to define a visitor object like this:

visitor: {
    Program: {
        enter(path, state) {
            console.log('start processing this module...');
        },
        exit(path, state) {
            console.log('end processing this module!');
        }
    },
    ImportDeclaration(path, state) {
        console.log('processing ImportDeclaration...');
        // do something
    }
}

When this plugin is used during traversal, ImportDeclaration() is called automatically every time an import statement, that is, an ImportDeclaration node, is encountered; the method defines what to do with the import statement. Written this way, ImportDeclaration() is called on entering the node. We can also have the plugin run a method on exiting the node:

visitor: {
    ImportDeclaration: {
        enter(path, state) {
            console.log('start processing ImportDeclaration...');
            // do something
        },
        exit(path, state) {
            console.log('end processing ImportDeclaration!');
            // do something
        }
    }
}

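enter() is called on entering an ImportDeclaration node and exit() on leaving it. The same applies to the Program node above (loosely speaking, the node for a whole module). Note that the AST is traversed depth-first, so the import snippet above is traversed like this:

─ Program.enter()
  ─ ImportDeclaration.enter()
  ─ ImportDeclaration.exit()
─ Program.exit()

So when you create a visitor you actually get two chances to visit each node.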
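The enter/exit order can be reproduced with a tiny depth-first traverser. This is a simplified sketch, not babel-traverse: the `traverse` function and `CHILD_KEYS` list are assumptions for this example, and the handlers receive the raw node where Babel would pass a path.

```javascript
// Properties that may hold child nodes in this sketch
// (a real traverser uses per-node-type definitions instead).
const CHILD_KEYS = ['body', 'specifiers', 'source'];

function traverse(node, visitor) {
  let handlers = visitor[node.type];
  // A bare function is shorthand for { enter: fn }.
  if (typeof handlers === 'function') handlers = { enter: handlers };

  if (handlers && handlers.enter) handlers.enter(node);
  for (const key of CHILD_KEYS) {
    const value = node[key];
    const children = Array.isArray(value) ? value : value ? [value] : [];
    for (const child of children) {
      if (child && typeof child.type === 'string') traverse(child, visitor);
    }
  }
  if (handlers && handlers.exit) handlers.exit(node);
}

// Hand-written AST for a module with a single import statement.
const ast = {
  type: 'Program',
  body: [{
    type: 'ImportDeclaration',
    specifiers: [],
    source: { type: 'StringLiteral', value: '../lib/utils' },
  }],
};

const calls = [];
traverse(ast, {
  Program: {
    enter() { calls.push('Program.enter'); },
    exit() { calls.push('Program.exit'); },
  },
  ImportDeclaration: {
    enter() { calls.push('ImportDeclaration.enter'); },
    exit() { calls.push('ImportDeclaration.exit'); },
  },
});
console.log(calls.join(' > '));
```

Because children are visited between a node's enter and exit calls, the log follows the nesting shown in the tree above.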

PS: the definitions of the AST node types are in the Babylon documentation: github.com/babel/babyl…

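Path

As the visitor objects above show, every visitor method receives a path argument. It carries the node's information and its position in the tree so that the node can be manipulated. More precisely, a Path is an object representing the link between two nodes. It holds information about the current node and its parent, along with many methods for adding, updating, moving, and removing nodes. Its main properties and methods are: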

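── Properties
  - node         the current node
  - parent       the parent node
  - parentPath   the parent's path
  - scope        the scope
  - context      the traversal context
  - ...
── Methods
  - get                  get the path of a child node by key
  - findParent           search upward through the ancestor paths
  - getSibling           get a sibling path
  - replaceWith          replace this node with an AST node
  - replaceWithMultiple  replace this node with several AST nodes
  - insertBefore         insert nodes before this node
  - insertAfter          insert nodes after this node
  - remove               remove this node
  - ...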

See babel-traverse for the details.

Continuing the example above, let's see what the node property of path contains:

visitor: {
    ImportDeclaration(path, state) {
        console.log(path.node);
        // do something
    }
}

The output is:

Node {
  type: 'ImportDeclaration',
  start: 5,
  end: 41,
  loc: 
   SourceLocation {
     start: Position { line: 2, column: 4 },
     end: Position { line: 2, column: 40 } },
  specifiers: 
   [ Node {
       type: 'ImportSpecifier',
       start: 14,
       end: 18,
       loc: [SourceLocation],
       imported: [Node],
       local: [Node] } ],
  source: 
   Node {
     type: 'StringLiteral',
     start: 26,
     end: 40,
     loc: SourceLocation { start: [Position], end: [Position] },
     extra: { rawValue: '../lib/utils', raw: '\'../lib/utils\'' },
     value: '../lib/utils'
    }
}


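Besides the usual type, start, end, and loc fields, an ImportDeclaration node has two special fields, specifiers and source. specifiers is an array of nodes for the bindings being imported, and source is the node for the module they are imported from. Each specifier in turn has imported and local fields: imported is the name exported by the source module, and local is the name bound in the current module. That is still a bit abstract, so let's change the import statement:

import { Ajax as ajax } from '../lib/utils';

Then print the local and imported fields of the first element of specifiers: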

Node {
  type: 'Identifier',
  start: 22,
  end: 26,
  loc: 
   SourceLocation {
     start: Position { line: 2, column: 21 },
     end: Position { line: 2, column: 25 },
     identifierName: 'ajax' },
  name: 'ajax' }
Node {
  type: 'Identifier',
  start: 14,
  end: 18,
  loc: 
   SourceLocation {
     start: Position { line: 2, column: 13 },
     end: Position { line: 2, column: 17 },
     identifierName: 'Ajax' },
  name: 'Ajax' }

Now the difference is clear. Without the as keyword, imported and local are nodes for the same name.
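A small sketch can make the relationship explicit by printing a specifier back to source form. The `printSpecifier` helper is hypothetical, and the node objects are hand-written with only the fields used here; it assumes both `imported` and `local` are Identifier nodes.

```javascript
// Hypothetical helper: prints an ImportSpecifier the way it appears in source.
function printSpecifier(specifier) {
  const imported = specifier.imported.name; // name exported by the source module
  const local = specifier.local.name;       // name bound in the current module
  return imported === local ? imported : `${imported} as ${local}`;
}

const renamed = {
  type: 'ImportSpecifier',
  imported: { type: 'Identifier', name: 'Ajax' },
  local: { type: 'Identifier', name: 'ajax' },
};
const plain = {
  type: 'ImportSpecifier',
  imported: { type: 'Identifier', name: 'Ajax' },
  local: { type: 'Identifier', name: 'Ajax' },
};

console.log(printSpecifier(renamed));
console.log(printSpecifier(plain));
```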

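State

state is the second argument passed to every visitor method. The explanation in the Babel handbook can be a little confusing; put simply, state is a bag of state. It includes information about the current plugin and the options passed to it, and even path information can be reached from it. You can also store your own state on it while the plugin runs.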
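The sketch below shows both uses of state: reading plugin options from `state.opts` and keeping a custom counter on it. The plugin `countImportsPlugin`, its `verbose` option, and the `importCount` property are hypothetical. To keep the example self-contained, the visitor methods are called by hand with stand-in objects instead of being run by Babel.

```javascript
// Hypothetical plugin: counts the import statements in a module.
function countImportsPlugin() {
  return {
    visitor: {
      Program: {
        enter(path, state) {
          // Custom state stored on the state object
          state.importCount = 0;
        },
        exit(path, state) {
          // Options passed to the plugin are read from state.opts
          if (state.opts.verbose) {
            console.log(`imports: ${state.importCount}`);
          }
        },
      },
      ImportDeclaration(path, state) {
        state.importCount += 1;
      },
    },
  };
}

// Stand-ins for what Babel would pass in; only the fields used above are provided.
const state = { opts: { verbose: true } };
const { visitor } = countImportsPlugin();
visitor.Program.enter({}, state);
visitor.ImportDeclaration({}, state);
visitor.ImportDeclaration({}, state);
visitor.Program.exit({}, state);
```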

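Scopes

Scope here means the same thing as scope in JS itself: when Babel processes the AST it also has to take scope into account, for example to tell apart variables of the same name inside and outside a function. Here is an example taken directly from the Babel handbook. Consider this code:

function square(n) {
  return n * n;
}

Let's write a visitor that renames n to x.

// Shared between the two visitor methods below
let paramName;

visitor: {
    FunctionDeclaration(path) {
        const param = path.node.params[0];
        paramName = param.name;
        param.name = "x";
    },

    Identifier(path) {
        if (path.node.name === paramName) {
            path.node.name = "x";
        }
    }
}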

This visitor may work for the example above, but it breaks easily:

function square(n) {
  return n * n;
}
var n = 1;

The visitor above would also rename the n outside the square function to x, which is clearly not what we want. A better approach is recursion: put one visitor inside another.

visitor: {
    FunctionDeclaration(path) {
        // Nested visitor: it only runs on the subtree of this function
        const updateParamNameVisitor = {
            Identifier(path) {
                if (path.node.name === this.paramName) {
                    path.node.name = "x";
                }
            }
        };
        const param = path.node.params[0];
        const paramName = param.name;
        param.name = "x";
        path.traverse(updateParamNameVisitor, { paramName });
    }
}

By now we have a rough picture of Babel's workflow. Next, let's look at Babel's toolset.

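Babel's Toolset

Babel is really a collection of modules, several of which were already mentioned in the workflow description above.

Babylon

"Babylon is Babel's parser. It was originally forked from the Acorn project. Acorn is very fast, easy to use, and has a plugin-based architecture designed for non-standard features (as well as future standard features)." This quotes the handbook directly. In effect Babylon defines the conventions by which code is parsed into an AST. An example from the handbook: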

import * as babylon from "babylon";
const code = `function square(n) {
  return n * n;
}`;

babylon.parse(code);
// Node {
//   type: "File",
//   start: 0,
//   end: 38,
//   loc: SourceLocation {...},
//   program: Node {...},
//   comments: [],
//   tokens: [...]
// }

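babel-traverse

babel-traverse maintains the state of the AST while it is being manipulated and defines the operations for updating, adding, and removing nodes. As mentioned earlier, the properties and methods of the path argument are all defined in babel-traverse. Here is another handbook example that uses babel-traverse together with Babylon to traverse and update nodes: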

import * as babylon from "babylon";
import traverse from "babel-traverse";

const code = `function square(n) {
  return n * n;
}`;

const ast = babylon.parse(code);

traverse(ast, {
  enter(path) {
    if (
      path.node.type === "Identifier" &&
      path.node.name === "n"
    ) {
      path.node.name = "x";
    }
  }
});

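babel-types

babel-types is a powerful utility library for working with AST nodes. "It includes methods for building, validating, and converting AST nodes. The library has well-thought-out utility methods that are very useful when writing AST-processing logic." Its API is documented on the Babel website: babeljs.io/docs/en/bab…

We use the import statement again for an example. Suppose we want to determine what kind of import a statement is. Here are three forms of import:

import { Ajax } from '../lib/utils';
import utils from '../lib/utils';
import * as utils from '../lib/utils';

The nodes that represent the imported bindings in these three statements differ: they are ImportSpecifier, ImportDefaultSpecifier, and ImportNamespaceSpecifier respectively. If we only want to process import statements that import named bindings, the plugin can be written like this: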

function plugin () {
    return ({ types }) => ({
        visitor: {
            ImportDeclaration (path, state) {
                const specifiers = path.node.specifiers;
                specifiers.forEach((specifier) => {
                    // Skip default imports and namespace imports
                    if (!types.isImportDefaultSpecifier(specifier) && !types.isImportNamespaceSpecifier(specifier)) {
                        // do something
                    }
                });
            }
        }
    });
}

That covers most of how Babel works. Next, let's try writing a Babel plugin with a concrete purpose.

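A Babel Plugin in Practice

The feature we want: when using a UI component library we often need only some of its components, like this:

import { Select, Pagination } from 'xxx-ui';

But this pulls in the whole component library, so the bundle ends up containing all of its code, which is wasteful. We would like the bundle to include only the components we need.

Let's do it!

First we need to tell Babel how to find the path of a given component; in other words, we define a rule that maps a component name to the module to load. We define a function for this:

"customSourceFunc": componentName => (`./xxx-ui/src/components/ui-base/${componentName}/${componentName}`)

This function is passed as a plugin option and can be set in .babelrc (strictly speaking .babelrc.js, since it is a function) or in babel-loader. Next we define the visitor object. With the groundwork above, here is the code directly: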

visitor: {
    ImportDeclaration (path, { opts }) {
        const specifiers = path.node.specifiers;
        const source = path.node.source;

        // Check whether the options were passed as an array
        if (Array.isArray(opts)) {
            opts.forEach(opt => {
                assert(opt.libraryName, 'libraryName should be provided');
            });
            if (!opts.find(opt => opt.libraryName === source.value)) return;
        } else {
            assert(opts.libraryName, 'libraryName should be provided');
            if (opts.libraryName !== source.value) return;
        }

        const opt = Array.isArray(opts) ? opts.find(opt => opt.libraryName === source.value) : opts;
        opt.camel2UnderlineComponentName = typeof opt.camel2UnderlineComponentName === 'undefined'
            ? false
            : opt.camel2UnderlineComponentName;
        opt.camel2DashComponentName = typeof opt.camel2DashComponentName === 'undefined'
            ? false
            : opt.camel2DashComponentName;

        if (!types.isImportDefaultSpecifier(specifiers[0]) && !types.isImportNamespaceSpecifier(specifiers[0])) {
            // Map the specifiers to an array of new ImportDeclaration nodes
            const declarations = specifiers.map((specifier) => {
                // Convert the component name
                const transformedSourceName = opt.camel2UnderlineComponentName
                    ? camel2Underline(specifier.imported.name)
                    : opt.camel2DashComponentName
                        ? camel2Dash(specifier.imported.name)
                        : specifier.imported.name;
                // Build the module path with the user-supplied customSourceFunc, then create a new ImportDeclaration node
                return types.importDeclaration([types.importDefaultSpecifier(specifier.local)],
                    types.stringLiteral(opt.customSourceFunc(transformedSourceName)));
            });
            // Replace the current node with the new ImportDeclaration nodes
            path.replaceWithMultiple(declarations);
        }
    }
}

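Here opts is the options object passed in via .babelrc.js or babel-loader. camel2UnderlineComponentName and camel2DashComponentName can be ignored for now, though their names suggest what they do. The visitor walks every ImportDeclaration node in the module, picks out those whose specifiers are of type ImportSpecifier, uses the supplied customSourceFunc to build a direct import of each component's own path, and replaces the original ImportDeclaration node. That gives us on-demand loading of components.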
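The visitor calls camel2Dash() and camel2Underline() without showing them. A plausible sketch of what such helpers could look like follows; these are assumed implementations written for this example, and the published plugin's own helpers may differ.

```javascript
// Hypothetical implementations of the name-conversion helpers.
function camel2Dash(name) {
  // Insert "-" between a lowercase letter or digit and the uppercase letter after it
  return name.replace(/([a-z0-9])([A-Z])/g, '$1-$2').toLowerCase();
}

function camel2Underline(name) {
  // Same idea with "_" as the separator
  return name.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toLowerCase();
}

// The path rule from the plugin options above
const customSourceFunc = (componentName) =>
  `./xxx-ui/src/components/ui-base/${componentName}/${componentName}`;

console.log(customSourceFunc(camel2Dash('DatePicker')));
console.log(camel2Underline('DatePicker'));
```

With these helpers, a component imported as `DatePicker` (a made-up name) would resolve to a `date-picker` directory when camel2DashComponentName is enabled.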

Let's test it:

const babel = require('babel-core');
const types = require('babel-types');

const plugin = require('./../lib/index.js');

const visitor = plugin({types});

const code = `
    import { Select as MySelect, Pagination } from 'xxx-ui';
    import * as UI from 'xxx-ui';
`;

const result = babel.transform(code, {
    plugins: [
        [
            visitor,
            {
                "libraryName": "xxx-ui",
                "camel2DashComponentName": true,
                "customSourceFunc": componentName => (`./xxx-ui/src/components/ui-base/${componentName}/${componentName}`)
            }
        ]
    ]
});

console.log(result.code);
// import MySelect from './xxx-ui/src/components/ui-base/select/select';
// import Pagination from './xxx-ui/src/components/ui-base/pagination/pagination';
// import * as UI from 'xxx-ui';

This Babel plugin has been published to npm: www.npmjs.com/package/bab…

The source is also available: github.com/hudingyu/ba… It includes test examples; clone it and try them out, remembering to build first.

This plugin is really a bare-bones on-demand loading plugin. ant-design's babel-plugin-import implements a more complete solution and also includes React-specific optimizations.

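Front End and Front-End Compiler Principles: Writing a JS Compiler in JS

I. Why write a JS interpreter in JS

Anyone who has developed mini programs knows that the mini program runtime forbids new Function, eval, and similar methods, so we cannot directly execute dynamic code given as a string. Many other platforms also restrict these built-in ways of running dynamic code. Does that leave us with no options? Not quite: we can write an interpreter in JS and let JS run itself.

Before starting, let's briefly review some concepts from compiler theory.

II. What is a compiler

Compiler theory naturally involves compilers. Put simply, after a piece of code goes through a compiler's lexical analysis, syntactic analysis, and so on, the result is a tree structure called an abstract syntax tree (AST), each node of which corresponds to a fragment of the code with a particular meaning.

Take this code:

const a = 1
console.log(a)

After the compiler processes it, its AST looks like this: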

{
  "type": "Program",
  "start": 0,
  "end": 26,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 11,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 11,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 7,
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "start": 10,
            "end": 11,
            "value": 1,
            "raw": "1"
          }
        }
      ],
      "kind": "const"
    },
    {
      "type": "ExpressionStatement",
      "start": 12,
      "end": 26,
      "expression": {
        "type": "CallExpression",
        "start": 12,
        "end": 26,
        "callee": {
          "type": "MemberExpression",
          "start": 12,
          "end": 23,
          "object": {
            "type": "Identifier",
            "start": 12,
            "end": 19,
            "name": "console"
          },
          "property": {
            "type": "Identifier",
            "start": 20,
            "end": 23,
            "name": "log"
          },
          "computed": false
        },
        "arguments": [
          {
            "type": "Identifier",
            "start": 24,
            "end": 25,
            "name": "a"
          }
        ]
      }
    }
  ],
  "sourceType": "module"
}

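Common JS parsers include babylon and acorn; if you are interested, try them out on the AST explorer website.

As you can see, the resulting AST records the type, start and end positions, and other details of every meaningful piece of the code. Besides the root Program node, the body contains two nodes, VariableDeclaration and ExpressionStatement, which in turn contain their own child nodes.

Because the AST records the code's semantic information in such detail, tools like Babel, Webpack, Sass, and Less can process code very intelligently.

III. What is an interpreter

Just as a translator not only understands a foreign language but can also render it, with some polish, into their native language, in this article a tool that turns code into an AST is called a "compiler", and a tool that translates the AST into a target language and runs it is called an "interpreter".

In a compiler theory course we considered this question: how do you make a computer evaluate the arithmetic expression 1+2+3:

1 + 2 + 3

When the machine executes it, the machine code might look like this: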

PUSH 1
PUSH 2
ADD
PUSH 3
ADD

The program that runs this machine code is the interpreter.
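As a minimal sketch of that idea, here is a stack-based interpreter for just the two instructions used above. The `run` function and the convention of passing instructions as strings are assumptions made for this illustration.

```javascript
// Hypothetical stack machine supporting only PUSH <number> and ADD.
function run(program) {
  const stack = [];
  for (const line of program) {
    const [op, arg] = line.split(' ');
    if (op === 'PUSH') {
      stack.push(Number(arg));
    } else if (op === 'ADD') {
      // Pop the two topmost values and push their sum
      const right = stack.pop();
      const left = stack.pop();
      stack.push(left + right);
    } else {
      throw new Error(`Unknown instruction "${op}"`);
    }
  }
  // The final result is whatever is left on top of the stack
  return stack.pop();
}

const result = run(['PUSH 1', 'PUSH 2', 'ADD', 'PUSH 3', 'ADD']);
console.log(result); // 6
```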

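In this article we won't build anything as complex as machine code; we simply use JS, within its own runtime, to interpret the AST of JS code. Because the interpreter is written in JS, we can freely rely on JS's own language features, such as this binding and the new keyword, without handling them specially, which makes the interpreter very simple to implement.

With the basic concepts reviewed, we can start building.

IV. The node iterator

From the AST above, every node has a type property, and different node types need different handling. The code that handles these nodes is the "node handler" (nodeHandler).

Define a node handler: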

const nodeHandler = {
  Program () {},
  VariableDeclaration () {},
  ExpressionStatement () {},
  MemberExpression () {},
  CallExpression () {},
  Identifier () {}
}

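The implementation of the node handlers is discussed in detail later, so we won't expand on it here.

With the node handler in place, we need to walk every node in the AST, calling the node handler recursively until the whole syntax tree has been processed.

Define a node iterator (NodeIterator):

class NodeIterator {
  constructor (node) {
    this.node = node
    this.nodeHandler = nodeHandler
  }

  traverse (node) {
    // Look up the handler function for this node type
    const _eval = this.nodeHandler[node.type]
    // Throw if there is none
    if (!_eval) {
      throw new Error(`canjs: Unknown node type "${node.type}".`)
    }
    // Run the handler
    return _eval(node)
  }

}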

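In principle this design is enough, but on closer inspection it misses something important: scope handling.

Go back to the node handler's VariableDeclaration() method, which handles variable declarations such as const a = 1. Suppose its code were:

  VariableDeclaration (node) {
    for (const declaration of node.declarations) {
      const { name } = declaration.id
      const value = declaration.init ? traverse(declaration.init) : undefined
      // Problem: we have the variable's name and value, but where do we store them?
      // ...
    }
  },

The problem is that once a variable declaration has been processed, the variable should be stored somewhere. By the rules of JS it belongs in a scope, and in our JS interpreter that scope can be modeled as a scope object.

Rewrite the node iterator to add a scope object: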

class NodeIterator {
  constructor (node, scope = {}) {
    this.node = node
    this.scope = scope
    this.nodeHandler = nodeHandler
  }

  traverse (node, options = {}) {
    const scope = options.scope || this.scope
    const nodeIterator = new NodeIterator(node, scope)
    const _eval = this.nodeHandler[node.type]
    if (!_eval) {
      throw new Error(`canjs: Unknown node type "${node.type}".`)
    }
    return _eval(nodeIterator)
  }

  createScope (blockType = 'block') {
    return new Scope(blockType, this.scope)
  }
}

Now the VariableDeclaration() handler can store variables through scope:

  VariableDeclaration (nodeIterator) {
    const kind = nodeIterator.node.kind
    for (const declaration of nodeIterator.node.declarations) {
      const { name } = declaration.id
      const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined
      // Define the variable in the scope
      // If this is a block scope and the variable is declared with var, define it in the parent scope
      if (nodeIterator.scope.type === 'block' && kind === 'var') {
        nodeIterator.scope.parentScope.declare(name, value, kind)
      } else {
        nodeIterator.scope.declare(name, value, kind)
      }
    }
  },

Scope handling is arguably the hardest part of the whole JS interpreter. Next we examine it in depth.

V. Scope handling

Consider this case:

const a = 1
{
  const b = 2
  console.log(a)
}
console.log(b)

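Running it prints the value of a and then throws: Uncaught ReferenceError: b is not defined

This code is all about scope. A block scope or function scope can read variables from its parent scope, but not the other way round, so a scope cannot simply be an empty object; it needs dedicated handling.

Define a Scope base class: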

class Scope {
  constructor (type, parentScope) {
    // Scope type: 'function' for function scope, 'block' for block scope
    this.type = type
    // Parent scope
    this.parentScope = parentScope
    // Global declarations (the standard library)
    this.globalDeclaration = standardMap
    // Variables declared in this scope
    this.declaration = Object.create(null)
  }

  /*
   * get/set read and write the variable called name.
   * Following JS rules, look in the current scope first, then in the parent scopes,
   * then in the global declarations. If it is found nowhere, throw.
   */
  get (name) {
    if (this.declaration[name]) {
      return this.declaration[name]
    } else if (this.parentScope) {
      return this.parentScope.get(name)
    } else if (this.globalDeclaration[name]) {
      return this.globalDeclaration[name]
    }
    throw new ReferenceError(`${name} is not defined`)
  }

  set (name, value) {
    if (this.declaration[name]) {
      // Delegate to SimpleValue.set so that const is enforced
      this.declaration[name].set(value)
    } else if (this.parentScope) {
      this.parentScope.set(name, value)
    } else {
      throw new ReferenceError(`${name} is not defined`)
    }
  }

  /**
   * Dispatch to the declaration method that matches the variable's kind
   */
  declare (name, value, kind = 'var') {
    if (kind === 'var') {
      return this.varDeclare(name, value)
    } else if (kind === 'let') {
      return this.letDeclare(name, value)
    } else if (kind === 'const') {
      return this.constDeclare(name, value)
    } else {
      throw new Error(`canjs: Invalid Variable Declaration Kind of "${kind}"`)
    }
  }

  varDeclare (name, value) {
    let scope = this
    // Walk up past block scopes to the nearest function scope (or the root)
    while (scope.parentScope && scope.type !== 'function') {
      scope = scope.parentScope
    }
    scope.declaration[name] = new SimpleValue(value, 'var')
    return scope.declaration[name]
  }

  letDeclare (name, value) {
    // Redeclaration is not allowed
    if (this.declaration[name]) {
      throw new SyntaxError(`Identifier ${name} has already been declared`)
    }
    this.declaration[name] = new SimpleValue(value, 'let')
    return this.declaration[name]
  }

  constDeclare (name, value) {
    // Redeclaration is not allowed
    if (this.declaration[name]) {
      throw new SyntaxError(`Identifier ${name} has already been declared`)
    }
    this.declaration[name] = new SimpleValue(value, 'const')
    return this.declaration[name]
  }
}

A class called SimpleValue is used here to wrap variable values, mainly so that constants can be handled:

class SimpleValue {
  constructor (value, kind = '') {
    this.value = value
    this.kind = kind
  }

  set (value) {
    // Reassigning a const variable is not allowed
    if (this.kind === 'const') {
      throw new TypeError('Assignment to constant variable')
    } else {
      this.value = value
    }
  }

  get () {
    return this.value
  }
}

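The key to scope handling is how JS itself looks up variables: the current scope first, then parent scopes, and the global scope last. Conversely, in the VariableDeclaration() handler, a var declaration inside a block scope must be defined in the parent scope as well, which is what is commonly called "global variable pollution".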
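The lookup rule can be shown in isolation with a stripped-down scope chain. `MiniScope` is a hypothetical simplification of the Scope class above: it has no declaration kinds, no SimpleValue wrapper, and no standard library, and it uses a Map so that falsy values such as 0 are still found.

```javascript
// Hypothetical minimal scope: a variable table plus a link to the parent scope.
class MiniScope {
  constructor(parent = null) {
    this.parent = parent;
    this.vars = new Map();
  }

  declare(name, value) {
    this.vars.set(name, value);
  }

  get(name) {
    // Current scope first, then walk up the parent chain
    if (this.vars.has(name)) return this.vars.get(name);
    if (this.parent) return this.parent.get(name);
    throw new ReferenceError(`${name} is not defined`);
  }
}

// Mirrors the example above: `a` lives outside the block, `b` inside it.
const outer = new MiniScope();
outer.declare('a', 1);
const inner = new MiniScope(outer);
inner.declare('b', 2);

console.log(inner.get('a')); // 1, found through the parent link

let lookupError = null;
try {
  outer.get('b'); // the parent cannot see into the child
} catch (error) {
  lookupError = error;
}
console.log(lookupError.message); // b is not defined
```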

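Injecting the JS standard library

Attentive readers will have noticed that when the Scope base class was defined, its global declaration table globalDeclaration was assigned a standardMap object. That object is the JS standard library.

Put simply, the JS standard library is the set of methods and properties that come with the language and its runtime, such as the commonly used setTimeout and console.log. For the interpreter to be able to run them, we have to inject the standard library: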

const standardMap = {
  console: new SimpleValue(console)
}

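This injects the console object into the interpreter's global scope, so it can be used directly.

VI. Node handlers

With the node iterator and scope handling done, we can write the node handlers. As the name suggests, a node handler processes AST nodes; the VariableDeclaration() method mentioned repeatedly above is one of them. The key handlers are explained below.

Before writing them we need one utility, for recognizing the return, break, and continue keywords in JS statements.

Keyword utility `Signal`

Define a Signal base class: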

class Signal {
  constructor (type, value) {
    this.type = type
    this.value = value
  }

  static Return (value) {
    return new Signal('return', value)
  }

  static Break (label = null) {
    return new Signal('break', label)
  }

  static Continue (label) {
    return new Signal('continue', label)
  }

  static isReturn(signal) {
    return signal instanceof Signal && signal.type === 'return'
  }

  static isContinue(signal) {
    return signal instanceof Signal && signal.type === 'continue'
  }

  static isBreak(signal) {
    return signal instanceof Signal && signal.type === 'break'
  }

  static isSignal (signal) {
    return signal instanceof Signal
  }
}

With it we can detect and handle these keywords in statements; it will prove very useful below.
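To illustrate why a signal object is useful, the sketch below shows a return signal cutting execution short and bubbling out of nested blocks. It uses a reduced `Signal` class and a hypothetical `runBlock` function in which each statement is modeled as a plain function; the real handlers work on AST nodes instead.

```javascript
// Reduced version of Signal, enough for this example.
class Signal {
  constructor(type, value) {
    this.type = type;
    this.value = value;
  }
}

// Hypothetical block runner: stops at the first signal and hands it to the caller.
function runBlock(statements) {
  for (const statement of statements) {
    const result = statement();
    if (result instanceof Signal) {
      return result;
    }
  }
  return undefined;
}

const executed = [];
const innerBlock = () => runBlock([
  () => { executed.push('a'); },
  () => new Signal('return', 42),      // models `return 42`
  () => { executed.push('never'); },   // skipped
]);

const signal = runBlock([
  innerBlock,
  () => { executed.push('also never'); }, // skipped: the signal bubbled up
]);

console.log(executed.join(','), signal.type, signal.value);
```

A function handler would sit at the top of such a chain and unwrap `signal.value` as the function's return value.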

1. Variable declaration handler: `VariableDeclaration()`

One of the most frequently used handlers; it registers variables in the correct scope.

  VariableDeclaration (nodeIterator) {
    const kind = nodeIterator.node.kind
    for (const declaration of nodeIterator.node.declarations) {
      const { name } = declaration.id
      const value = declaration.init ? nodeIterator.traverse(declaration.init) : undefined
      // Define the variable in the scope
      // For a block scope with the var keyword, define it in the parent scope instead
      if (nodeIterator.scope.type === 'block' && kind === 'var') {
        nodeIterator.scope.parentScope.declare(name, value, kind)
      } else {
        nodeIterator.scope.declare(name, value, kind)
      }
    }
  },

2. Identifier handler: `Identifier()`

Looks up an identifier's value in the scope.

  Identifier (nodeIterator) {
    if (nodeIterator.node.name === 'undefined') {
      return undefined
    }
    return nodeIterator.scope.get(nodeIterator.node.name).value
  },

3. Literal handler: `Literal()`

Returns the value of a literal node.

  Literal (nodeIterator) {
    return nodeIterator.node.value
  }

4. Call expression handler: `CallExpression()`

Handles call expression nodes, such as func() or console.log().

  CallExpression (nodeIterator) {
    // Evaluate the callee to get the function
    const func = nodeIterator.traverse(nodeIterator.node.callee)
    // Evaluate the arguments
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))

    // For a method call such as console.log(), use the object as `this`
    let value
    if (nodeIterator.node.callee.type === 'MemberExpression') {
      value = nodeIterator.traverse(nodeIterator.node.callee.object)
    }
    // Return the result of calling the function
    return func.apply(value, args)
  },

5. Member expression handler: `MemberExpression()`

Unlike the call expression handler above, this handles member expressions such as person.say or console.log.

  MemberExpression (nodeIterator) {
    // Evaluate the object, e.g. console
    const obj = nodeIterator.traverse(nodeIterator.node.object)
    // Get the property name, e.g. log
    const name = nodeIterator.node.property.name
    // Return the member, e.g. console.log
    return obj[name]
  }

6. Block statement handler: `BlockStatement()`

A very frequently used handler for block statement nodes, as found in functions, loops, and try...catch.

  BlockStatement (nodeIterator) {
    // Create a block scope first
    let scope = nodeIterator.createScope('block')

    // First pass: hoist var declarations and function declarations
    for (const node of nodeIterator.node.body) {
      if (node.type === 'VariableDeclaration' && node.kind === 'var') {
        for (const declaration of node.declarations) {
          // Guard against declarations without an initializer, e.g. `var a`
          scope.declare(declaration.id.name, declaration.init ? declaration.init.value : undefined, node.kind)
        }
      } else if (node.type === 'FunctionDeclaration') {
        nodeIterator.traverse(node, { scope })
      }
    }

    // Second pass: run the statements and propagate return, break, and continue signals
    for (const node of nodeIterator.node.body) {
      if (node.type === 'FunctionDeclaration') {
        continue
      }
      const signal = nodeIterator.traverse(node, { scope })
      if (Signal.isSignal(signal)) {
        return signal
      }
    }
  }

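This handler contains two for...of loops. The first hoists the var and function declarations in the block; the second runs the remaining statements and watches for keywords, such as break and continue inside a loop body or return inside a function body.

7. Function declaration handler: `FunctionDeclaration()`

Declares a variable in the scope with the same name as the function, whose value is the function being defined:

  FunctionDeclaration (nodeIterator) {
    const fn = nodeHandler.FunctionExpression(nodeIterator)
    nodeIterator.scope.varDeclare(nodeIterator.node.id.name, fn)
    return fn
  }

8. Function expression handler: `FunctionExpression()`

Defines a function: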

  FunctionExpression (nodeIterator) {
    const node = nodeIterator.node
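    /**
     * 1. A function needs its own function scope, which can see its parent scope
     * 2. Register `this`, `arguments`, and the parameters in the scope
     * 3. Check for the return keyword
     * 4. Define the function's name and length
     */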
    const fn = function () {
      const scope = nodeIterator.createScope('function')
      scope.constDeclare('this', this)
      scope.constDeclare('arguments', arguments)

      node.params.forEach((param, index) => {
        const name = param.name
        scope.varDeclare(name, arguments[index])
      })

      const signal = nodeIterator.traverse(node.body, { scope })
      if (Signal.isReturn(signal)) {
        return signal.value
      }
    }

    Object.defineProperties(fn, {
      name: { value: node.id ? node.id.name : '' },
      length: { value: node.params.length }
    })

    return fn
  }

9. this expression handler: `ThisExpression()`

This handler relies on JS's own behavior: it simply reads the this keyword from the scope.

  ThisExpression (nodeIterator) {
    const value = nodeIterator.scope.get('this')
    return value ? value.value : null
  }

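10. new expression handler: `NewExpression()`

Like the this expression, it reuses JS's own behavior: after obtaining the function and its arguments, it uses the bind method to create a constructor with the arguments pre-applied, instantiates it with new, and returns the result.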

  NewExpression (nodeIterator) {
    const func = nodeIterator.traverse(nodeIterator.node.callee)
    const args = nodeIterator.node.arguments.map(arg => nodeIterator.traverse(arg))
    return new (func.bind(null, ...args))
  }

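11. For loop handler: `ForStatement()`

The three parts of a for loop correspond to the node's init, test, and update properties. Run the node handlers on each of these three properties and place the results into a native JS for loop.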

  ForStatement (nodeIterator) {
    const node = nodeIterator.node
    let scope = nodeIterator.scope
    if (node.init && node.init.type === 'VariableDeclaration' && node.init.kind !== 'var') {
      scope = nodeIterator.createScope('block')
    }

    for (
      node.init && nodeIterator.traverse(node.init, { scope });
      node.test ? nodeIterator.traverse(node.test, { scope }) : true;
      node.update && nodeIterator.traverse(node.update, { scope })
    ) {
      const signal = nodeIterator.traverse(node.body, { scope })

      if (Signal.isBreak(signal)) {
        break
      } else if (Signal.isContinue(signal)) {
        continue
      } else if (Signal.isReturn(signal)) {
        return signal
      }
    }
  }

for...in, while, and do...while loops are handled in the same way, so they are not repeated here.

12. If statement handler: `IfStatement()`

Handles if statements, including if, if...else, and if...else if...else.

  IfStatement (nodeIterator) {
    if (nodeIterator.traverse(nodeIterator.node.test)) {
      return nodeIterator.traverse(nodeIterator.node.consequent)
    } else if (nodeIterator.node.alternate) {
      return nodeIterator.traverse(nodeIterator.node.alternate)
    }
  }

switch statements and ternary expressions are handled in a similar way.
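For instance, a handler for ternary expressions could look like the sketch below. It is an assumption modeled on IfStatement(), not code from the project; the `makeIterator` stub stands in for NodeIterator and only knows how to evaluate Literal nodes.

```javascript
// Hypothetical handler for `test ? consequent : alternate`.
function ConditionalExpression(nodeIterator) {
  const { test, consequent, alternate } = nodeIterator.node;
  // Only the selected branch is evaluated, as in native JS
  return nodeIterator.traverse(test)
    ? nodeIterator.traverse(consequent)
    : nodeIterator.traverse(alternate);
}

// Stub iterator for this example: evaluating a Literal returns its value.
function makeIterator(node) {
  return { node, traverse: (child) => child.value };
}

// AST for: false ? 'yes' : 'no'
const ternary = {
  type: 'ConditionalExpression',
  test: { type: 'Literal', value: false },
  consequent: { type: 'Literal', value: 'yes' },
  alternate: { type: 'Literal', value: 'no' },
};

const picked = ConditionalExpression(makeIterator(ternary));
console.log(picked); // no
```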

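Those are some of the more important node handlers. ES5 has many more node types to handle; follow the link to the project source for the details.

VII. Defining the entry point

After all the steps above, the interpreter can handle ES5 code. What remains is to assemble the pieces and define a convenient way for users to call it.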

const { Parser } = require('acorn')
const NodeIterator = require('./iterator')
const Scope = require('./scope')

class Canjs {
  constructor (code = '', extraDeclaration = {}) {
    this.code = code
    this.extraDeclaration = extraDeclaration
    this.ast = Parser.parse(code)
    this.nodeIterator = null
    this.init()
  }

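  init () {
    // Create the global scope; it is a function scope
    const globalScope = new Scope('function')
    // Define extra global variables, beyond the standard library, from the constructor argument.
    // Note: addDeclaration is not part of the Scope class shown above;
    // it is assumed to register a global variable.
    Object.keys(this.extraDeclaration).forEach((key) => {
      globalScope.addDeclaration(key, this.extraDeclaration[key])
    })
    this.nodeIterator = new NodeIterator(null, globalScope)
  }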

  run () {
    return this.nodeIterator.traverse(this.ast)
  }
}

Here we define a class named Canjs that takes JS code as a string and optionally extra variables beyond the standard library. Calling run() produces the result.
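To see the whole idea end to end without acorn or the classes above, here is a condensed sketch that interprets a hand-written AST for `const a = 1; console.log(a)`. It is a simplification under stated assumptions: `evaluate` and `handlers` are hypothetical names, the scope is a single flat Map rather than a Scope chain, and the injected `console` is a stub that records output.

```javascript
// Hypothetical mini interpreter: one handler per node type, dispatched by node.type.
const handlers = {
  Program(node, scope) {
    let last;
    for (const statement of node.body) last = evaluate(statement, scope);
    return last;
  },
  VariableDeclaration(node, scope) {
    for (const declaration of node.declarations) {
      const value = declaration.init ? evaluate(declaration.init, scope) : undefined;
      scope.set(declaration.id.name, value);
    }
  },
  ExpressionStatement: (node, scope) => evaluate(node.expression, scope),
  CallExpression(node, scope) {
    const fn = evaluate(node.callee, scope);
    const args = node.arguments.map((arg) => evaluate(arg, scope));
    // For a method call, use the object as `this`
    const thisArg = node.callee.type === 'MemberExpression'
      ? evaluate(node.callee.object, scope)
      : undefined;
    return fn.apply(thisArg, args);
  },
  MemberExpression: (node, scope) => evaluate(node.object, scope)[node.property.name],
  Identifier(node, scope) {
    if (!scope.has(node.name)) throw new ReferenceError(`${node.name} is not defined`);
    return scope.get(node.name);
  },
  Literal: (node) => node.value,
};

function evaluate(node, scope) {
  const handler = handlers[node.type];
  if (!handler) throw new Error(`Unknown node type "${node.type}"`);
  return handler(node, scope);
}

// "Standard library" injection: a stub console that records what is printed.
const printed = [];
const globalScope = new Map([
  ['console', { log: (...args) => { printed.push(args.join(' ')); } }],
]);

// Hand-written AST for: const a = 1; console.log(a)
const program = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'a' },
        init: { type: 'Literal', value: 1 },
      }],
    },
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: {
          type: 'MemberExpression',
          object: { type: 'Identifier', name: 'console' },
          property: { type: 'Identifier', name: 'log' },
          computed: false,
        },
        arguments: [{ type: 'Identifier', name: 'a' }],
      },
    },
  ],
};

evaluate(program, globalScope);
console.log(printed);
```

In the real Canjs class the AST comes from acorn, and the handlers go through NodeIterator and Scope, but the dispatch-by-type structure is the same.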

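VIII. What's next

At this point the JS interpreter is complete and runs ES5 code well (there may still be undiscovered bugs). In the current implementation, however, everything runs in something like a sandbox and cannot affect the outside world. There are two possible ways to get results out. One is to pass in a global variable, apply the effects to it, and use it to carry the results out. The other is to make the interpreter support export syntax and return whatever export statements declare; interested readers can explore this themselves.

Finally, this JS interpreter is open source on my GitHub; feel free to come and discuss.

https://github.com/jrainlau/c...

CodingMeUp / AboutFE

30. ES features, Babel AST, compiler principles and interpreters #31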

