facebook / flow

Adds static typing to JavaScript to improve developer productivity and code quality.
https://flow.org/
MIT License
22.09k stars 1.86k forks source link

flow-remove-types produces invalid JS after non-latin characters #7779

Open mourner opened 5 years ago

mourner commented 5 years ago

Flow version: 0.100.0

Expected behavior

Given the following input:

// @flow
// п
function foo(bar: number) {}

flow-remove-types should produce:

//      
// п
function foo(bar        ) {}

Actual behavior

It produces invalid JS:

//      
// п
function foo(bar:         {}

The non-latin character is the trigger. Without it, it works correctly. This seems like quite a serious bug considering how common it is to use non-latin characters in comments (e.g. comment in other languages, draw arrows etc.).

cc @mroch @samwgoldman

mourner commented 5 years ago

This seems to be a bug in flow-parser which produces incorrect range values in AST for non-latin characters. E.g. parsing // п and // n produces [0, 5] and [0, 4] ranges respectively, although both should be the same.

mroch commented 5 years ago

same root cause as #5793, I believe. the lexer reports its position in unicode codepoints, of which these are 1, but they are multibyte.

mourner commented 5 years ago

@mroch I believe it's a recent regression, because https://astexplorer.net/ (which uses Flow v0.94.0) shows identical AST for both cases ([0, 4]).

mroch commented 5 years ago
Screen Shot 2019-06-03 at 9 27 32 AM

this is 0.94, when hovering over the FunctionDeclaration in the AST. so it's been there a while, but flow-remove-types regressed because it moved to flow-parser.

mroch commented 5 years ago

this appears to only be an issue with the range. the loc matches babylon.

it's a bit easier to see if you do it all on one line: https://flow.org/try/#0PQKk-CAsBmD2MNxA (see the AST tab). since it's all on line 1, the range should match the columns in loc.

mourner commented 5 years ago

@mroch ah, yeah, you're right. The link I commented earlier triggers the bug if you add a new line after the comment.

mroch commented 5 years ago

hrm i think we intentionally made range work like this. it's the byte offset in the file, which is what you'd want if you wanted to seek to read that part of the file from disk. but it's not what you'd pass to substring in JS, which is what tools like flow-remove-types do.

mroch commented 5 years ago

cc @nmote I think we should change range to match esprima and babylon, and potentially add (an option for) another field like byte_range or something. will need to figure out which tools need to be updated to use that.

mroch commented 5 years ago

https://github.com/facebook/flow/blob/a05cbb202197d57e5e909e11dcbb859f11c48641/src/parser/offset_utils.mli#L8-L30

... yup

mourner commented 5 years ago

Thanks for looking into this @mroch! Agreed about bringing Flow AST more in line with other parsers.

In the meantime, to fix the breaking bug with flow-remove-types sooner, should we just rewrite the logic so that it only depends on line/column, not range? I could take a stab at a PR.

mroch commented 5 years ago

alternatively, if it could work with UTF8 bytes instead of JS strings, that would make the existing ranges correct.

for example, perhaps changing flow-remove-types to use Buffers? I think the offsets of a Buffer are in bytes, not codepoints. modifying a buffer in place instead of lots of string concatenation could be a perf win too. https://github.com/facebook/flow/blob/a05cbb202197d57e5e909e11dcbb859f11c48641/packages/flow-remove-types/index.js#L107

mourner commented 5 years ago

@mroch yep, will do!

walkerdb commented 5 years ago

We are running into the same problem, in our case with a ™ in a comment.

// @flow
import React from 'react';

/*
  This should Just Work™
*/
export const SomeComponent = (props: Object) => (
  <div {...props}>
    This breaks
  </div>
);

becomes

import React from 'react';

/*
  This should Just Work™
*/
export const SomeComponent = (props:=> (
  <div {...props}>
    This breaks
  </div>
);

Looks like it's been a few months now, and it shows up in all v2 releases of flow-remove-types -- any updates on a fix?