getsentry / rust-sourcemap

A library for rust that implements basic sourcemap handling
Other
224 stars 28 forks source link

feat: Add support for range mappings proposal #77

Closed kdy1 closed 8 months ago

kdy1 commented 10 months ago

See https://github.com/tc39/source-map-rfc/blob/main/proposals/range-mappings.md

loewenheim commented 10 months ago

Hi, thank you for the contribution!

I think there is a fundamental misunderstanding here: it seems to me that the both decode_rmi function and the loop in decode_regular in which it is used assume that there is at most one range mapping per line. This is the only way I can see

            if rmi != 0 && (line_index as u32) == rmi - 1 {
                range_tokens.push(tokens.len() as u32)
            }

making sense—if the current index within the line is the number of the range token, we push the current token index to the list of range tokens.

But I think the proposal is meant to work differently: there can be many range mappings per line, and the ith bit in the rangeMappings for that line (modulo bit/byte endianness, I actually find this slightly confusing) tells you whether the ith mapping is one of them.

Consider this example: currently, decode_rmi("AAB") == 13. This is correct insofar as "AAB" makes the 13th mapping a range mapping, but the function shouldn't return 13, it should return the number with only the 13th bit set ¹. The current implementation clearly doesn't generalize to more than one range mapping per line.

Correctness aside, I'm really excited about this. These range mappings would've saved me a lot of pain in the past.

¹ Or better yet, as @Swatinem says, a bit vector.

loewenheim commented 10 months ago

I believe converting between 0- and 1-based indices causes some unnecessary confusion here.

  1. In decoder.rs line 219, you push a 0-based index to range_tokens.
  2. In SourceMap::get_token you binary search range_tokens for the token's index, which is 0-based.
  3. In test_decode_rmi, you convert to 1-based indices, I assume for purposes of clarity?
  4. In encoder.rs line 81, you add 1 to the index to make it 1-based.
  5. In encoder.rs lines 43-45, you assert that the indices are all greater than 0 (i.e. 1-based), but then subtract 1 to make them 0-based.

I would say remove the 1-based indexing in (4) and (5) (and arguably (3) as well) and just assume 0-based indices everywhere.

loewenheim commented 9 months ago

Just a heads-up—I haven't forgotten about this, I just haven't had the time to look at it.

kdy1 commented 8 months ago

@Swatinem I added some tests for lookup_token

Swatinem commented 8 months ago

The test now makes it a bit clearer what this proposal is trying to achieve.

So if I understand it correctly, new bitvec mapping adds this is_range flag. And the is_range flag determines if the mapping is using the "offset" from the initial binary search lookup to then adjust the output line/column numbers.

With a more visual example:

   /* foo */ if (some_other_code) { return; }
//           ^ - original token
// v - output token
   if (some_other_code) { return; }
//     ^ - lookup position

So the offset between lookup position and output token is added to original token, so that you end up exactly at the same position in relative code if it is merely shifted a bit by the sourcemap?

kdy1 commented 8 months ago

Yeap, exactly. This will be useful for the minifier, which does not have full input source locations.

This is the case because no tools emit a perfect source map while transpiling, and the mappings in inputSourceMap is not enough. With this RFC, it will have much more information about the input.