elm-tooling / gsoc-projects

A list of possible gsoc projects

Student Proposal: Integrating performance improvements from the Elm compiler into elm-format #13

Open emmabastas opened 3 years ago

emmabastas commented 3 years ago

Integrating performance improvements from the Elm compiler into elm-format

Edit1: Added section Relevant parsec API
Edit2: Updated sections Benefits and Requirements to reflect some of the points of feedback
Edit3: Updated Timeline after feedback
Edit4: Final edit before submitting proposal

Name: Emma Bastås
Name in Elm Slack: @emmabastas
Email: emma.bastas@protonmail.com

Summary

elm-format, Elm's de facto standard source code formatter, is based on the parsing code of the Elm 0.15 compiler. As of Elm 0.19, the compiler's parsing code has been rewritten to eliminate the dependency on parsec and indents and to greatly improve performance. elm-format has, however, diverged from the compiler to the point where integrating this rewrite is nontrivial. The end goal of this project is to integrate the performance improvements from the 0.19 compiler into elm-format.

How will I achieve this

  1. Replace elm-format's dependency on parsec with an adapter layer that implements the relevant parsec API on top of the compiler's new parser API.
  2. Incrementally migrate elm-format to use the new parsing API directly instead of via the adapter layer. This does not need to be fully completed.
  3. Benchmark elm-format before and after the change. There is already strong evidence from the compiler rewrite that there will be a performance improvement, and this project will have other benefits regardless, so benchmarking is not super important and can be considered optional.
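As a rough illustration of step 1, here is a minimal sketch of what "parsec-style combinators on top of a continuation-passing parser" could look like. The `Parser` and `State` types below are simplified stand-ins written for this example, not the compiler's actual Primitives.hs definitions, and the combinators only mimic the documented behaviour of parsec's `try`, `<|>` and `many`; error values are reduced to a plain offset.

```haskell
{-# LANGUAGE RankNTypes #-}

-- ASSUMPTION: a toy parser state, not the compiler's real one.
data State = State { input :: String, offset :: Int }

-- ASSUMPTION: a simplified four-continuation parser core.
newtype Parser a = Parser
  ( forall b.
       State
    -> (a -> State -> b)  -- success after consuming input
    -> (a -> State -> b)  -- success without consuming input
    -> (Int -> b)         -- failure after consuming input (offset)
    -> (Int -> b)         -- failure without consuming input (offset)
    -> b
  )

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s cok eok cerr eerr ->
    p s (cok . f) (eok . f) cerr eerr

instance Applicative Parser where
  pure a = Parser $ \s _ eok _ _ -> eok a s
  pf <*> pa = do { f <- pf; a <- pa; pure (f a) }

instance Monad Parser where
  Parser p >>= k = Parser $ \s cok eok cerr eerr ->
    let -- once p has consumed input, every outcome of k counts as consumed
        cok' a s' = case k a of Parser q -> q s' cok cok cerr cerr
        eok' a s' = case k a of Parser q -> q s' cok eok cerr eerr
    in p s cok' eok' cerr eerr

-- Primitive: accept one character matching the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \(State inp off) cok _ _ eerr ->
  case inp of
    c : rest | ok c -> cok c (State rest (off + 1))
    _               -> eerr off

-- Adapter layer: parsec-style names built on the core above.

-- Like parsec's try: a consumed failure is reported as an empty failure.
try :: Parser a -> Parser a
try (Parser p) = Parser $ \s cok eok _ eerr ->
  p s cok eok (\_ -> eerr (offset s)) eerr

-- Like parsec's <|>: the right side runs only if the left failed
-- without consuming input.
(<|>) :: Parser a -> Parser a -> Parser a
Parser p <|> Parser q = Parser $ \s cok eok cerr eerr ->
  p s cok eok cerr (\_ -> q s cok eok cerr eerr)
infixr 1 <|>

many :: Parser a -> Parser [a]
many p = many1 p <|> pure []

many1 :: Parser a -> Parser [a]
many1 p = do { x <- p; xs <- many p; pure (x : xs) }

char :: Char -> Parser Char
char c = satisfy (== c)

string :: String -> Parser String
string = traverse char

-- Run a parser, returning the failure offset or the result.
runParser :: Parser a -> String -> Either Int a
runParser (Parser p) str =
  p (State str 0) (\a _ -> Right a) (\a _ -> Right a) Left Left
```

With these definitions, `string "ab" <|> string "ac"` fails on the input `"ac"` because the left branch has already consumed the `a`, while wrapping the left branch in `try` lets the right branch run. Reproducing this kind of parsec-specific backtracking behaviour faithfully is the sort of thing the adapter layer would need to get right.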

What will the project focus on

Integrating the performance improvements from the Elm compiler into elm-format. We should be confident that no new bugs are introduced and the code should be relatively clean and maintainable.

Benefits

elm-format is an integral part of the Elm experience and is used almost universally within the community. This widespread adoption has a major benefit: how Elm code should be formatted becomes a non-issue. To ensure that elm-format remains this widely used, it has to be strictly better than not using a formatter, and performance is a part of that. No one should ever have to consider not using elm-format out of performance concerns. This project will help with that. Another benefit is that having elm-format and the compiler share as much of the codebase as possible (within reason) is good for maintainability: future changes to the compiler's parsing logic will be easier to integrate.

Timeline

Week 20-22 - Community bonding

Get to know my mentor and other relevant community members. Figure out how we'd like to do our meetings (I would prefer regular meetings at fixed dates & times): how often, how long, what to discuss, and what we expect of one another. Refine goals, focuses and timeline. Discuss procedures for submitting code, testing, style etc. Most importantly: have a good time, build trust and lay the foundation for a healthy, frustration-free mentor-student relationship.

Week 23 & 24

During week 23 my university does its semester evaluations, which usually makes this a very intense week. I have therefore planned to do less GSoC work during this week. Any time lost will of course be compensated for during the subsequent weeks.

Get familiar with how elm-format and the Elm compiler do their parsing. Develop an intuitive understanding of parsec/indents (Text.Parsec.Prim and Text.Parsec.Indent strike me as the most important modules) and the Elm compiler's Primitives.hs, and of how they differ, and write it down in a document. Look at the relevant parsec API in more detail: are there declarations from parsec that look like they could be particularly difficult to implement? Detail that in a document.

After these two weeks I have produced a document detailing the key differences between parsec and Primitives.hs, and the important/problematic parsec functions. The mentor and I agree on a rough order and manner in which to implement the wrapper layer. I assign subgoals to the week 25-28 block. I have written a skeleton for the wrapper module(s) with all the declarations needed for elm-format to compile without parsec and indents in place, all of the function bodies being error "todo".
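To make the skeleton idea concrete, here is a small sketch of what such a stubbed module could look like. The names mirror functions that exist in parsec's public API, but the types are simplified to a monomorphic, hypothetical `Parser`, and the selection is illustrative only; it is not the actual list of declarations elm-format depends on.

```haskell
-- Hypothetical skeleton of an adapter module: every declaration has its
-- final (here simplified) type, but the bodies are still stubs.

-- Placeholder parser type; the real one would wrap the compiler's parser.
data Parser a = ParserTodo

type SourceName = String

-- Placeholder error type.
data ParseError = ParseErrorTodo

try :: Parser a -> Parser a
try = error "todo"

(<|>) :: Parser a -> Parser a -> Parser a
(<|>) = error "todo"
infixr 1 <|>

many :: Parser a -> Parser [a]
many = error "todo"

char :: Char -> Parser Char
char = error "todo"

lookAhead :: Parser a -> Parser a
lookAhead = error "todo"

parse :: Parser a -> SourceName -> String -> Either ParseError a
parse = error "todo"
```

Because the types are in place, code written against these names type-checks even though running it would hit an `error "todo"`. That lets the stubs be replaced one at a time while the rest of the code base keeps compiling.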

Week 25-28

Implement the adapter layer for the compiler's parsing code, i.e. replace all the error "todo" bodies with suitable implementations. All tests are passing and we are confident that elm-format behaves the same as before. This block will have been split further during week 24, when more details about the work that needs to be done are known.

Week 29 & 30

Incrementally migrate elm-format to use the new parsing API directly instead of via the adapter layer. This migration doesn't necessarily have to be completed fully.

Week 31

This week is a buffer for schedule overruns. If there are no overruns, it can be used for various things: refactoring code, performing a simple benchmark before and after this project, or continuing the work done during weeks 29 & 30.
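A simple before/after comparison could be summarised as the percent change in median runtime. The sketch below assumes the timings have already been collected by some other means (for example by timing elm-format runs by hand); the function names and the example numbers in the usage note are made up for illustration.

```haskell
import Data.List (sort)

-- Median of a non-empty list of timings (e.g. in milliseconds).
median :: [Double] -> Double
median xs =
  let sorted = sort xs
      n      = length sorted
      mid    = n `div` 2
  in if even n
       then (sorted !! (mid - 1) + sorted !! mid) / 2
       else sorted !! mid

-- Percent change from the old runtime to the new one.
-- Negative values mean the new version is faster.
percentChange :: Double -> Double -> Double
percentChange before after = (after - before) / before * 100

-- Compare two sets of timings by their medians.
compareRuns :: [Double] -> [Double] -> Double
compareRuns before after = percentChange (median before) (median after)
```

For instance, `compareRuns [210, 200, 190] [150, 160, 140]` compares a median of 200 against a median of 150 and yields -25.0, i.e. a 25% reduction in runtime.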

Week X

Reserved for a potential vacation, date not yet decided. During this week I would not be able to do any work or be contacted (no internet). Planning to have a date set before the bonding period starts. The hours of work lost on this week would be compensated for during all of the other weeks.

Goals

Requirements

GHC, Cabal and other Haskell related programs.

Relevant parsec API

Here's a list of all of the declarations from parsec and indents that elm-format makes use of. Instance declarations not included.

Text.Parsec.Pos

Text.Parsec.Error

Text.Parsec.Prim

Text.Parsec.Combinator

Text.Parsec.Char

Text.Parsec.Indent

razzeee commented 3 years ago

@avh4

avh4 commented 3 years ago

@emmabastas looks good! Here are a few comments to consider at your discretion:

Summary: In addition to the primary benefit that you accurately described, in my mind, there is a secondary benefit of having elm-format's parser be more similar to elm-compiler's parser, which likely will make updating elm-format for future versions of Elm (which may possibly have syntax changes) easier because the parser will be easier to compare if they are more similar. Up to you as to whether you think that's worth including.

Timeline -> Week 23: I often find the familiarization step is the hardest one in which to stay directed. To make sure there are some concrete goals, I think what you included about "Some initials stabs at writing the adapter layer." is quite important. If the goal of the week still seems too vague, a concrete way to measure familiarization could be something along the lines of learning enough to be able to describe to me how the parsing works, and/or writing out a list of how elm-format's and elm-compiler's parsers differ.

Benchmark work (Week 28 & 29): The benchmarking isn't super important to me, though it is something that is often briefly of general interest to folks in the community. We also already have indirect evidence that this change will improve performance due to seeing the benchmark results of elm-compiler. And even if this change did not improve performance, it is still worth doing due to the secondary benefit noted in my comments above about the "Summary" section. If you have a particular interest in learning about benchmarking and/or doing benchmarking in Haskell, then I think this is great to include (I'm also interested to learn about how to do this), but if you don't have a particular interest in benchmarking, then imo this could be skipped and instead reserve more time for "migrate the parsers from the adapter layer to use the new parsing API directly".

"Renting remote server to run benchmarks on": I think this probably won't be needed, unless there's some new benchmarking best practice that I'm not aware of. Most likely, comparing the percent change will be the best way to compare results, and that should be reliable enough on your local computer (or if we want more data points, we could recruit a few folks from the Elm community to try it on their own computers where they normally run elm-format).

parsec: One additional note: elm-format uses parts of both the core Parsec API and the Parsec indents package. So understanding the indents package, and how elm-compiler's new parser handles the 3-4 parts of Elm's syntax that are indentation-sensitive, will also be necessary. That goes a little beyond typical use of the core parsec API.

avh4 commented 3 years ago

Reminder to submit the proposal at https://summerofcode.withgoogle.com