JakeWheat / intro_to_parsing

Introduction to parsing with Haskell and Parsec
http://jakewheat.github.io/intro_to_parsing/
BSD 3-Clause "New" or "Revised" License
503 stars 35 forks source link

= Intro to Parsing with Parsec in Haskell

== Overview

WIP, a tutorial which demonstrates the basics of Parsec and goes on to build a SQL query parser.

You can view this tutorial as HTML online here:

http://jakewheat.github.io/intro_to_parsing/

and you can view the files directly in the github repository here:

https://github.com/JakeWheat/intro_to_parsing

=== Summary of sections

// the first link for each section will work in the readme on github, // the second link is for the rendered html and doesn't work here

link:GettingStarted.lhs[]

// <<getting-started, Getting Started>>

Introduction to parsing with Parsec, including a review of Text.Parsec.Char functions.

link:VerySimpleExpressions.lhs[]

// <<very-simple-expression-parsing, Very simple expression parsing>>

Creating a very simple expression language parser, and introducing some functions from Text.Parsec.Combinator.

link:ApplicativeStyle.lhs[]

// <<applicative-style-parsing-code, Applicative style parsing code>>

Rewriting the simple expression parser code in a more succinct style.

link:CombinatorReview.lhs[]

// <<combinator-review,Combinator review>>

Review and examples of all functions from Text.Parsec.Combinator, and some from Control.Applicative and Control.Monad.

link:FunctionsAndTypesForParsing.lhs[]

// <<functions-and-types-for-parsing,Functions and types for parsing>>

The utility functions used in the previous tutorials, plus some notes on types in Parsec.

link:TextParsecExpr.lhs[]

// <<parsing-expressions-with-fixity, Parsing expressions with fixity>>

This covers using the Text.Parsec.Expr for expression parsing with prefix, postfix and infix operators with fixity.

link:AnIssueWithTokenParsers.lhs[]

// <<an-issue-with-token-parsers, An issue with token parsers>>

Looks at an issue we have with the way the symbol parser in the Text.Parsec.Expr tutorial was used, and some possible fixes.

link:TextParsecPerm.lhs[]

// <<permutation-parsing, Permutation parsing>>

This covers the Text.Parsec.Perm module which is used for parsing different things in flexible order.

link:TextParsecToken.lhs[]

// <<token-parsing, Token parsing>>

This covers Text.Parsec.Token which can be used to create token parsers easily.

link:ValueExpressions.lhs[]

// <<value-expressions, Value expressions>>

This covers building a parser a subset of value expressions from SQL, which are an extension of the simple expression types and parsers covered in previous tutorials.

link:QueryExpressions.lhs[]

// <<query-expressions, Query expressions>>

This covers building a parser to parse query expressions with select lists, simple from, where, group by, having and order by.

link:FromClause.lhs[]

// <<from-clause, From clause>>

This extend the parser for query expressions to support a from clause with much more features including joins.

link:SimpleSQLQueryParser0.lhs[]

// <<simple-sql-query-parser, Simple SQL query parser>>

Here is the code from ValueExpressions, QueryExpressions and FromClause plus tests put together and rearranged as a coherent standalone module.

link:PrettyPrinting0.lhs[]

// <<pretty-printing, Pretty printing>>

This quick module covers a simple pretty printer for our SQL ast.

link:ErrorMessages.lhs[]

// <<error-messages, Error messages>>

In this document, we will explore error messages with parsec and how restructuring parser code can lead to better or worse error messages.

=== Going further

If you are interested in SQL parsing, check out the project to build a complete SQL parser here: http://jakewheat.github.io/simple-sql-parser/latest. The parsing code in the simple-sql-parser project is based on this tutorial code.

////

Later documents

Additional provisional documents not yet started:

Parsing TPC-H queries

We will use the tpch queries as examples to help improve the pretty printer. First there are a few extra bits of syntax to be able to parse these queries

Pretty printing part 2

some tweaks to the pretty printer to improve the layout for the tpch queries

Writing tests

Here we will take the ad hoc tests and build an organised test suite with a wrapper for hunit, wrapper for test.framework wrapper and maybe tasty

Refactored project + cabal package

In this tutorial, we will take the sql parser, pretty printer and tests, and create a complete cabal package.

TODO: talk about robustness and the casual way the parser has been put together and the casual way issues have been tackled.

Writing a command line sql interface

quick experiment to try to implement the front end for a multiline sql command line using fake incremental parsing which parsec doesn't support directly.

Position annotation

In this tutorial, we will add position annotation to the parsing, so that a later stage could, e.g., provide type error messages with the correct line and column numbers.

Dialects

In this tutorial, we will discuss how we can support other SQL dialects

Separate lexer

In this tutorial, we will look at creating a proper separate lexer to see how it is done, and remark on what the tradeoffs seem to be.

Fixing fixity

Parsing full SQL expressions is a mess, and trying to do the fixity at parse time has many downsides. Here is another approach, to ignore fixity at parse time and fix it in a pass on the ast after parsing.

Quasiquotes

In this tutorial, we will create quasiquoters for sql query expressions and value expressions, and see how powerful this can be

Something about syntax highlighting, generating documentation + links?

////

=== Extras

an executable which contains the boilerplate to run a parsec parser on a string passed as an argument

an executable which contains the boilerplate to run a parsec parser on a file passed as an argument

Contact: +++jakewheatmail@gmail.com+++

License: BSD3