Awesome binary parsing
A list of generic tools for parsing binary data structures, such as
file formats, network protocols or bitstreams.
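Here is a minimal sketch of what "parsing a binary data structure" means in practice, using only Python's standard `struct` module. The header layout is hypothetical and invented for illustration: magic `DEMO`, version, flags, a reserved byte and a payload length. The point is the pattern the tools below automate: fixed-size fields, explicit byte order, and validation before trusting a length field.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (invented for this example), big-endian:
#   4s  magic, must be b"DEMO"
#   H   version (uint16)
#   B   flags (uint8)
#   x   one reserved padding byte, skipped
#   I   payload length in bytes (uint32)
HEADER = struct.Struct(">4sHBxI")  # 12 bytes; ">" means no implicit alignment


@dataclass
class Header:
    version: int
    flags: int
    length: int


def parse(data: bytes) -> tuple[Header, bytes]:
    """Split `data` into a validated header and its payload."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, flags, length = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError(f"bad magic: {magic!r}")
    # Never trust a length field blindly: check it against the real buffer.
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("payload shorter than declared length")
    return Header(version, flags, length), payload


if __name__ == "__main__":
    blob = HEADER.pack(b"DEMO", 2, 0x01, 5) + b"hello"
    print(parse(blob))
```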
Parser generators, parsing libraries and frameworks
- Kaitai Struct (DSL):
declarative language used to describe various binary data structures,
laid out in files or in memory
- Nom (Rust): Rust parser combinator framework
- Hammer (C):
bit-oriented parsing library
- Construct (Python):
library for parsing
and building data structures (binary or textual). Define your
data structures in a declarative manner.
- Spicy (DSL, C/C++, Zeek):
a next-generation parser generator for network protocols and file formats
- Hachoir (Python): view and
edit a binary stream field by field.
Includes a long list of parsers for all kinds of formats.
- Caterpillar (Python): Python 3.12+ library to pack and unpack structured binary data
- RecordFlux: toolset for the formal specification of messages and the generation of verifiable binary parsers and message generators (Ada-inspired).
- DataScript Tools (DSL):
DataScript is a formal language for modelling binary datatypes,
bitstreams or file formats.
- Parsifal (OCaml):
OCaml-based parsing engine.
Paper:
A pragmatic solution to the binary parsing problem. Olivier Levillain
- Haka (Lua):
open source, security-oriented language for describing protocols
and applying security policies to (live) captured traffic
- BinData (Ruby):
provides a declarative way to read and write structured binary data
- Binary-parser (JavaScript):
binary parser builder library which enables you to write
efficient parsers in a simple & declarative way
- Gloss (Clojure):
turn complicated byte formats into Clojure data structures and
Clojure data structures into compact byte representations
- Preon (Java):
Bit syntax for Java. A declarative data binding framework for dealing with binary encoded data
- attoparsec and attoparsec-binary (Haskell):
fast parser combinator library, aimed particularly at dealing efficiently
with network protocols and complicated text/binary file formats
- Marpa (C/C++, Perl, Go):
libmarpa (C)
- Scapy (Python): send, sniff, dissect
and forge network packets. Usable interactively or as a library.
- libtins (C++):
crafting, sending, sniffing and interpreting raw network packets
- libcrafter (C++):
high level library for C++ designed to create and decode network packets
- scodec (Scala):
Combinator library for working with binary data
- Apache Daffodil (Scala/Java, XML Schema):
an open-source implementation of
DFDL (Data Format Description Language)
capable of describing many industry and military standards,
parsing data into an infoset (most commonly represented as XML or JSON) and writing it back to the native format.
- binarylang (Nim, DSL):
extensible Nim DSL for creating binary parsers/encoders in a symmetric fashion
- binaryparse (Nim, DSL):
In-language DSL for reading and writing binary data supporting all sorts of
patterns. Generates an efficient stream based reader and writer for the
runtime execution.
- FlexT: a DSL and a tool for generating parsers in Delphi.
- FormatFuzzer (C++): framework for high-efficiency, high-quality generation and parsing of binary inputs
- Deku (Rust): bit-level, symmetric, serialization/deserialization implementations for structs and enums
- restruct (Go): library for reading and writing binary data
- Mr. Crowbar (Python):
Django-esque model framework for reading and writing binary file formats.
Includes a suite of command-line tools for visualising and digging through binary data.
- jBinary (JavaScript): high-level API for working with binary data.
- Wuffs: a memory-safe programming language (and a standard library written in that language) for Wrangling Untrusted File Formats Safely.
Wrangling includes parsing, decoding and encoding.
- EverParse: a framework for generating verified secure parsers and formatters from domain-specific format specification languages
- binrw (Rust): binrw helps you write maintainable & easy-to-read declarative binary data readers and writers using ✨macro magic✨.
- Dogma (DSL): human-friendly metalanguage for describing data formats in documentation using the familiar patterns of Backus-Naur Form.
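Several entries above (Construct, BinData, Deku, binrw, binarylang) describe a format once and derive both a reader and a writer from that single description. The sketch below illustrates this symmetric, declarative idea with Python's `struct` module. The `PACKET` description and the `parse`/`build` helpers are hypothetical and do not mirror any of these libraries' APIs.

```python
import struct

# Hypothetical declarative description: ordered (field name, struct code) pairs.
# Both parse() and build() are derived from this single description.
PACKET = [
    ("msg_type", "B"),  # uint8
    ("seq", "H"),       # uint16
    ("value", "i"),     # int32 (signed)
]


def _layout(spec) -> struct.Struct:
    # "<" selects little-endian with no alignment padding.
    return struct.Struct("<" + "".join(code for _, code in spec))


def parse(spec, data: bytes) -> dict:
    """Read one record described by `spec` from `data` (exact length required)."""
    values = _layout(spec).unpack(data)
    return {name: value for (name, _), value in zip(spec, values)}


def build(spec, record: dict) -> bytes:
    """Write `record` back out using the same description."""
    return _layout(spec).pack(*(record[name] for name, _ in spec))


if __name__ == "__main__":
    raw = build(PACKET, {"msg_type": 3, "seq": 513, "value": -2})
    print(raw.hex(" "))  # 03 01 02 fe ff ff ff
    print(parse(PACKET, raw))
```

Because both directions come from one description, `parse(PACKET, build(PACKET, record)) == record` holds for any in-range record. This is what keeps generated readers and writers from drifting apart.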
Stand-alone software
Hex editors with grammars
- Synalyze It!
- Hexinator
- 010 Editor
- Kiewtai: plugin for the Hiew hex editor that makes the Kaitai parsers available
- Hobbits: multi-platform GUI for bit-based analysis, processing, and visualization. Has a Kaitai plugin.
- ImHex: A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM.
- fq: jq for binary formats - tool, language and decoders for working with binary and text formats.
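Hex editors with grammars overlay a format description on the raw bytes and display each field's offset, bytes and decoded value. Below is a toy version of that structure view. It assumes a hypothetical `GRAMMAR` of (name, `struct` code) pairs, not the template or pattern languages these editors actually use.

```python
import struct

# Hypothetical grammar for an invented image header: (field name, struct code),
# all little-endian. Real editors use their own template/pattern languages.
GRAMMAR = [("magic", "4s"), ("width", "H"), ("height", "H"), ("depth", "B")]


def annotate(grammar, data: bytes) -> list[str]:
    """Return one 'offset  raw bytes  name = value' row per field."""
    rows, offset = [], 0
    for name, code in grammar:
        fmt = "<" + code
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, data, offset)
        raw = data[offset:offset + size].hex(" ")
        rows.append(f"{offset:08x}  {raw:<12} {name:<7} = {value!r}")
        offset += size
    return rows


if __name__ == "__main__":
    sample = b"IMG1" + struct.pack("<HHB", 640, 480, 24)
    print("\n".join(annotate(GRAMMAR, sample)))
```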
Wireshark
Wireshark is a network protocol analyzer
that includes dissectors
for over two thousand protocols.
- TShark:
command line version, can easily be called from shell scripts.
- Wireshark Generic Dissector:
add-on, allows dissection of a protocol based on a text description of the protocol elements
- Wireshark Lua:
dissectors can be written in Lua (Examples)
- pyreshark:
plugin providing a simple interface for writing Wireshark dissectors in Python
- Sharktools (Python, Matlab):
Tools for programmatic parsing of packet captures using Wireshark functionality
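A dissector walks a packet's header fields and hands the remaining bytes on to the next layer. The sketch below mimics the Generic Dissector idea of driving dissection from a text description. Its one-line-per-field syntax and the `load_description`/`dissect` helpers are hypothetical and are not the add-on's real format. The field layout itself is the standard 8-byte UDP header (RFC 768), in network byte order.

```python
import struct

# Hypothetical description syntax (not the Generic Dissector's real format):
# one "<type> <name>" pair per line. The fields are the 8-byte UDP header.
UDP_DESCRIPTION = """
uint16 src_port
uint16 dst_port
uint16 length
uint16 checksum
"""

TYPE_CODES = {"uint8": "B", "uint16": "H", "uint32": "I"}


def load_description(text: str) -> list[tuple[str, str]]:
    """Turn the text description into (field name, struct code) pairs."""
    fields = []
    for line in text.strip().splitlines():
        type_name, field_name = line.split()
        fields.append((field_name, TYPE_CODES[type_name]))
    return fields


def dissect(fields, packet: bytes) -> tuple[dict, bytes]:
    """Decode the header fields; return them plus the undecoded payload."""
    fmt = "!" + "".join(code for _, code in fields)  # "!" = network byte order
    size = struct.calcsize(fmt)
    if len(packet) < size:
        raise ValueError("packet shorter than the described header")
    values = struct.unpack_from(fmt, packet)
    header = {name: value for (name, _), value in zip(fields, values)}
    return header, packet[size:]


if __name__ == "__main__":
    raw = struct.pack("!HHHH", 53, 40000, 12, 0) + b"abcd"
    print(dissect(load_description(UDP_DESCRIPTION), raw))
```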
Other stand-alone software
- GNU poke: The extensible editor for structured binary data
- Netzob: open source tool for reverse engineering,
traffic generation and fuzzing of communication protocols
- Cat Karat Packet Builder:
packet generation tool for building custom packets for firewall or target testing
- radare2 (C, with bindings/pipe for almost all languages):
Unix-like reverse engineering framework and command-line tools.
See Parsing a fileformat with radare2
and Types.
- Veles: open source tool for binary analysis
Research papers
- LangSec Platform: Towards a Platform to Compare Binary Parser Generators.
Olivier Levillain, Sébastien Naud, Aina Toky Rasoamanana (Video)
- Interval Parsing Grammars for File Format Parsing Jialun Zhang, Greg Morrisett, Gang Tan
- Narcissus:
Correct-By-Construction Derivation of Decoders and Encoders from Binary Formats.
Benjamin Delaware, Sorawit Suriyakarn, Clément Pit-Claudel, Qianchuan Ye, Adam Chlipala
- EverParse:
Verified Secure Zero-Copy Parsers for Authenticated Message Formats. Tahina Ramananandro et al.
- Nail:
A Practical Tool for Parsing and Generating Data Formats.
Julian Bangert and Nickolai Zeldovich
- Generic packet descriptions:
Verified parsing and pretty printing of low-level data.
Marcell van Geest, Wouter Swierstra
- GAPA: Generic Application-Level Protocol Analyzer and its Language.
Nikita Borisov, David J. Brumley, Helen J. Wang, Chuanxiong Guo
- PADS/ML: a functional data description language.
Y. Mandelbaum, K. Fisher, D. Walker, M. F. Fernandez, and A. Gleyzer.
- PacketTypes:
Packet types: Abstract specification of network protocol messages.
P. J. McCann and S. Chandra.
- Zebu:
A Language-Based Approach for Improving the Robustness of Network
Application Protocol Implementations. Laurent Burgy et al.
- Zebra:
Improving the Performance of Message Parsers for Embedded Systems.
Jigar Solanki et al.
- z2z:
Automatic Generation of Network Protocol Gateways.
Yerom-David Bromberg, Laurent Reveillere, Julia L. Lawall, Gilles Muller
- Yakker:
Semantics and Algorithms for Data-dependent Grammars.
Trevor Jim, Yitzhak Mandelbaum, David Walker
- BinPAC:
Superseded by BinPAC++, which is now known as Spicy
- FlowSifter:
High-Speed Application Protocol Parsing and Extraction for Deep Flow Inspection.
Alex X. Liu, Chad R. Meiners, Eric Norige, and Eric Torng
- TSN.1:
Transfer Syntax Notation One (TSN.1).
A formal notation for describing messages in binary protocols
- NetPDL:
markup language that aims to describe protocols from OSI layer 2 to OSI layer 7
- Tupni:
Automatic Reverse Engineering of Input Formats. Weidong Cui et al.
- W. Underwood:
Grammar-Based Specification and Parsing of Binary File Formats.
William Underwood
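Several of the papers above (Yakker's data-dependent grammars, interval parsing grammars) address formats in which a value parsed earlier determines how much input the next element consumes. Below is a minimal hand-written Python sketch of the most common case. It assumes a hypothetical type-length-value (TLV) stream with a 1-byte type and a 2-byte big-endian length.

```python
import struct
from typing import Iterator


def parse_tlv(data: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (type, value) records from a hypothetical TLV stream:
    1-byte type, 2-byte big-endian length, then `length` value bytes."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 3:
            raise ValueError(f"truncated TLV header at offset {offset}")
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        end = offset + length
        # Data dependency: the `length` just parsed decides where this value
        # ends and where the next record starts.
        if end > len(data):
            raise ValueError(f"length {length} overruns the buffer at offset {offset}")
        yield tag, data[offset:end]
        offset = end


if __name__ == "__main__":
    stream = bytes([1, 0, 3]) + b"abc" + bytes([2, 0, 0])
    print(list(parse_tlv(stream)))  # [(1, b'abc'), (2, b'')]
```

The bounds check before slicing matters. Without it, Python slicing would silently return a short value, and the equivalent C code would read past the end of the buffer.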
Lists of interesting binary formats
This is obviously rather subjective and definitely not supposed to be a complete list:
Related topics
- Compilers
- Domain Specific Languages
- Digital Forensics, Network Forensics:
file format identification,
dshell
- Firmware analysis and file carving:
Binwalk, Unblob, QuickBMS, OFRAK
- Deep Packet Inspection
- Packet Crafting:
hping2/3, tcpreplay, netdude, bittwist, netsniff-ng, Trafgen, ...
- Reverse Engineering
- Fuzzing: Sulley, Peach, ...
- Language-theoretic Security (LANGSEC)
- Chomsky hierarchy
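File format identification usually starts by matching leading "magic" bytes. Here is a minimal Python sketch with a handful of well-known signatures. Real identification tools use far larger rule sets and look beyond the first few bytes. A match such as the ZIP signature only identifies the container, since many formats (JAR or DOCX, for example) are ZIP archives.

```python
# A handful of well-known leading signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (or a ZIP-based format)"),
    (b"\x1f\x8b", "gzip stream"),
    (b"\x7fELF", "ELF binary"),
]


def identify(data: bytes) -> str:
    """Guess a format from the first bytes; 'unknown' if nothing matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify(b"%PDF-1.7\n"))  # PDF document
```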