Proposal Details
Abstract
This proposal suggests introducing an option to set the MaxBuf parameter for the html.Parse function to control memory usage when parsing large HTML documents.
Background
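Currently, html.Parse in golang.org/x/net/html calls ParseWithOptions internally, which leads to the call chain html.Parse -> ParseWithOptions -> p.parse() -> p.tokenizer.Next() -> readByte(). Within readByte(), there is a check that stops tokenization with ErrBufferExceeded once the raw bytes buffered for the current token reach maxBuf.
This logic is activated only if maxBuf is set (the default value, 0, means no limit). The Tokenizer type does expose SetMaxBuf, but the tokenizer that ParseWithOptions creates is stored in a field of the unexported parser type, so there is no way to set MaxBuf when using html.Parse or ParseWithOptions.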
Problem
When parsing very large HTML documents, such as this page, memory usage can increase significantly because MaxBuf cannot be set.
Solution
To address this, I propose introducing a function similar to ParseOptionEnableScripting that allows users to set MaxBuf.
Implementation
A sample implementation using reflection is provided below. Although it works, it relies on reflection and the unsafe package, which are not ideal for production code:
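The general reflection-and-unsafe technique can be sketched as follows; this is an illustration, not necessarily the exact code referred to above. To stay self-contained it defines local stand-ins (Tokenizer, parser, ParseOption, ParseWithOptions) that mirror the shapes of the html package types. Against the real package, the option type would presumably come from reflect.TypeOf(html.ParseOption(nil)), and the code would depend on the unexported field name tokenizer, an internal detail that can change without notice.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// Local stand-ins mirroring the shapes of html.Tokenizer, the unexported
// html parser type, and html.ParseOption.
type Tokenizer struct{ maxBuf int }

// SetMaxBuf mirrors the exported (*html.Tokenizer).SetMaxBuf.
func (z *Tokenizer) SetMaxBuf(n int) { z.maxBuf = n }

type parser struct {
	tokenizer *Tokenizer // unexported, like the real parser's field
}

type ParseOption func(p *parser)

// ParseWithOptions applies the options to a fresh parser and, for the demo,
// returns the tokenizer limit that ended up being configured.
func ParseWithOptions(opts ...ParseOption) int {
	p := &parser{tokenizer: &Tokenizer{}}
	for _, f := range opts {
		f(p)
	}
	return p.tokenizer.maxBuf
}

// maxBufViaReflection builds a ParseOption without naming *parser in a func
// literal, which is what code outside the html package would have to do.
func maxBufViaReflection(n int) ParseOption {
	optType := reflect.TypeOf(ParseOption(nil))
	fn := reflect.MakeFunc(optType, func(args []reflect.Value) []reflect.Value {
		// args[0] holds the *parser. reflect marks the unexported field as
		// read-only, so calling Interface() on it would panic; reinterpreting
		// its address with unsafe sidesteps that check.
		field := args[0].Elem().FieldByName("tokenizer")
		tokPtr := reflect.NewAt(field.Type(), unsafe.Pointer(field.UnsafeAddr()))
		tokPtr.Elem().Interface().(*Tokenizer).SetMaxBuf(n)
		return nil // ParseOption has no results
	})
	return fn.Interface().(ParseOption)
}

func main() {
	fmt.Println(ParseWithOptions(maxBufViaReflection(1 << 20))) // prints: 1048576
}
```

The fragility is the point of the proposal: the sketch only works while the field keeps its name and type, and it bypasses the visibility rules the package author chose.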
This implementation can be used as follows:
To properly address the issue, I propose adding the following function to the golang.org/x/net/html package:
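A minimal sketch of what such an option could look like, assuming it lives inside the html package next to ParseOptionEnableScripting. ParseOptionMaxBuf is a hypothetical name, and the surrounding types are simplified local stand-ins so the example compiles on its own.

```go
package main

import "fmt"

// Local stand-ins for html.Tokenizer and the unexported html parser type;
// only the fields touched by the options are modelled.
type Tokenizer struct{ maxBuf int }

// SetMaxBuf mirrors the exported (*html.Tokenizer).SetMaxBuf; 0 means no limit.
func (z *Tokenizer) SetMaxBuf(n int) { z.maxBuf = n }

type parser struct {
	tokenizer *Tokenizer
	scripting bool
}

// ParseOption has the same shape as html.ParseOption.
type ParseOption func(p *parser)

// ParseOptionEnableScripting mirrors the existing html option.
func ParseOptionEnableScripting(enable bool) ParseOption {
	return func(p *parser) { p.scripting = enable }
}

// ParseOptionMaxBuf is a hypothetical name for the proposed option. Inside
// the package it is an ordinary closure: no reflection or unsafe is needed.
func ParseOptionMaxBuf(n int) ParseOption {
	return func(p *parser) { p.tokenizer.SetMaxBuf(n) }
}

// newParser applies options the way ParseWithOptions does before parsing,
// starting from the same scripting default.
func newParser(opts ...ParseOption) *parser {
	p := &parser{tokenizer: &Tokenizer{}, scripting: true}
	for _, f := range opts {
		f(p)
	}
	return p
}

func main() {
	p := newParser(ParseOptionMaxBuf(1<<20), ParseOptionEnableScripting(false))
	fmt.Println(p.tokenizer.maxBuf, p.scripting) // prints: 1048576 false
}
```

With an option like this, a caller could pass it to html.ParseWithOptions alongside any other options and check the returned error for html.ErrBufferExceeded when the limit is hit, since parsing stops on any tokenizer error other than io.EOF.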
In my testing, setting maxBuf to at least 1.04 times the body length was enough for parsing to complete normally.
Feasibility
Adding a function similar to ParseOptionEnableScripting that allows users to set MaxBuf would provide a safe and efficient way to control memory usage when parsing large HTML documents, without resorting to unsafe and reflection.
Environment