gjtorikian / html-pipeline

HTML processing filters and utilities
MIT License
2.26k stars 380 forks source link

Context & result of filters except for text_filters aren't overwritten on call time #402

Closed nekketsuuu closed 2 months ago

nekketsuuu commented 2 months ago

The doc of HTMLPipeline#call says that its context parameter is passed to each filter: https://github.com/gjtorikian/html-pipeline/blob/aec775ca9751379c252a0dd71f3b3c4c9362f4c2/lib/html_pipeline.rb#L137-L149

But in fact, the context is only passed to each text filters, and not passed to a convert filter and all node filters:

https://github.com/gjtorikian/html-pipeline/blob/aec775ca9751379c252a0dd71f3b3c4c9362f4c2/lib/html_pipeline.rb#L213

https://github.com/gjtorikian/html-pipeline/blob/aec775ca9751379c252a0dd71f3b3c4c9362f4c2/lib/html_pipeline.rb#L174

https://github.com/gjtorikian/html-pipeline/blob/aec775ca9751379c252a0dd71f3b3c4c9362f4c2/lib/html_pipeline.rb#L180

So we can pass context to a convert filter and node filters only when a pipeline is constructed by HTMLPipeline.new, and cannot pass it to them when the pipeline is called.

On html-pipeline v2, the context hash is actually passed into all filters: https://github.com/gjtorikian/html-pipeline/blob/84c75b3d123c5219238c5f01c2b7de7ea73755cd/lib/html/pipeline.rb#L118-L120

Is this an intended change? If so, how about:

  1. rewriting a document of HTMLPipeline#call to reflect the current behavior, or
  2. adding context and result parameters to call methods of convert filters and node filters?

For option 2, changing the interface is a bit difficult because for node filters we should modify @context of given node filters before passing the filters to Selma::Rewriter#new.

Environment: html-pipeline v3.1.1

gjtorikian commented 2 months ago

Hey, thanks for the thorough report. Just acking this and I'll look into it soon!

gjtorikian commented 2 months ago

Hi, I opened https://github.com/gjtorikian/html-pipeline/issues/402 to address this. In summary: no, this should not have been removed. Context should be able to be passed at call time to each filter, too. That way you set up one pipeline and it renders the same content in different ways, depending on the context hash at the time (or the global context you set up when first defining the pipeline).

In any case, take a look at that branch and let me know if that works for your use case.

nekketsuuu commented 2 months ago

Thank you!