antchfx / htmlquery

htmlquery is a Go XPath package for querying HTML.
https://github.com/antchfx/xpath
MIT License

Use an interface for LRU cache #69

Open JWAlberty opened 8 months ago

JWAlberty commented 8 months ago

We use xpath directly to validate our XPath queries, and htmlquery to do the actual XPath selection. We keep an LRU cache for our direct xpath usage, which means we end up with two caches: our own for validation and htmlquery's internal one.

This PR exposes that cache as an interface, which would let us share a single cache and also supply alternative cache implementations. Performance and behavior remain the same as in the current implementation, but the cache becomes testable: tests can plug in a no-op cache or a pre-primed cache.
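As a rough illustration of the idea, here is a minimal sketch of what such an interface could look like. All names here (`ExprCache`, `NoopCache`, `MapCache`, `compileCached`) are hypothetical and are not taken from the PR or from htmlquery. The `compile` callback stands in for `xpath.Compile` so the sketch needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// ExprCache is a hypothetical cache interface for compiled expressions.
// The method names are illustrative, not htmlquery's actual API.
type ExprCache interface {
	Get(key string) (value any, ok bool)
	Add(key string, value any)
}

// NoopCache stores nothing, so every lookup is a miss. Useful in tests
// that want compilation to happen on every call.
type NoopCache struct{}

func (NoopCache) Get(string) (any, bool) { return nil, false }
func (NoopCache) Add(string, any)        {}

// MapCache is an unbounded, mutex-guarded map. It is NOT an LRU; it only
// stands in for a shared or pre-primed cache in this sketch.
type MapCache struct {
	mu sync.Mutex
	m  map[string]any
}

func NewMapCache() *MapCache { return &MapCache{m: make(map[string]any)} }

func (c *MapCache) Get(key string) (any, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[key]
	return v, ok
}

func (c *MapCache) Add(key string, value any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = value
}

// compileCached returns the cached value for expr, or calls compile and
// stores the result. compile is a stand-in for xpath.Compile.
func compileCached(c ExprCache, expr string, compile func(string) (any, error)) (any, error) {
	if v, ok := c.Get(expr); ok {
		return v, nil
	}
	v, err := compile(expr)
	if err != nil {
		return nil, err
	}
	c.Add(expr, v)
	return v, nil
}

func main() {
	calls := 0
	compile := func(s string) (any, error) { calls++; return "compiled:" + s, nil }

	shared := NewMapCache()
	compileCached(shared, "//a", compile)
	compileCached(shared, "//a", compile) // served from the cache
	fmt.Println("compile calls with shared cache:", calls)
}
```

With an interface like this, validation code and selection code could be handed the same `ExprCache` value, and a test could swap in `NoopCache{}` or a `MapCache` that was filled in advance.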

This small change makes htmlquery a lot more testable and transparent.

zhengchun commented 8 months ago

Thanks for your PR.

I have another solution that requires no code changes.

First, disable htmlquery's caching by setting DisableSelectorCache = true.

Next, use Expr.Select() (https://pkg.go.dev/github.com/antchfx/xpath#Expr.Select) instead of htmlquery.Query(). With this method you can keep using your own LRU cache and store the compiled Expr for the next use.

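// Assumes cache.Get returns (interface{}, bool), doc is the parsed
// *html.Node (golang.org/x/net/html), and selector is the XPath string.
var exp *xpath.Expr
if v, ok := cache.Get(key); ok {
  exp = v.(*xpath.Expr)
} else {
  var err error
  if exp, err = xpath.Compile(selector); err != nil {
    return nil, err
  }
  cache.Add(key, exp)
}
iter := exp.Select(htmlquery.CreateXPathNavigator(doc))
var list []*html.Node
for iter.MoveNext() {
  list = append(list, iter.Current().(*htmlquery.NodeNavigator).Current())
}
return list, nil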

What do you think?