aw-studio / laravel-indexer


Add customizable HtmlLoader class #9

Closed jannescb closed 3 years ago

jannescb commented 3 years ago

This PR makes it possible to use a custom method for fetching the HTML of a URL. This might be useful for client-side-rendered pages, where the HTML only exists after JavaScript has run.

The url_parser option in config/indexer.php can be changed to point to a custom class.

This should be a fairly simple class with a getHtml method.
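
A rough sketch of what the config entry might look like. Only the url_parser key comes from this PR; the surrounding array shape and the App\Indexer\YourUrlParser class name are assumptions made for illustration:

```php
<?php

// config/indexer.php (sketch, other options omitted)
return [
    // Class used to fetch the HTML for a URL.
    // App\Indexer\YourUrlParser is a hypothetical class name.
    'url_parser' => \App\Indexer\YourUrlParser::class,
];
```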

You could, for example, use the Spatie package Browsershot to render the page and return its HTML:

use AwStudio\Indexer\Contracts\UrlParser;
use Spatie\Browsershot\Browsershot;

class YourUrlParser implements UrlParser
{
    public function getHtml(string $url): string
    {
        return Browsershot::url($url)
                ->setDelay(500)
                ->bodyHtml();
    }
}

With this feature we could solve this issue.

cbl commented 3 years ago

The name parser does not match the intention of the interface. A parser returns the modified value of a parameter, whereas this interface loads content for a URL. The name should be something like HtmlLoader. Also, the class name of the implementation should give an idea of what the implementation looks like. A good name for the implemented loader would be FileContentHtmlLoader. So the interface could look like this:

interface HtmlLoader
{
    /**
     * Load the html content from the given url.
     *
     * @param string $url
     * @return string
     */
    public function load($url);
}

And the implementation:

class FileContentHtmlLoader implements HtmlLoader
{
    // ...
}

And the implementation from the PR description:

class BrowsershotHtmlLoader implements HtmlLoader
{
    // ...
}
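
To make the naming suggestion concrete, here is a minimal sketch of how FileContentHtmlLoader could be filled in. It assumes the loader simply wraps PHP's file_get_contents, which is inferred from the proposed class name rather than taken from the package source. The interface is repeated so the snippet stands on its own, and the exception on failure is also an assumption:

```php
<?php

interface HtmlLoader
{
    /**
     * Load the html content from the given url.
     *
     * @param string $url
     * @return string
     */
    public function load($url);
}

class FileContentHtmlLoader implements HtmlLoader
{
    public function load($url)
    {
        // Assumption: fetch the raw response body without executing any JavaScript.
        // The @ suppresses the PHP warning so failure is reported via the exception.
        $html = @file_get_contents($url);

        if ($html === false) {
            throw new RuntimeException("Could not load HTML from {$url}");
        }

        return $html;
    }
}
```

Because both loaders share the same interface, swapping FileContentHtmlLoader for BrowsershotHtmlLoader would only require a config change.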