FabianBeiner / PHP-IMDB-Grabber

This PHP library enables you to scrape data from IMDB.com.
MIT License

Feature request: Music #155

Closed PrinceOfAbyss closed 3 years ago

PrinceOfAbyss commented 3 years ago

Some titles, such as musicals, have an additional section titled "Music by", and it would be a great addition to scrape that as well. For the movie Dear Evan Hansen (tt9357050), for example, that section can be seen below:

[Image: Screenshot 2021-06-04 005130]

The HTML markup that produces that section for the movie of the example above is:

    <header class="ipl-header">
        <div class="ipl-header__content">        <h4 name="composers" id="composers" class="ipl-header__content ipl-list-title">
            Music by
        </h4>
</div>
        <a class="ipl-header__edit-link" href="https://contribute.imdb.com/updates?update=tt9357050:composers&ref_=tr_com">Edit</a>
    </header>

    <table class="simpleTable spFirst crew_list">
        <tbody>
                    <tr>
                        <td class="name">
                            <a href="/name/nm2537947/?ref_=tt_rv"
>Benj Pasek</a>
                        </td>
                            <td colspan=2></td>
                    </tr>
                    <tr>
                        <td class="name">
                            <a href="/name/nm2524192/?ref_=tt_rv"
>Justin Paul</a>
                        </td>
                            <td colspan=2></td>
                    </tr>
        </tbody>
    </table>

In fact, I have implemented it for you as a thank-you for this great tool. Below is the code to add to your class.

Add this to the constants:

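    // Captures the contents of the first table after the "Music by" heading.
    // Modifiers: U = ungreedy quantifiers, s = dot also matches newlines.
    const IMDB_MUSIC         = '~Music by\s*<\/h4>.*<table class=.*>(.*)</table>~Us';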
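As a rough check of this pattern against the markup quoted above, here is a minimal standalone sketch. It does not use the library: `$sNamePattern` is only an assumed stand-in for the class's `IMDB_NAME` constant, which is not shown in this thread, and the sample markup is abbreviated.

```php
<?php
// Sample markup, abbreviated from the section quoted earlier in this issue.
$sHtml = <<<'HTML'
<h4 name="composers" id="composers" class="ipl-header__content ipl-list-title">
    Music by
</h4>
</div>
<a class="ipl-header__edit-link" href="#">Edit</a>
</header>
<table class="simpleTable spFirst crew_list">
<tbody>
<tr><td class="name"><a href="/name/nm2537947/?ref_=tt_rv"
>Benj Pasek</a></td><td colspan=2></td></tr>
<tr><td class="name"><a href="/name/nm2524192/?ref_=tt_rv"
>Justin Paul</a></td><td colspan=2></td></tr>
</tbody>
</table>
HTML;

// The pattern proposed in this issue.
$sMusicPattern = '~Music by\s*<\/h4>.*<table class=.*>(.*)</table>~Us';

// Assumption: a stand-in for IMDB_NAME, capturing the name ID and the link text.
$sNamePattern = '~href="/name/(nm\d+)/[^"]*"\s*>(.*)</a>~Us';

$aNames = [];
if (preg_match($sMusicPattern, $sHtml, $aTable)) {
    // $aTable[1] holds everything between <table ...> and </table>.
    preg_match_all($sNamePattern, $aTable[1], $aMatch);
    foreach ($aMatch[2] as $i => $sName) {
        $aNames[$aMatch[1][$i]] = trim($sName);
    }
}

print_r($aNames);
```

Because of the `U` modifier, every quantifier is lazy, so the capture stops at the first `</table>` after the heading instead of running on to later crew tables.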

Then add these two methods:

    /**
     * @return string A list with the music composers or $sNotFound.
     */
    public function getMusic()
    {
        if (true === $this->isReady) {
            $sMatch = $this->getMusicAsUrl();
            if (self::$sNotFound !== $sMatch) {
                return IMDBHelper::cleanString($sMatch);
            }
        }

        return self::$sNotFound;
    }

    /**
     * @param string $sTarget Add a target to the links?
     *
     * @return string A list with the linked music composers or $sNotFound.
     */
    public function getMusicAsUrl($sTarget = '')
    {
        if (true === $this->isReady) {
            $sMatch  = IMDBHelper::matchRegex($this->sSource, self::IMDB_MUSIC, 1);
            $aMatch  = IMDBHelper::matchRegex($sMatch, self::IMDB_NAME);
            $aReturn = [];
            if (count($aMatch[2])) {
                foreach ($aMatch[2] as $i => $sName) {
                    $aReturn[] = '<a href="https://www.imdb.com/name/' . IMDBHelper::cleanString(
                            $aMatch[1][$i]
                        ) . '/"' . ($sTarget ? ' target="' . $sTarget . '"' : '') . '>' . IMDBHelper::cleanString(
                            $sName
                        ) . '</a>';
                }

                return IMDBHelper::arrayOutput($this->bArrayOutput, $this->sSeparator, self::$sNotFound, $aReturn);
            }
        }

        return IMDBHelper::arrayOutput($this->bArrayOutput, $this->sSeparator, self::$sNotFound);
    }

With these methods in place, the output looks something like the screenshot below:

[Image: Screenshot 2021-06-04 004140]
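For anyone who wants to try the link-building step of `getMusicAsUrl()` in isolation, here is a small sketch. `buildNameLinks()` is a hypothetical helper, not part of the library; `htmlspecialchars()` and `implode()` merely stand in for `IMDBHelper::cleanString()` and `IMDBHelper::arrayOutput()`, and the `' / '` separator is an assumption. The not-found fallback is left out.

```php
<?php
/**
 * Hypothetical helper mirroring the loop in getMusicAsUrl().
 *
 * @param string[] $aIds       Name IDs such as "nm2537947".
 * @param string[] $aNames     Display names, index-aligned with $aIds.
 * @param string   $sTarget    Optional target attribute for the links.
 * @param string   $sSeparator Assumed separator between links.
 */
function buildNameLinks(array $aIds, array $aNames, string $sTarget = '', string $sSeparator = ' / '): string
{
    $aReturn = [];
    foreach ($aNames as $i => $sName) {
        $aReturn[] = '<a href="https://www.imdb.com/name/' . $aIds[$i] . '/"'
            // The target attribute is only emitted when a target was requested.
            . ($sTarget ? ' target="' . htmlspecialchars($sTarget) . '"' : '')
            . '>' . htmlspecialchars(trim($sName)) . '</a>';
    }

    return implode($sSeparator, $aReturn);
}

$sLinks = buildNameLinks(['nm2537947', 'nm2524192'], ['Benj Pasek', 'Justin Paul'], '_blank');
echo $sLinks, PHP_EOL;
```

Calling it without the third argument produces the same links without a `target` attribute, which corresponds to the default `$sTarget = ''` in the method above.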

I hope you include this addition in a future update!

bla0r commented 3 years ago

Thank you! This has been added in #165.

@FabianBeiner, this can be closed.

FabianBeiner commented 3 years ago

Thanks, @PrinceOfAbyss & @bla0r. :)