bopoda / robots-txt-parser

PHP class for parsing all directives from robots.txt files according to the specifications
http://robots.jeka.by
MIT License

Get sitemap URLs #7

Closed LeMoussel closed 7 years ago

LeMoussel commented 7 years ago

There is no public method to get the sitemap URLs. For example, with this robots.txt:

          Sitemap: http://example.com/sitemap.xml?year=2016
          Sitemap: http://example.com/sitemap.xml?year=2016
          Sitemap: http://example.com/sitemap.xml?year=2016
          User-agent: *
          Disallow: /admin/
          Sitemap: http://somesite.com/sitemap.xml
          User-agent: Googlebot
          Sitemap: http://internet.com/sitemap.xml
          User-agent: Yahoo
          Sitemap: http://worldwideweb.com/sitemap.xml
          Sitemap: http://example.com/sitemap.xml?year=2017

The number of unique sitemap URLs is 5 (the three duplicate example.com/sitemap.xml?year=2016 entries count once).

Maybe you can do something like this:

    /**
     * Get sitemaps wrapper
     *
     * @return array Unique sitemap URLs
     */
    public function getSitemaps()
    {
        $sitemaps = array();
        $rulesAgentAll = $this->getRules('*');
        // Guard against robots.txt files that contain no Sitemap directive
        if (isset($rulesAgentAll[self::DIRECTIVE_SITEMAP])) {
            // Collect the sitemap URLs recorded for the '*' user agent
            foreach ($rulesAgentAll[self::DIRECTIVE_SITEMAP] as $sitemap) {
                $sitemaps[] = $sitemap;
            }
        }

        // Drop duplicates and reindex the result
        return array_values(array_unique($sitemaps));
    }
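
Usage could then look like this (a minimal sketch, assuming the parser is constructed from the robots.txt content as a string, as in the project README):

    $parser = new RobotsTxtParser(file_get_contents('http://example.com/robots.txt'));

    // With the example robots.txt above, this would print the 5 unique sitemap URLs
    print_r($parser->getSitemaps());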
bopoda commented 7 years ago

@LeMoussel good idea!

You can open a PR if you want. Or I will open a PR with the fix this evening or tomorrow.

Currently we can only get the sitemap links like this:

$rules = $parser->getRules('*');
var_dump($rules['sitemap']);
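
To also drop the duplicates, the same workaround can be extended a little (a sketch; array_values simply reindexes the array after array_unique removes repeated entries):

$rules = $parser->getRules('*');
// Remove duplicate sitemap URLs and reindex the result
$sitemaps = array_values(array_unique($rules['sitemap']));
var_dump($sitemaps);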

Not too difficult, but a getSitemaps() method would be better.

LeMoussel commented 7 years ago

I'm not familiar with the GitHub workflow. Can you open a PR to fix that?

bopoda commented 7 years ago

Added a getSitemaps method in PR https://github.com/bopoda/robots-txt-parser/pull/8.