Raisler / Youtube_Scrapy

MIT License

Consider using scrapy #1

Open Lucs1590 opened 3 years ago

Lucs1590 commented 3 years ago

Hi @Raisler, what's up? Dude, I took a look at your project and saw that you use Selenium to build the crawler. I have done the same many times, because (1) it is practical and (2) it works like a charm. However, crawling is not what Selenium was designed for: it was made for testing, and there are libraries and frameworks specialized for crawling, such as Scrapy. So I recommend using the Scrapy framework for this project; I believe you will gain at least in performance. If you are doing this project just to learn more about Selenium, disregard everything I wrote. Otherwise, the following tutorial can help you implement it with Scrapy: http://pythonclub.com.br/material-do-tutorial-web-scraping-na-nuvem.html

Note that the following snippet is enough to get the views, title, and link of each video.

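import scrapy

def first(sel, xpath):
    # Return the first match of the XPath expression, or None if nothing matches.
    return sel.xpath(xpath).extract_first()

class YoutubeChannelLister(scrapy.Spider):
    name = 'channel-lister'
    youtube_channel = 'portadosfundos'
    start_urls = ['https://www.youtube.com/user/%s/videos' % youtube_channel]

    def parse(self, response):
        # Each <li> in the channel's video grid describes one video.
        for sel in response.css("ul#channels-browse-content-grid > li"):
            yield {
                # Turn the relative href into an absolute URL.
                'link': response.urljoin(first(sel, './/h3/a/@href')),
                'title': first(sel, './/h3/a/text()'),
                # The first metadata item is expected to hold the view count.
                'views': first(sel, ".//ul/li[1]/text()"),
            }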
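To see what those selectors pick out without installing Scrapy or hitting the network, here is a rough offline sketch. It is not the spider itself: it swaps Scrapy's selectors for the standard library's `xml.etree.ElementTree`, and the `SAMPLE` markup, the `list_videos` helper, and the video IDs are all made up for illustration. They only mirror the page structure the spider above assumes (a grid `ul` whose `li` items contain an `h3/a` link and a nested `ul` of metadata).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical markup, invented for illustration only. It imitates the
# structure that the spider's selectors assume; it is not real YouTube output.
SAMPLE = """
<ul id="channels-browse-content-grid">
  <li>
    <h3><a href="/watch?v=abc123">First video</a></h3>
    <ul><li>1,234 views</li><li>2 days ago</li></ul>
  </li>
  <li>
    <h3><a href="/watch?v=def456">Second video</a></h3>
    <ul><li>987 views</li><li>1 week ago</li></ul>
  </li>
</ul>
"""

BASE_URL = "https://www.youtube.com/user/portadosfundos/videos"


def list_videos(markup, base_url):
    """Extract link, title and views from each grid item (hypothetical helper)."""
    root = ET.fromstring(markup)  # root is the grid <ul>
    videos = []
    for item in root.findall("./li"):  # direct children, like "ul#... > li"
        anchor = item.find(".//h3/a")
        first_meta = item.find(".//ul/li[1]")  # first metadata entry
        videos.append({
            # Same idea as response.urljoin: resolve the relative href.
            "link": urljoin(base_url, anchor.get("href")),
            "title": anchor.text,
            "views": first_meta.text,
        })
    return videos


videos = list_videos(SAMPLE, BASE_URL)
for video in videos:
    print(video)
```

If the real page markup differs from this assumed structure, the selectors in the spider would need to be adjusted accordingly.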

I hope I have helped!

Raisler commented 3 years ago

Thank you! That was very helpful. I intend to learn more about it; I did not know much about scrapers, so I was searching for anything that could help me. Once again, thank you.

In reply to Lucas de Brito's comment of Tue, Jan 5, 2021, 07:39.