elliotgao2 / gain

Web crawling framework based on asyncio.
GNU General Public License v3.0

Parse multiple items from each page. #7

Closed. elliotgao2 closed this issue 6 years ago

elliotgao2 commented 7 years ago
from gain import Css, Item

# Current item model: one Post is parsed per page.
class Post(Item):
    title = Css('.entry-title')
    content = Css('.entry-content')

If a page contains multiple items that match the defined item model, Gain should be able to parse all of them. I have an idea:

class Post(Item):
    __base_html__ = Css('.entry')
    title = Css('.entry-title')
    content = Css('.entry-content')

Give the item model a new attribute named __base_html__ (or another appropriate name) that describes where each item is located on the page, so that we can parse multiple items from each page.
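
To make the idea concrete, here is a minimal sketch of how a __base_html__ selector could drive multi-item parsing. It is not Gain's actual implementation: the Css and Item classes below are simplified stand-ins built on the standard library's xml.etree.ElementTree (so they only handle well-formed markup and plain '.class' rules), parse_all is a hypothetical method name, and the sample page is made up for illustration.

```python
import xml.etree.ElementTree as ET


class Css:
    """Simplified stand-in for Gain's Css selector (assumption: '.class' rules only)."""

    def __init__(self, rule):
        if not rule.startswith('.'):
            raise ValueError('this sketch only supports ".class" rules')
        self.class_name = rule[1:]

    def select(self, node):
        # Walk the node and its descendants in document order and keep
        # the elements whose class attribute lists the wanted class.
        return [el for el in node.iter()
                if self.class_name in (el.get('class') or '').split()]


class Item:
    """Simplified stand-in for Gain's Item base class."""

    __base_html__ = None  # optional selector marking each item's container

    @classmethod
    def parse_all(cls, html):
        # Hypothetical helper: returns one dict per container matched by
        # __base_html__, or a single dict for the whole page if it is unset.
        root = ET.fromstring(html)
        fields = {name: sel for name, sel in vars(cls).items()
                  if isinstance(sel, Css) and name != '__base_html__'}
        base = cls.__base_html__
        containers = base.select(root) if base is not None else [root]
        results = []
        for container in containers:
            record = {}
            for name, sel in fields.items():
                # Field selectors are evaluated relative to the container,
                # so each item only sees its own title and content.
                matches = sel.select(container)
                record[name] = (''.join(matches[0].itertext()).strip()
                                if matches else None)
            results.append(record)
        return results


class Post(Item):
    __base_html__ = Css('.entry')
    title = Css('.entry-title')
    content = Css('.entry-content')


# Made-up sample page with two entries.
PAGE = """<div>
  <div class="entry"><h2 class="entry-title">First</h2><p class="entry-content">Hello</p></div>
  <div class="entry"><h2 class="entry-title">Second</h2><p class="entry-content">World</p></div>
</div>"""

posts = Post.parse_all(PAGE)
for post in posts:
    print(post['title'], '-', post['content'])
```

The key design point in this sketch is that the field selectors run against each container rather than against the whole document; without __base_html__ the same model falls back to treating the page as a single item.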