Xarrow / weibo-scraper

Simple Weibo Scraper
MIT License

[Feature request] Scraping geotagged posts #13

Open CWen001 opened 4 years ago

CWen001 commented 4 years ago

Thank you very much for this wonderful Pythonic package. I was wondering whether you would be interested in adding support for scraping location-related posts or information.

In social studies, it is quite common to search and analyze the spatial dimension of crowdsourced information. For instance, one may want to keep only posts from a certain place, like Shanghai, or from a user-defined region. Since some posts are geotagged, is it possible to collect the longitude and latitude information and then filter on it?

For example, a function that behaves like this:

from typing import Union

def get_weibo_tweets_by_spatialtemporal(spatial_extent: Union[str, list], temporal_span: list = None, pages: int = None, **kwargs) -> _TweetsResponse:
    """
    Get raw geotagged weibo tweets by spatial extent without any authorization
    >>> from weibo_scraper import get_weibo_tweets_by_spatialtemporal
    >>> # Only keep geotagged posts in Beijing
    >>> for tweet in get_weibo_tweets_by_spatialtemporal(spatial_extent='北京', pages=1):
    ...     print(tweet)
    >>> # Filter geotagged posts using a bounding box [left, bottom, right, top], i.e. [min_lon, min_lat, max_lon, max_lat]
    >>> for tweet in get_weibo_tweets_by_spatialtemporal(spatial_extent=[116.305744, 39.836467, 116.357886, 39.856335],
    ...                                                  temporal_span=['2017-02-13', '2018-05-05'], pages=1):
    ...     print(tweet)
    :param spatial_extent: the region in which you want to search, either a place name or a bounding box [left, bottom, right, top] in the WGS84 geographic reference system.
    :param temporal_span: optional, default None. Valid input is a [start, end] pair of date strings, e.g., ['2017-02-13', '2018-05-05'].
    :param pages: number of pages, default all pages
    :return: _TweetsResponse
    """

To implement the function, a few things are probably worth considering. For example, the tweet object (JSON?) could include a boolean is_geotagged field, along with fields that record the coordinates, e.g. (116.305744, 39.836467).
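A minimal sketch of how the bounding-box filter might look, assuming each tweet is a dict carrying a boolean is_geotagged field and a coordinates field holding a (lon, lat) pair. Both field names, the helper functions, and the sample records are hypothetical and are not part of the current package:

```python
from typing import Dict, Iterable, Iterator, Sequence


def in_bbox(lon: float, lat: float, bbox: Sequence[float]) -> bool:
    """Return True if (lon, lat) lies inside bbox = [left, bottom, right, top]."""
    left, bottom, right, top = bbox
    return left <= lon <= right and bottom <= lat <= top


def filter_geotagged(tweets: Iterable[Dict], bbox: Sequence[float]) -> Iterator[Dict]:
    """Yield only geotagged tweets whose coordinates fall inside bbox.

    Assumes hypothetical 'is_geotagged' and 'coordinates' fields on each tweet.
    """
    for tweet in tweets:
        if not tweet.get("is_geotagged"):
            continue  # skip posts without location information
        lon, lat = tweet["coordinates"]
        if in_bbox(lon, lat, bbox):
            yield tweet


# Made-up sample records, for illustration only
sample = [
    {"id": 1, "is_geotagged": True, "coordinates": (116.32, 39.84)},
    {"id": 2, "is_geotagged": True, "coordinates": (121.47, 31.23)},
    {"id": 3, "is_geotagged": False},
]
bbox = [116.305744, 39.836467, 116.357886, 39.856335]
inside = [t["id"] for t in filter_geotagged(sample, bbox)]
print(inside)
```

Filtering on the client side like this only needs the two extra fields on each tweet; whether Weibo exposes coordinates for a given post is a separate question.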

Thank you very much again. Hope this package can keep going.

Xarrow commented 4 years ago

That is an awesome feature! I have pinned this issue and will look into adding this functionality in the next release.

In my opinion, the APIs of this library are too complex, so I will simplify them in the next release. Right now, though, the most important thing for me is preparing for interviews for my next job.

Thank you for your useful suggestion!

CWen001 commented 4 years ago

Thanks for the response. Wish you all the best for the job interview.