JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

A few questions #289

Closed dnikola closed 3 years ago

dnikola commented 3 years ago

Hello

This library looks amazing. I have a few questions; please reply if possible:

  1. Is it possible to get the data as a JSON response?
  2. Is it possible to use proxies so that the IP won't get blocked?
  3. Is it possible to modify the JSON response?
  4. Are you available for custom work as a freelancer?

Best regards!

dnikola commented 3 years ago

Hi @JustAnotherArchivist

After some research and installing the development version, I see that the --jsonl option works fine, so I have my answer to question 1.

What I have noticed: I searched for a hashtag on Instagram that includes letters like Š, Ć and Č, and in the URL that is retrieved these letters are converted (Š to S, Č to C, Ć to C), which is wrong.

I get only a few posts; are the rest skipped because the letters are converted?

Thanks

JustAnotherArchivist commented 3 years ago
  1. (skip)
  2. See #74 and countless duplicates
  3. You can do whatever you want with the output. snscrape doesn't care what happens once the items are produced.
  4. Not currently, no.
  5. snscrape does not do any conversions like that; it sends the supplied hashtag verbatim (just percent-encoded) to Instagram. I don't know what their servers do with it exactly though. I'd have to see some examples: command, expected output, actual output, snscrape version.
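
To illustrate what "percent-encoded" means for a hashtag with non-ASCII letters, here is a minimal sketch using only Python's standard library. It does not reproduce snscrape's internal code; it only shows that percent-encoding keeps the UTF-8 bytes of č intact rather than folding the letter to c.

```python
from urllib.parse import quote, unquote

# The hashtag discussed in this thread.
hashtag = "ispričajpričudokraja"

# Percent-encode the UTF-8 bytes; plain ASCII letters pass through unchanged.
encoded = quote(hashtag, safe="")
print(encoded)  # ispri%C4%8Dajpri%C4%8Dudokraja

# Decoding restores the original letters, so no information is lost.
assert unquote(encoded) == hashtag
```

If the URL that gets requested contains a plain c where %C4%8D is expected, the letter was most likely already altered before it reached the scraper, for example by the terminal.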

dnikola commented 3 years ago

Hi @JustAnotherArchivist

Thanks for your fast reply. Please check mine:

  1. I mean that when I run this command, I would like the output to be a little different: I would like to have the mentions and hashtags used, for some modules (I have seen that some have them already).
  2. Please check this; only 3 are returned. Compare: https://www.instagram.com/explore/tags/ispricajpricudokraja/ https://www.instagram.com/explore/tags/ispri%C4%8Dajpri%C4%8Dudokraja/

image


Edit: after pasting the second link, I saw that these characters had been converted, so I tried searching for the hashtag in that form and got 26 results.

image

?

JustAnotherArchivist commented 3 years ago
  1. Right. What you get as JSONL output is everything that snscrape extracts currently. The modules/scrapers differ strongly in that respect. If you have specific suggestions what should be added, please file an issue for that.
  2. That sounds like an issue with your Windows terminal. You probably have to enable UTF-8 (cf. #122, "UnicodeEncodeError on Windows command prompt when UTF-8 output is produced"). snscrape --jsonl --verbose instagram-hashtag ispričajpričudokraja works as expected for me and returns 26 posts.
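
Since the JSONL output can be post-processed freely, one possible way to add hashtags and mentions yourself is sketched below. The sample lines and the "content" field name are assumptions made for illustration, and enrich is a hypothetical helper, not part of snscrape; check the fields your scraper actually emits.

```python
import json
import re

# Hypothetical sample of JSONL output; the "content" field name is an assumption.
sample_jsonl = """\
{"url": "https://example.com/p/1", "content": "Hello #ispričajpričudokraja with @someone"}
{"url": "https://example.com/p/2", "content": null}
"""

# \w matches Unicode letters in Python 3, so č, š and ć are kept.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@([\w.]+)")


def enrich(line):
    """Parse one JSONL line and add 'hashtags' and 'mentions' lists."""
    item = json.loads(line)
    text = item.get("content") or ""  # treat a missing or null caption as empty
    item["hashtags"] = HASHTAG_RE.findall(text)
    item["mentions"] = MENTION_RE.findall(text)
    return item


items = [enrich(line) for line in sample_jsonl.splitlines() if line.strip()]
for item in items:
    # ensure_ascii=False keeps non-ASCII letters readable in the output.
    print(json.dumps(item, ensure_ascii=False))
```

In practice the lines would come from a file or a pipe holding the --jsonl output instead of the inline sample.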

dnikola commented 3 years ago

Hi @JustAnotherArchivist

Thanks for your reply.

  1. Great, I will take a look at this and let you know.
  2. OK, I will check.
  3. Is it possible to have a Facebook hashtag option? Why is it missing?
  4. With facebook-user I only get two posts for each profile; is there any trick to get more?

JustAnotherArchivist commented 3 years ago
  1. Facebook doesn't let users list posts with hashtags unless you're logged in. snscrape doesn't support logging in.
  2. No tricks. That scraper doesn't seem to work at all for me at the moment, although it might depend on the profile page (as some use the old design and others the new). Facebook is a mess and will also quickly ban your IP, so fixing it is tricky.

dnikola commented 3 years ago

Hi, let me reply:

  1. Yes, I know that, but with the right proxy it can be done. We use proxies from specific countries that are not covered by the GDPR and EU law, so hashtags are available for scraping. I know this because we have been working on it for a few months, but our scripts are not as fast as yours. I could share more info if you need it.
  2. Yes, I know about the old and new designs. As I mentioned, we use proxies, including dedicated IG and FB proxies that rotate on every request.

Could you please look into adding Facebook hashtag support? I can share our current scripts with you.

Regards, Nikola

JustAnotherArchivist commented 3 years ago

Huh. I just took a look again, and it seems like https://www.facebook.com/hashtag/hashtag does work now. It only returned a few posts the last time I checked (which was admittedly a long time ago, probably over a year). I assume that's what your scripts are based on as well? I'll file a separate issue on adding support for this.

dnikola commented 3 years ago

Yes, it is based on that :) Also, when you visit a profile or page and scroll down, you can load more posts. Would it be possible to make your script scrape more posts from a page or profile?

JustAnotherArchivist commented 3 years ago

Reopened #31

snscrape already handles the scrolling. If it stops mid-feed, that normally means your IP is banned, although snscrape is supposed to throw an error in that situation (cf. #208, #250).

dnikola commented 3 years ago

It just returns the first two posts, with no error :) so I think it doesn't scroll.

JustAnotherArchivist commented 3 years ago

Facebook probably changed how the scrolling works then. It definitely used to work. I'll see if I can debug that sometime soon. Facebook's code is a massive mess to work with though.

JustAnotherArchivist commented 3 years ago

Closing this as the questions have been answered. I'll look into the potential Facebook scraper bugs when I have time.