Leibniz-HBI / Social-Media-Observatory

This repository is the central communication and project management interface for the Social Media Observatory hosted by the Leibniz Institute for Media Research | Hans-Bredow-Institute
https://leibniz-hbi.github.io/SMO/
Creative Commons Attribution 4.0 International

Improve/systematise Instagram Tools Site #20

Closed FlxVctr closed 4 years ago

FlxVctr commented 4 years ago

Features should be categorised by:

  1. What can be searched for e.g.: location feeds, user feeds, hashtag feeds
  2. What is collected e.g.: user data, date, location, comments, ...
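
One way to picture this two-way categorisation is as a small record per tool. The sketch below is only illustrative: the `ToolEntry` class, its field names, and the example values are assumptions and are not taken from the wiki.

```python
from dataclasses import dataclass, field


@dataclass
class ToolEntry:
    """Hypothetical record describing one scraper on the tools page."""
    name: str
    searchable: list[str] = field(default_factory=list)  # what can be searched for
    collected: list[str] = field(default_factory=list)   # what is collected


def tools_that_search(entries, feature):
    """Return the names of all tools that can search for the given feature."""
    return [e.name for e in entries if feature in e.searchable]


# Example values are placeholders, not a verified description of any real tool.
entries = [
    ToolEntry("tool_a", ["hashtag feed", "user feed"], ["comments", "date"]),
    ToolEntry("tool_b", ["location feed"], ["user data", "location"]),
]

print(tools_that_search(entries, "user feed"))
```

Keeping both categories as plain lists of shared labels means a reader (or a script) can compare tools feature by feature.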

manilevian commented 4 years ago

I added the two questions under every scraper from 1-6. I made a cut between the modules and the actual easy-to-use apps. I haven't had time to test the modules against the two questions above, but I will check them out soon. Since two of those modules are Java- and PHP-based, I will need a bit more time to test them, since I am not a Java/PHP pro :)!

EDIT:

Oh, and I removed the Instagrab app (for now) since the documentation is extremely poor. I even had problems understanding WHAT it scrapes and HOW.

manilevian commented 4 years ago

I changed some stuff up. Can you maybe go through the layout and see if it's good as it is? If that's the case, I'll start with the other networks.

FlxVctr commented 4 years ago

Looks good overall. However, I'd organise the answers to the questions as bullet points as well.

Additionally, it would be nice to have more consistency in the feature list, i.e. if two tools do the same thing, it should also have the same bullet point (e.g. location ID and location feed: is it the same? If so, I think we should call it the same. Otherwise these and similar features would need some explanation).

The same should apply to the answers to the questions. The more structured these overviews are, the easier it should be for a reader to quickly skim for what they need.

FlxVctr commented 4 years ago

Also, there now seems to be some overlap between "Features" and the answers to the questions. I'd cut the features which are already contained in the questions and only list "additional features". Does this make sense?

manilevian commented 4 years ago

What I did so far: I merged the features and metadata lists into one, so it's way shorter. I also removed the overlap between the "questions" and the "features". Some features have been removed completely, since the questions already cover them. I added an "Advanced Libraries" list under the supported applications, with a short text saying that we cannot support those apps and that they require programming knowledge.

I actually tested the PHP modules. It was kind of time-consuming, but with some help from Stack Overflow I can at least say: it's working. So it remained in the list. I asked a Java programmer to check the Instagram Java Scraper and he also said: it's working! Rinstapkg was tested by Jason as far as I remember, and it's also functioning.

I also fixed some typos. If you find any typos or grammar mistakes, please make sure to poke me ;)!

FlxVctr commented 4 years ago

I do not think that the formulation of sentences in bullet points works well here. E.g.:

What can be searched for:

  • Instamancer can search by hashtag, user or even post (All by ID or actual tag)!

should rather be

What can be searched for:

  • posts by hashtag
  • posts by user
  • individual posts by ID

I would not mention modules that we cannot support at the moment. So only Python and R tools should remain (and GUI tools of course).

manilevian commented 4 years ago

I have changed the bullet points. Hope it's okay like this.

Also, I have an "Advanced Libraries" category on the Instagram page, showing libraries and modules. Should I remove those?! I clearly mentioned that people need programming knowledge to get those programs to work and that we cannot support these programs atm.

FlxVctr commented 4 years ago

Better now, but I don't understand what 'ID or Tag' means under Instamancer. Also, the 'by' in 'by hashtag' should be cut (also below) to make the structure of all tool descriptions uniform.

What can be collected does not make sense for Instaphyte. If I can search for users, I cannot collect their media?

What does 'media of all sorts' mean (Instagram Scraper)?

'MEDIA only scraper' could also just be 'media' ;) (Instalooter)

tl;dr: The goal is to have a structure as uniform as possible for all tools, which we can then also use for all other tool lists. The main goal is a quick overview for a reader. At the moment there are still some unnecessary differences.
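
Such a uniform vocabulary could also be checked mechanically. A minimal sketch, assuming a hypothetical alias table built only from the renamings discussed in this thread (`ALIASES`, `normalise`, and `inconsistent_labels` are illustrative names, not part of any existing tool):

```python
# Hypothetical alias table: maps label variants to one canonical label.
ALIASES = {
    "media of all sorts": "media",
    "media only scraper": "media",
}


def normalise(label):
    """Lower-case a label, drop a leading 'by ', and apply known aliases."""
    cleaned = label.strip().lower()
    if cleaned.startswith("by "):
        cleaned = cleaned[3:]
    return ALIASES.get(cleaned, cleaned)


def inconsistent_labels(labels):
    """Return the labels that differ from their canonical form."""
    return [label for label in labels if normalise(label) != label]


print(inconsistent_labels(["media", "by user", "MEDIA only scraper"]))
```

Running a check like this over every tool description would flag labels that still deviate from the shared wording.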

FlxVctr commented 4 years ago

Regarding the advanced libraries: Can they do something that the others cannot? If so, keep them. Otherwise, I rather lean towards removing them, as they just cause clutter that we cannot support or assess.

manilevian commented 4 years ago

I will remove the Advanced Libs on Monday, since I will need to change the table too. They are not too different, and from a researcher's perspective I don't think they add anything that is crucially different.

The ID/TAG thing was mistyped. Tags = hashtags, and ID is e.g. location ID / user ID. "Media of all sorts" has been changed to "media", since it's videos and pics. Changed "MEDIA only" to "media".

It's good that we are working back and forth on this page. Let's get this one clean and ready. Then I can go on with the other platforms.

manilevian commented 4 years ago

Hey,

I unified the "What can be searched for" and ofc also the "What can be collected" sections. If you see a point I missed, please tell me.

I also edited the table and made it a bit more readable. The Advanced Libs have been removed from the wiki; they can only be found in our merged scraper list, where they are marked red.

FlxVctr commented 4 years ago

Now I think some points need some explanation (which, again, should be the same for the same thing in different tools). E.g., I do not get what is meant by collecting 'hashtags'.

FlxVctr commented 4 years ago

Also, what is the difference between feed and posts?

manilevian commented 4 years ago

A user feed (according to Google) is what is on a user's "storyline"; it is shared more widely. It differs from a normal post. Still, I was wrong at one point and wrote feed when I meant post, so I changed that too.

What are "some points"? Please be more specific, there are a lot of points! ;)

Hashtags shouldn't be in the collected category, changed that!

manilevian commented 4 years ago

Hey, I checked the exact difference between the Instagram Story and Post/Feed. I was kind of in a hurry last week.

Post: A single posting of media content on Instagram. People can comment on it.

Feed: The whole list of posts on your profile, as listed on Instagram, is your feed.

Story: Stories are posts that are accessible when following someone on Instagram. A story exclusively shows a "storyline" in the form of media, with personal commentary from the creator. You cannot comment on a story. A story can only be seen when following a person.

As far as I understand, an Instagram Story is technically different from the feed/posts. From a research perspective, I don't think an Instagram Story is very interesting. However, a story can be revealed by clicking on a user's profile picture, which shows the activity of a whole week in GIFs, pictures and mini videos, with fancy commentary. As far as I understand, your story will not appear in your feed.

However, most people who program scrapers switch these terms around a lot. That can be really confusing, like in our case :D! Most scrapers which mention "Posts and Feeds" mean Story = feed.

manilevian commented 4 years ago

Please check out the new layout of the wiki and comment. I took out the "what can be collected/searched for" sections, made it a little bit slicker, added links to Git and documentation (if present), added notable features if there are any, and added limitations where they seemed important to note. I also rechecked the abilities of the scrapers and changed the table.

I just don't know where the best spot is to state whether a tool is Python or JavaScript, and whether it is installed through pip, through npm, or manually. If you have an idea where the best spot for that is, I'd be very thankful :)!

For now just check out if the layout is okay like this. If so, I can go on with the "finishing touch" (Grammar check and so on).

FlxVctr commented 4 years ago

Layout is good but Instagram scraper can collect comments and metadata.

I would just put an "Installation via: pip/npm/git clone" below notable features. Later we can add a how-to for installation of both.

manilevian commented 4 years ago

Hehehe, I almost forgot about the Instagram Scraper confusion we had a couple of weeks ago :). You're absolutely right, it has the option to download the metadata.

I have already added the "Installation via:" in my draft and will copy-paste it over after going through a spell check!

FlxVctr commented 4 years ago

Ok, but make sure you don't overwrite the changes I made in the meantime.

manilevian commented 4 years ago

As far as I know, I copied your version ;)!

FlxVctr commented 4 years ago

Well, if in doubt, work with the git repo.

manilevian commented 4 years ago

It's all good, your changes are in there.

manilevian commented 4 years ago

Hi,

If you have the time, could you quickly go through the Instagram Scrapers and check if they fit our requirements by now? Then we can finally close this issue :D

FlxVctr commented 4 years ago

Almost ready, looks good! Some final points:

Table

What's the difference between '-' and 'x'?

Keys

If user info includes followers and followings in general, why does it have its own key?

Description

'Github and Download' is not necessarily accurate, as most people won't download the tool from GitHub. Do you mean installation instructions? Maybe just leave 'Github'.

Other tools

Empty. Is something missing?

General

manilevian commented 4 years ago

Hi,

Table

The '-' means that the scraper is only capable of scraping certain metadata, but not all. I will make sure to add a description for that!

Keys

User info includes the number of followers and followings, but not the information about the followers/followings themselves. Scraping the followings means it will actually pull the whole list of followers with their user info. I added "number of" in front of followers/followings to avoid future confusion. I also added "description", which was missing too.
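
The distinction between the two keys can be sketched as two record types. This is only an illustration: the class and field names below are hypothetical and do not reflect the actual output format of any scraper.

```python
from dataclasses import dataclass, field


@dataclass
class UserInfo:
    """Hypothetical 'user info' key: profile data plus counts only."""
    username: str
    description: str
    number_of_followers: int
    number_of_followings: int


@dataclass
class FollowerScrape:
    """Hypothetical 'followers' key: the full follower list with user info."""
    username: str
    followers: list[UserInfo] = field(default_factory=list)


# Placeholder data, not real accounts.
alice = UserInfo("alice", "example profile", number_of_followers=1, number_of_followings=0)
bob = UserInfo("bob", "another example", number_of_followers=0, number_of_followings=1)

scrape = FollowerScrape("alice", followers=[bob])
print(alice.number_of_followers, [f.username for f in scrape.followers])
```

The first record only says how many followers there are; the second carries who they are.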

Description

I'll remove "GitHub" and change it to "Download and Installation Instructions". That would probably describe it best.

Other Tools

I only had one, but that project is now unreliable and the service doesn't work accurately. I'll remove the category and add it back whenever useful tools pop up.

FlxVctr commented 4 years ago

Good work! Thanks 🙏