NikolaiT / GoogleScraper

A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
https://scrapeulous.com/
Apache License 2.0
2.64k stars 743 forks

How to use this tool for detecting plagiarised contents? #71

Closed: moazam1 closed this issue 9 years ago

moazam1 commented 9 years ago

Thanks for making a great library and sharing it with the community.

I would like to use this tool to search for plagiarised content on Google, Bing, Duckduckgo, and other search engines. Do you plan to provide functionality that takes a full article and returns the match as a percentage? More or less like this one: http://bit.ly/17Ujbnd

Thanks

NikolaiT commented 9 years ago

Hey, nice that you like the tool :)

Detecting plagiarized content involves fetching the actual results (requesting the web sites that the scraper found), which is not the purpose of this tool.

You could, however, very easily write a few lines of code that implement the logic. Look in the examples directory; there you will find a file called usage.py. It shows how to use GoogleScraper, and you can then add your own logic on top.
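
As a rough illustration of the kind of logic meant here (this is not part of GoogleScraper, and it is not the code from usage.py), the sketch below compares an article against the text of one page that the scraper found and reports how much of the article reappears there. It assumes the page text has already been downloaded and stripped of HTML by some other means; the function names `shingles` and `match_percentage` and the five-word window are made up for this example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def match_percentage(article, page_text, size=5):
    """Percentage of the article's shingles that also occur in page_text.

    Both arguments are plain strings. Fetching the page and removing
    markup is assumed to happen elsewhere.
    """
    article_shingles = shingles(article, size)
    if not article_shingles:
        return 0.0
    shared = article_shingles & shingles(page_text, size)
    return 100.0 * len(shared) / len(article_shingles)
```

Calling `match_percentage(article, page_text)` once per result URL and keeping the highest value would give a crude "match in percent" figure. Word shingles are only one possible similarity measure; a real checker would probably need something more robust.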

If you cannot program in Python, then you could hire somebody...

Cheers!

Edit:

I added finding plagiarized content to my TODO list. If you're a little bit patient (I am overloaded with work right now :/), you can see how to do it in a few days.

TODO list:

19.01.2015

    - add different examples to 'examples/':
        - a basic usage
        - using selenium mode
        - using http mode
        - using async mode
        - scraping images
        - finding plagiarized content

moazam1 commented 9 years ago

Brilliant, @NikolaiT! The idea is to take the whole post, split it into small chunks, wrap each chunk in quotation marks, and then run a Google search for it. If a match is found, Google shows the matching text in bold letters.
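
The chunk-and-quote idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `quoted_queries`, `percent_matched`, and the eight-word chunk size are hypothetical names and values chosen here, and `has_results` stands in for whatever code actually runs the search (for example through GoogleScraper) and reports whether an exact-phrase query returned anything.

```python
def quoted_queries(article, words_per_chunk=8):
    """Split the article into fixed-size word chunks and wrap each in quotes.

    Quoting a chunk asks the search engine for an exact phrase match.
    """
    # Drop embedded double quotes so they cannot break the exact-phrase query.
    words = article.replace('"', " ").split()
    chunks = [
        words[i:i + words_per_chunk]
        for i in range(0, len(words), words_per_chunk)
    ]
    return ['"{}"'.format(" ".join(chunk)) for chunk in chunks]


def percent_matched(queries, has_results):
    """Percentage of queries for which has_results(query) is true.

    has_results is a hypothetical callable supplied by the caller; it should
    perform the search and return True if the phrase was found.
    """
    if not queries:
        return 0.0
    hits = sum(1 for query in queries if has_results(query))
    return 100.0 * hits / len(queries)
```

For example, `quoted_queries("one two three four five", words_per_chunk=2)` yields the three queries `"one two"`, `"three four"`, and `"five"`. The share of queries that come back with hits is a rough estimate of how much of the post appears elsewhere. Shorter chunks give more queries and finer resolution, but also more requests to the search engine.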

NikolaiT commented 9 years ago

Added finding_plagiarized_content.py to the examples. You may need to change a few things.

You can adapt it to fit your needs: https://github.com/NikolaiT/GoogleScraper/blob/master/examples/finding_plagiarized_content.py

moazam1 commented 9 years ago

Thanks, @NikolaiT. I will have a look at this. I sent you an email. Cheers!

NikolaiT commented 9 years ago

Replied to your mail.

moazam1 commented 9 years ago

Thank you