emareg / paper-checker

Find simple grammar mistakes in scientific documents.

[question][plagiarism] Would it be better to use the Google Search API instead of the normal Google search? #23

Open egekorkan opened 4 years ago

egekorkan commented 4 years ago

Background information

Google provides an API endpoint for search queries that returns JSON responses containing all the information the plagiarism checker needs. For example, searching for "lectures" in the custom search engine from the API documentation returns the following:

{
  "kind": "customsearch#search",
  "url": {
    "type": "application/json",
    "template": "https://...."
  },
  "queries": {
    "request": [
      {
        // ...
      }
    ],
    "nextPage": [
      {
        "title": "Google Custom Search - lectures",
        "totalResults": "781000000",
        "searchTerms": "lectures",
        "count": 10,
        "startIndex": 11,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "017576662512468239146:omuauf_lfve"
      }
    ]
  },
  "context": {
    "title": "CS Curriculum",
    "facets": [
      [
        {
          "anchor": "Lectures",
          "label": "lectures",
          "label_with_op": "more:lectures"
        }
      ],
      [
        {
          "anchor": "Assignments",
          "label": "assignments",
          "label_with_op": "more:assignments"
        }
      ],
      [
        {
          "anchor": "Reference",
          "label": "reference",
          "label_with_op": "more:reference"
        }
      ]
    ]
  },
  "searchInformation": {
    "searchTime": 0.350489,
    "formattedSearchTime": "0.35",
    "totalResults": "781000000",
    "formattedTotalResults": "781,000,000"
  },
  "items": [
    {
      "kind": "customsearch#result",
      "title": "Introduction to Machine Learning",
      "htmlTitle": "Introduction to Machine Learning",
      "link": "https://see.stanford.edu/Course/CS229",
      "displayLink": "see.stanford.edu",
      "snippet": "Slides from Andrew's lecture on getting machine learning algorithms to work in \npractice can be found here. Previous projects: A list of last year's final projects ...",
      "htmlSnippet": "Slides from Andrew's \u003cb\u003electure\u003c/b\u003e on getting machine learning algorithms to work in \u003cbr\u003e\npractice can be found here. Previous projects: A list of last year's final projects ...",
      "cacheId": "vB97xQjhxVcJ",
      "formattedUrl": "https://see.stanford.edu/Course/CS229",
      "htmlFormattedUrl": "https://see.stanford.edu/Course/CS229",
      "pagemap": {
        "cse_thumbnail": [
          {
            "src": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQ2_-hJWbczpcTOUvBJuymIrbHevHrTlAL-EhyPo--xfmFh0F0Ts8iCmOc",
            "width": "148",
            "height": "208"
          }
        ],
        "metatags": [
          {
            "viewport": "width=device-width, initial-scale=1"
          }
        ],
        "cse_image": [
          {
            "src": "https://see.stanford.edu/Content/Images/Instructors/ng.jpg"
          }
        ]
      },
      "labels": [
        {
          "name": "lectures",
          "displayName": "Lectures",
          "label_with_op": "more:lectures"
        }
      ]
    },
    // There are more results here
  ]
}

For more information: https://developers.google.com/custom-search/v1/overview
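To make the idea concrete, here is a minimal Python sketch of how the checker might query the API and pick out the fields it needs (`link`, `snippet`, and the `nextPage` start index shown in the response above). The endpoint and the `key`, `cx`, `q`, and `start` query parameters come from the Custom Search JSON API; the function names are hypothetical and not part of paper-checker.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, cx, start=1):
    """Build the request URL; api_key and cx must be registered by the user."""
    params = {"key": api_key, "cx": cx, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_results(query, api_key, cx, start=1):
    """Perform the HTTP request and decode the JSON body (needs network access)."""
    with urlopen(build_search_url(query, api_key, cx, start)) as resp:
        return json.load(resp)


def extract_results(response):
    """Keep only title, link, and whitespace-normalized snippet of each item."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": " ".join(item.get("snippet", "").split()),
        }
        for item in response.get("items", [])
    ]


def next_start_index(response):
    """Return the start index of the next result page, or None if there is none."""
    pages = response.get("queries", {}).get("nextPage", [])
    return pages[0]["startIndex"] if pages else None
```

A sentence from the paper could then be passed as `query`, and the returned snippets compared against it, without having to parse the HTML of a regular search results page.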

Question

Should this type of search replace the current search, or be added as an additional search option?

Advantages

Disadvantages

emareg commented 4 years ago

This would be a nice additional feature, but I would not replace the basic Google search, because the API requires registering an API key.
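One way to keep the basic search as the default while offering the API as an opt-in could look like the sketch below. The environment variable names `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` and the function `choose_search_backend` are assumptions made for illustration; nothing like this exists in paper-checker yet.

```python
import os


def choose_search_backend(env=None):
    """Return "api" only when both credentials are configured, else "basic"."""
    # Hypothetical variable names; users without an API key need no setup.
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY") and env.get("GOOGLE_CSE_ID"):
        return "api"
    return "basic"
```

With this approach, users who never register a key see no change in behavior, and the API search is used only when both values are set.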