Google provides an API endpoint for search queries that returns JSON responses containing all the information the plagiarism checker needs. For example, searching for "lectures" in the custom search engine used in the API documentation returns the following:
{
  "kind": "customsearch#search",
  "url": {
    "type": "application/json",
    "template": "https://...."
  },
  "queries": {
    "request": [
      {
        //...
      }
    ],
    "nextPage": [
      {
        "title": "Google Custom Search - lectures",
        "totalResults": "781000000",
        "searchTerms": "lectures",
        "count": 10,
        "startIndex": 11,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "017576662512468239146:omuauf_lfve"
      }
    ]
  },
  "context": {
    "title": "CS Curriculum",
    "facets": [
      [
        {
          "anchor": "Lectures",
          "label": "lectures",
          "label_with_op": "more:lectures"
        }
      ],
      [
        {
          "anchor": "Assignments",
          "label": "assignments",
          "label_with_op": "more:assignments"
        }
      ],
      [
        {
          "anchor": "Reference",
          "label": "reference",
          "label_with_op": "more:reference"
        }
      ]
    ]
  },
  "searchInformation": {
    "searchTime": 0.350489,
    "formattedSearchTime": "0.35",
    "totalResults": "781000000",
    "formattedTotalResults": "781,000,000"
  },
  "items": [
    {
      "kind": "customsearch#result",
      "title": "Introduction to Machine Learning",
      "htmlTitle": "Introduction to Machine Learning",
      "link": "https://see.stanford.edu/Course/CS229",
      "displayLink": "see.stanford.edu",
      "snippet": "Slides from Andrew's lecture on getting machine learning algorithms to work in \npractice can be found here. Previous projects: A list of last year's final projects ...",
      "htmlSnippet": "Slides from Andrew's \u003cb\u003electure\u003c/b\u003e on getting machine learning algorithms to work in \u003cbr\u003e\npractice can be found here. Previous projects: A list of last year's final projects ...",
      "cacheId": "vB97xQjhxVcJ",
      "formattedUrl": "https://see.stanford.edu/Course/CS229",
      "htmlFormattedUrl": "https://see.stanford.edu/Course/CS229",
      "pagemap": {
        "cse_thumbnail": [
          {
            "src": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQ2_-hJWbczpcTOUvBJuymIrbHevHrTlAL-EhyPo--xfmFh0F0Ts8iCmOc",
            "width": "148",
            "height": "208"
          }
        ],
        "metatags": [
          {
            "viewport": "width=device-width, initial-scale=1"
          }
        ],
        "cse_image": [
          {
            "src": "https://see.stanford.edu/Content/Images/Instructors/ng.jpg"
          }
        ]
      },
      "labels": [
        {
          "name": "lectures",
          "displayName": "Lectures",
          "label_with_op": "more:lectures"
        }
      ]
    },
    // There are more results here
  ]
}
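To illustrate why no HTML parsing is needed, here is a minimal Python sketch of how a client might build a request and pull the relevant fields out of a response like the one above. The function names (`build_search_url`, `extract_results`, `next_start_index`) are hypothetical and not part of the existing tool; the endpoint and the `key`, `cx`, `q` and `start` parameters are those of the Custom Search JSON API.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cx, query, start=1):
    """Build the request URL; `start` is the 1-based index of the first result."""
    params = {"key": api_key, "cx": cx, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_results(response):
    """Return title, link and snippet for each entry in the "items" array."""
    results = []
    for item in response.get("items", []):
        results.append({
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            # Snippets contain "\n" line breaks, so collapse all whitespace.
            "snippet": " ".join(item.get("snippet", "").split()),
        })
    return results


def next_start_index(response):
    """Return the startIndex of the next page, or None if there is none."""
    next_page = response.get("queries", {}).get("nextPage", [])
    return next_page[0].get("startIndex") if next_page else None
```

The response dictionary could be obtained with the standard library, for example `json.load(urllib.request.urlopen(url))`, and the value returned by `next_start_index` can be passed back as `start` to fetch the following page.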
Should this type of search replace the current search, or should it be added as an additional search option?
Advantages
No need to parse raw HTML/DOM to get the information needed
Custom search engines can be restricted to specific websites, which can improve the detection capability
Disadvantages
Need to create a custom search engine using a Google account
Creating an API key: this is the most troublesome part. Either we create an API key and push it into the repository, which means anyone can use it to run searches that are attributed to the owner of the key (probably @emareg), or each person who wants to use the plagiarism checking tool has to supply their own API key. The second option is clean and safe, but it means extra setup work for every user.
There is a limit on how many requests can be made for free per search engine
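The per-user key option could be sketched as follows, assuming the key and the search engine ID are read from environment variables so that nothing secret ends up in the repository. The variable names `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` and the helper `load_credentials` are hypothetical; a config file would work just as well.

```python
import os

# Hypothetical variable names; the project may prefer a config file instead.
API_KEY_VAR = "GOOGLE_API_KEY"
ENGINE_ID_VAR = "GOOGLE_CSE_ID"


def load_credentials(env=os.environ):
    """Return (api_key, engine_id), or None if either value is missing."""
    api_key = env.get(API_KEY_VAR, "").strip()
    engine_id = env.get(ENGINE_ID_VAR, "").strip()
    if not api_key or not engine_id:
        return None
    return api_key, engine_id
```

If `load_credentials()` returns `None`, the tool could fall back to the current search, which would make the API an additional search rather than a replacement.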
Background information
For more information: https://developers.google.com/custom-search/v1/overview