SOBotics / SOBotics.github.io

Blog site for the SOBotics chat room
https://blog.sobotics.org/

Guttenberg - More sources to compare the targets with #9

Closed FelixSFD closed 7 years ago

FelixSFD commented 7 years ago

Since I launched Guttenberg, we've made lots of improvements to the algorithm that compares two posts. But since day one, we have had only two sources of answers to compare against the "target" posts: the linked and the related questions.

Whilst this sometimes returns good results, Guttenberg misses some really obvious matches: http://chat.stackoverflow.com/transcript/message/35789089#35789089

Are there any other sources Guttenberg could query every minute? (SO's search won't work.)

jdd-software commented 7 years ago

There is Bing/Google search, but as we know, we can only run a few searches per day. Maybe we could come up with a strategy for when to search? That is, decide when a post is possible plagiarism, and only search Bing if it fits.
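A rough sketch of that "only search when it looks like plagiarism" idea, assuming Guttenberg already produces some similarity score per post. The class name, the daily limit and the threshold are all made up for illustration; they are not real Bing/Google quotas or Guttenberg values.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SearchBudget:
    """Hypothetical gate: spend a small daily quota only on suspicious posts."""
    daily_limit: int = 50      # assumed quota, not a real Bing/Google figure
    min_score: float = 0.6     # assumed "possible plagiarism" threshold
    used: int = 0
    day: date = field(default_factory=date.today)

    def should_search(self, score: float, today: Optional[date] = None) -> bool:
        today = today or date.today()
        if today != self.day:
            # A new day has started: reset the counter.
            self.day, self.used = today, 0
        if score < self.min_score or self.used >= self.daily_limit:
            return False
        self.used += 1         # count this search against today's quota
        return True
```

Posts below the threshold never touch the quota, so the few available searches go to the most suspicious candidates.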

FelixSFD commented 7 years ago

This free API could work: http://www.faroo.com/hp/api/api.html#description

But I don't know if Guttenberg would need too many requests.

Bhargav-Rao commented 7 years ago

@FelixSFD http://chat.stackoverflow.com/search?q=faroo&room=111347

Tunaki commented 7 years ago

What about GitHub Search API? https://developer.github.com/v3/search/

You can make up to 30 requests per minute.
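To stay under a limit like that, something along these lines could sit in front of the search calls. This is only a sketch of a sliding-window limiter; the class and method names are hypothetical and nothing here talks to GitHub.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls=30, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock          # injectable so the limiter can be tested
        self.calls = deque()        # timestamps of recent accepted calls

    def try_acquire(self):
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False            # caller should skip or retry later
        self.calls.append(now)
        return True
```

A caller would check `try_acquire()` before each search and queue the post for a later run when it returns `False`.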

jdd-software commented 7 years ago

Yeah, I tested some searches on Faroo. The main problem I found was that the search results were not good. GitHub seems like a more promising solution.

FelixSFD commented 7 years ago

The GitHub search sounds great! Are there any restrictions on how long q can be? If so, it could be hard to decide which part of the code to use for the query.
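One possible heuristic for picking the part of the code, sketched under assumptions: take the longest lines first (guessing they are more distinctive than braces or short statements) until a length cap is reached. The function name, the `min_line` cutoff and the default cap of 256 characters are assumptions to check against GitHub's search documentation, not confirmed values.

```python
def build_query(code: str, max_len: int = 256, min_line: int = 4) -> str:
    """Hypothetical helper: build a search string from the longest code lines."""
    # Collapse whitespace per line and drop blank or trivially short lines.
    lines = [" ".join(line.split()) for line in code.splitlines()]
    lines = [line for line in lines if len(line) >= min_line]

    picked, length = [], 0
    # Longest lines first; lines that would exceed the cap are skipped.
    for line in sorted(lines, key=len, reverse=True):
        extra = len(line) + (1 if picked else 0)   # +1 for the joining space
        if length + extra > max_len:
            continue
        picked.append(line)
        length += extra
    return " ".join(picked)
```

If every line is longer than the cap, this returns an empty string, so the caller would need a fallback such as truncating a single line.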

Tunaki commented 7 years ago

Conversation.

Yes, more sources would be good, but we need more research. For now, the possibilities are the GitHub Search API and the SO data dump.