Xetera / scrapeer

👯‍♀️ A protocol for outsourcing web data collection to real people on browsers

Discussion: Trust and Incentives #1

Open azerpas opened 3 weeks ago

azerpas commented 3 weeks ago

Great project, some of my thoughts:

Trust

A client can, in theory, submit any data it wants, and the protocol doesn't have anything built in to make sure the data is legitimate. Any authenticity checks have to be done out-of-band, possibly by comparing answers between clients.

Introducing an additional actor solely responsible for verifying the legitimacy of submitted data could be one solution. These 'Verifiers' could use various techniques to authenticate the data, such as headless browsers or more sophisticated bots. Scrapers attempting to cheat the protocol would then be penalized by no longer receiving tasks. Naturally, this would require some form of incentive, which brings me to my next point.
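To make the "comparing answers between clients" idea concrete, here is a minimal sketch of what a Verifier might do with several submissions for the same task. Nothing here is part of the scrapeer protocol: `fingerprint`, `cross_check`, the client ids, and the `quorum` parameter are all hypothetical names chosen for illustration, and the sketch assumes payloads are JSON-serializable and expected to match exactly.

```python
from collections import Counter
import hashlib
import json


def fingerprint(payload):
    """Hash a JSON-serializable payload so submissions can be compared cheaply."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cross_check(submissions, quorum=0.5):
    """Compare the submissions received for a single task.

    submissions: dict mapping a (hypothetical) client id to its payload.
    Returns (accepted_payload_or_None, sorted_list_of_dissenting_client_ids).
    """
    if not submissions:
        return None, []
    prints = {client: fingerprint(data) for client, data in submissions.items()}
    winner, votes = Counter(prints.values()).most_common(1)[0]
    # Accept only if strictly more than `quorum` of the clients agree.
    if votes / len(prints) <= quorum:
        return None, []
    accepted = next(submissions[c] for c, p in prints.items() if p == winner)
    dissenters = sorted(c for c, p in prints.items() if p != winner)
    return accepted, dissenters
```

For example, `cross_check({"a": {"price": 10}, "b": {"price": 10}, "c": {"price": 99}})` accepts `{"price": 10}` and flags `"c"` as the dissenter, who could then stop receiving tasks. When no answer has a majority, nothing is accepted and nobody is penalized.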

Incentive

Scraping can become costly quickly, so participants should be compensated for their work. Users could pay a small fee, which would be redistributed among all protocol actors, including scrapers and verifiers. Ulixee has already explored this concept. While their approach (a blockchain-like monetization system) doesn't necessarily need to be followed, it could provide some valuable inspiration.
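As a rough sketch of the redistribution idea, the fee for a task could be split between the two groups of actors like this. The function name `split_fee`, the 20% verifier share, and the use of integer cents are assumptions made for this example only; they are not taken from scrapeer or from Ulixee.

```python
def split_fee(fee_cents, scrapers, verifiers, verifier_percent=20):
    """Split a task fee (integer cents) between scrapers and verifiers.

    verifier_percent is a hypothetical share reserved for verifiers.
    Within each group the pool is split evenly; leftover cents go to the
    first actors in the list so that the payouts always sum to fee_cents.
    """
    if not scrapers:
        raise ValueError("at least one scraper is required")
    # With no verifiers, the whole fee goes to the scrapers.
    verifier_pool = fee_cents * verifier_percent // 100 if verifiers else 0
    scraper_pool = fee_cents - verifier_pool

    payouts = {}
    for pool, group in ((scraper_pool, scrapers), (verifier_pool, verifiers)):
        if not group:
            continue
        base, remainder = divmod(pool, len(group))
        for i, actor in enumerate(group):
            payouts[actor] = payouts.get(actor, 0) + base + (1 if i < remainder else 0)
    return payouts
```

Integer cents are used so that rounding never creates or loses money: `split_fee(100, ["s1", "s2", "s3"], ["v1"])` pays 27, 27, and 26 cents to the scrapers and 20 cents to the verifier.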

Xetera commented 3 weeks ago

Interesting. Yeah, I think a system that uses this protocol to distribute jobs to users controlled by a central server in exchange for money could be an interesting idea. I thought about this a little bit and wanted it to be a direct payment sort of thing without a middleman, but the trust factor makes that a little difficult, like you said.

Addressing the trust part is a little bit trickier though. It's difficult to create a general solution since, depending on what's being scraped, there could be personalized results. For example, you can't have an authoritative check for Google search results, since everyone gets different results depending on what data Google has on the person searching.
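To illustrate why exact comparison breaks down here, one possible (and far from authoritative) workaround is to score overlap between result lists instead of requiring equality. This is only a sketch: `jaccard`, `plausible`, and the 0.3 threshold are made-up names and values, and overlap says nothing about whether the shared results are actually correct.

```python
def jaccard(a, b):
    """Jaccard similarity of two result lists, ignoring order and duplicates."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def plausible(candidate, peers, threshold=0.3):
    """Return True if `candidate` overlaps enough with at least half of `peers`.

    threshold is a hypothetical tuning knob; personalized sources would
    need a lower value than sources that return the same data to everyone.
    """
    if not peers:
        return True  # nothing to compare against
    hits = sum(1 for peer in peers if jaccard(candidate, peer) >= threshold)
    return hits * 2 >= len(peers)
```

Two personalized result lists such as `["a", "b", "c"]` and `["b", "c", "d"]` would fail an equality check but score 0.5 here, while a fabricated list with no overlap scores 0.0.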

Either way, yes, there's a way to turn this concept into a SaaS, though if I do ever pursue that I'd be doing it separately from this repo. Thanks for the ideas though.