Ornias1993 / fetlife-aslsearch-reborn

Tampermonkey user script offering an interface to perform pseudo-automatic searches of the FetLife.com user base filtered by age, sex, location, and role.

idea #65

Closed ethaniel closed 4 years ago

ethaniel commented 4 years ago

Dear Ornias,

I have an idea.

Since this script didn't work for me, and I'm a developer myself, how about we make a separate searchable database of FetLife accounts?

Basically, the Tampermonkey script would only parse the FetLife pages that the user opens and scrape all the information that the user sees about other users (page URL, age, avatar URL, sex, kink, etc.) and then POST it to a remote database.

Then, the actual search would be against that database (without crawling fetlife). It would be super-fast and almost accurate.

FetLife will not be happy about this, but we could host this on some bulletproof hosting. I'm a PHP/MySQL developer and can help with the backend.

Slofe commented 4 years ago

I don't think it's working for anyone at the moment. I'm under the impression it's still being worked on. A searchable database was done years ago and there was a big hoo-ha about it. If your proposed script only scrapes pages opened by people, doesn't that sort of defeat the object of this script? I'd still have to trawl through everyone in my area and open their pages individually to get them added to the database, plus it misses new people joining all the time, unless I've misunderstood you?

What if there was a way to scrape everyone in a certain area once, add them to a database, and somehow not scrape them again the next time you run the script, only scraping new people added since the last time it was run? Now that would really, really be useful!

Ornias1993 commented 4 years ago

First off, thank you both for your feedback.

@ethaniel I completely retested the script this past week. It does work; however, currently Age, Sex and (depending on the dataset we have scraped) Location are the only variables that I can give guarantees on.

The current Tampermonkey script already only scrapes the pages you visit and nothing more. The reason it is slow as shit is the awkward Google-spreadsheet-based database. Searches against that database Suck. Donkey. Balls.

If you look at my current PR, it basically already is rewritten to be an (SQL) database with an API, in contrast to the current live version, which is based on a laughable amount of Google spreadsheets. That (of course) was not my idea; the previous maintainer designed it like this.

But TL;DR: Your idea is already implemented in the 0.6 beta under PRs, and I would appreciate any feedback on it that you can give me 👍

Also: The bulletproof hosting is the thing, besides testing and review, that is delaying 0.6 at the moment. Hosting is covered in Issue #62.

@Slofe It is working and was retested again these past weeks. But the only things I can somewhat guarantee to be working are Age, Sex and (sometimes) Location. Location is the main pain, because FetLife doesn't list all 3 location variables on every page. For example: when scraping a user list of American(!) users, the country field is missing.

"A searchable database was done years ago and there was a big hoo-ha about it." I got a copy of said database; version 0.6 will also include a merger of said data combined with the current content of Fetlife ASL Reborn. This adds basic searchable data (Sex, Location, Age) for about 4 million accounts.

"If your proposed script only scrapes pages opened by people doesn't that sort of defeat the object of this script?" It's not a proposed script, it is an actual working script. And no, it doesn't, because it also scrapes data from every user profile in a discussion, user list, location list and so forth. For example: Open a profile with some friends, followers and such and you are actually uploading basic data for 20+ profiles.

But, once 0.6 is launched I do want to add an active scraper, to grab some more data. But I think we all can agree a version that is searching more smoothly is a higher priority.

"I'd still have to trawl through everyone in my area and open their pages individually to get them added to the database plus it misses new people joining all the time, unless I've misunderstood you?"

However, as I stated before: An active scraper is on my mind and more basic user details will get added. You do have a point, and I'll add an "active scraper" todo as an issue so it is clearly noted. Thanks for that feedback.

However: Such an active scraper will NOT be rolled out to users, as that would guarantee that everyone and their dog could DDOS the API. It would run server-side only, based on FetLife's daft idea of having auto-incremental user IDs. All I need is a scraper opening fetlife.com/users/$incremental_Number. It's not that hard and will get done someday! ;)

Ornias1993 commented 4 years ago

As the input of @ethaniel is already almost completely part of 0.6 (see PR #56), I'll close this issue. Feedback, review and advice on 0.6 is, however, more than welcome! ;)