FactorioBlueprints / factorio-prints

factorioprints.com
https://factorioprints.com/

Fix bug in search. #27

Open FactorioBlueprints opened 6 years ago

FactorioBlueprints commented 6 years ago

The database used by factorioprints is Firebase. Firebase allows a filter clause OR an order-by clause but not both.

The old strategy was to download ALL blueprint summaries, ordered, and do the filtering client side. This worked but got slower and consumed more bandwidth over time, until it got prohibitively expensive.

The current strategy is to order-by and paginate server side and filter client side. As a result, each page shows a different number of results, and some pages show none at all.
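To make the uneven pages concrete, here is a minimal sketch of the behavior described above. It is not the site's actual code: `SUMMARIES`, `fetch_page`, and `filter_client_side` are hypothetical names, and an in-memory list stands in for the ordered Firebase query.

```python
from typing import Dict, List

# Hypothetical blueprint summaries, already in the order the server returns them.
SUMMARIES: List[Dict[str, str]] = [
    {"title": "Main bus starter"},
    {"title": "Solar field"},
    {"title": "Compact smelter"},
    {"title": "Rail junction"},
    {"title": "Solar accumulator ratio"},
    {"title": "Solar panel tileable"},
]


def fetch_page(page: int, page_size: int) -> List[Dict[str, str]]:
    """Stand-in for the ordered, paginated server-side query."""
    start = page * page_size
    return SUMMARIES[start:start + page_size]


def filter_client_side(rows: List[Dict[str, str]], term: str) -> List[Dict[str, str]]:
    """Keep only the rows whose title contains the search term."""
    needle = term.lower()
    return [row for row in rows if needle in row["title"].lower()]


page_size = 2
# Number of visible results on each of the three server pages.
counts = [
    len(filter_client_side(fetch_page(page, page_size), "solar"))
    for page in range(3)
]
print(counts)  # [1, 0, 2]
```

Every server page holds two summaries, but after filtering for "solar" the visible pages hold one, zero, and two results.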

I’m working on migrating from Firebase to a relational database. But this is essentially a rewrite, and it would be good to explore other strategies.

perfectsine commented 5 years ago

Can you please grab a larger export of your DB as an example?

Do you have a place where we can plan the relational models?

perfectsine commented 5 years ago

I just saw where you were willing to share a copy of the DB: https://github.com/FactorioBlueprints/factorio-prints/blob/master/CONTRIBUTING.md

I wouldn't mind that! Let me take a look at the application and pagination requirements. Moving to a relational database could have some price impacts.

Fryuni commented 5 years ago

Since you liked the idea of using an App Engine interface for the images on GCS in the other issue, one solution for search is to deploy a very small service that exposes App Engine's Search API as a REST API for querying.

Then, on every save to the datastore, send the title and tags to the Search API. This could be done with Firebase Functions.

The Search API is expected to be turned down at some point in the not-too-near future, since they made it unavailable to the new language versions and recommend moving to Elasticsearch on Compute Engine. Since there is no announced date for that, I think it is safe. Elasticsearch is also a really great idea, though: there is a ready-to-use deployment on GCP Marketplace that automatically configures the VM.

BrettMoan commented 5 years ago

Have you considered removing the order-by when a person uses the search?

Also, where in the code are you issuing the query currently?

FactorioBlueprints commented 5 years ago

> have you considered removing the order by when a person uses the search?

@BrettMoan this would be a big change because I've always used order-by and never sent criteria to Firebase.

> also where in the code are you issuing the query currently?

Here's the paginated, ordered query. https://github.com/FactorioBlueprints/factorio-prints/blob/stable/src/sagas/subscribeSaga.js#L68

Here's the client side filtering. https://github.com/FactorioBlueprints/factorio-prints/blob/stable/src/selectors.js#L162-L165

BrettMoan commented 5 years ago

I did some digging to confirm, and most people working on large projects opt for third-party tools like Elasticsearch. I'm baffled that Firebase doesn't provide even a .contains() function, only ordering. This is because Firebase doesn't build the text indexes that a full-blown RDBMS provides out of the box, which are what make things like the "LIKE" operator possible.

Since this is LIKELY a pet project for you, I also searched for "firebase search free" ;) and that returned something promising, namely the following article:

https://medium.com/@ken11zer01/firebase-firestore-text-search-and-pagination-91a0df8131ef

Basically, for each description you're building your OWN index by splitting the strings and storing an array of substrings. Then you're checking whether that array "contains" your key (the search term).
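One reading of that idea, as a rough sketch rather than the article's exact method: the function names, the three-character minimum, and the per-word splitting below are all assumptions made for illustration.

```python
from typing import List


def build_search_keys(text: str, min_length: int = 3) -> List[str]:
    """Return every substring of each word that is at least min_length long."""
    keys = set()
    for word in text.lower().split():
        for start in range(len(word)):
            for end in range(start + min_length, len(word) + 1):
                keys.add(word[start:end])
    return sorted(keys)


def matches(search_keys: List[str], query: str) -> bool:
    """Emulate an array-contains lookup against the stored keys."""
    return query.lower() in search_keys


# The array that would be stored alongside a blueprint titled "Solar".
print(build_search_keys("Solar"))
# ['lar', 'ola', 'olar', 'sol', 'sola', 'solar']
```

The number of keys grows roughly with the square of each word's length, so this is cheap for short fields and expensive for long free text. As written it also only handles single-word queries.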

BrettMoan commented 5 years ago

I don't know how large your data set is, but while the approach in the Medium article may not work for the full description, it might work for enabling search by tags? That would be a smaller subset of data.

Otherwise you may indeed need to wait until you can port to an rdbms.

Fusty commented 5 years ago

@FactorioBlueprints Would it be possible to keep pulling pages of ordered results and adding them to the existing collection for this "page" until your filtered result collection is equal to your pageSize OR until the page number is equal to numberOfPages (we're at the end of the results)?
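A minimal sketch of that loop, under stated assumptions: `fetch_page`, `fill_filtered_page`, and the `carry` argument are hypothetical, and an in-memory list stands in for the ordered Firebase query. The `carry` list is an addition not mentioned above; it holds surplus matches from the last fetched page so they are not lost when the next display page is built.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical titles in server order; the real app stores richer summaries.
TITLES = [
    "Main bus starter",
    "Solar field",
    "Compact smelter",
    "Rail junction",
    "Solar accumulator ratio",
    "Solar panel tileable",
]
RAW_PAGE_SIZE = 2
NUMBER_OF_PAGES = 3


def fetch_page(page: int) -> List[str]:
    """Stand-in for the ordered, paginated server-side query."""
    start = page * RAW_PAGE_SIZE
    return TITLES[start:start + RAW_PAGE_SIZE]


def fill_filtered_page(
    predicate: Callable[[str], bool],
    next_page: int,
    page_size: int,
    carry: Sequence[str] = (),
) -> Tuple[List[str], List[str], int]:
    """Fetch raw pages until page_size matches are collected or pages run out.

    Returns (rows to display, surplus matches to carry over, next raw page).
    """
    found = list(carry)
    while len(found) < page_size and next_page < NUMBER_OF_PAGES:
        found.extend(row for row in fetch_page(next_page) if predicate(row))
        next_page += 1
    return found[:page_size], found[page_size:], next_page


def is_solar(title: str) -> bool:
    return "solar" in title.lower()


first, carry, cursor = fill_filtered_page(is_solar, 0, 2)
second, carry_after, cursor_after = fill_filtered_page(is_solar, cursor, 2, carry)
print(first)   # ['Solar field', 'Solar accumulator ratio']
print(second)  # ['Solar panel tileable']
```

The trade-off is that a rare search term can force many raw page fetches before a single display page fills up.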

I'm doing some reading on Redux but I don't quite know where to insert the logic that would fetch yet more results and append those to the current collection. Seems like it should be somewhere after line 150 in https://github.com/FactorioBlueprints/factorio-prints/blob/stable/src/selectors.js#L150

johntron commented 5 years ago

Relational databases are good at storing relations, and the ability to perform full text searches on fields is really just one feature. They're not really optimized for that sort of thing, so even though it might be possible to migrate to an RDBMS to accomplish what you want, it's not really an ideal solution - you might find yourself needing to optimize the performance of full text search.

The tool you really need is one to create and maintain a search index: a map of search terms to records (some RDBMSs do this behind the scenes when you use full-text search, but it's not their specialty). Some popular tools for this are Solr and Elasticsearch. These days there are lots of SaaS providers offering free services if you are below a certain threshold of usage, so you might do a little digging to see if you can find one. You might even consider some free platform services like Google Cloud Compute's free tier. Or you could throw some ads on factorio blueprints and use ad revenue to pay for Algolia.

Ultimately you need these things:

One thing to consider is products like Solr and Elasticsearch have out-of-the-box solutions to this; however, your use case is somewhat simple. You might be able to just pre-compute search terms as part of uploading a blueprint and use some new Firestore collection as a search index you manage yourself.
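As a rough illustration of a self-managed index, here is a minimal inverted index mapping each term to the blueprint ids that contain it. Everything here is an assumption for the sketch: the `SearchIndex` class, the tokenizing rule, and the in-memory dictionary that stands in for a separate Firestore collection.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Map of search term -> ids of the blueprints that mention it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, blueprint_id: str, title: str, tags: Iterable[str]) -> None:
        """Index a blueprint at upload time using its title and tags."""
        terms = set(tokenize(title))
        for tag in tags:
            terms.update(tokenize(tag))
        for term in terms:
            self._postings[term].add(blueprint_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of blueprints that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = SearchIndex()
index.add("bp1", "Solar field", ["power"])
index.add("bp2", "Rail junction", ["trains"])
index.add("bp3", "Solar accumulator ratio", ["power", "ratio"])
print(sorted(index.search("solar ratio")))  # ['bp3']
```

This only matches whole terms. Prefix or substring matching, ranking, and removing entries when a blueprint is edited or deleted would all need extra work.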

@BrettMoan's suggestion to index the search terms yourself seems like a pragmatic solution. There are two approaches you can take if you do it yourself:

FactorioBlueprints commented 5 years ago

I have the REST rewrite largely working here: https://www.factorio.school/

It's read-only and I'll sync the Firebase database to the relational database periodically to keep it relatively up-to-date. Could you folks take a look and see if it works ok before I share it more widely?

asdkant commented 5 years ago

> I have the REST rewrite largely working here: https://www.factorio.school/
>
> It's read-only and I'll sync the Firebase database to the relational database periodically to keep it relatively up-to-date. Could you folks take a look and see if it works ok before I share it more widely?

@FactorioBlueprints it seemed to be working yesterday but I just opened the page and I got an error:

Application error An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details. You can do this from the Heroku CLI with the command heroku logs --tail

FactorioBlueprints commented 5 years ago

@asdkant Sorry about that, fixed now.

boobin commented 5 years ago

I just saw the new work done with a REST API. I wasn't aware that this was being developed and created my own pet project over the past 2 months (https://www.fuelforfactorio.com). I mainly added more search options (entities contained / recipes produced) and replaced image hosting with a direct client-side renderer based on https://github.com/Teoxoy/factorio-blueprint-editor. I also left the API endpoints publicly available.

Even with 2 frontends, maybe it would be possible to maintain a single backend?

I will shortly open the repository (after a bit of cleanup/doc).

FactorioBlueprints commented 5 years ago

Wow @boobin looks great! I'd like to integrate Teoxoy/factorio-blueprint-editor into factorio.school as well. Besides the broken stuff, the most common complaint is needing to upload a screenshot. I haven't had time to even investigate whether it was possible because I've been so focused on the backend. I think there's room for more than one UI and to collaborate on UI work.

boobin commented 5 years ago

Thx! I'm not so concerned about UI right now. I'm really trying to get a stable and performant API, with the specs I want for the UI:

I would understand if you refuse, but is there any chance of getting even a partial dump of your blueprint table? I'm considering generating random blueprints to test the services against a reasonably loaded database, but real data would be awesome.

In terms of technology, my backend is a Rust stack (actix-web + diesel) with a PostgreSQL database.

barthuijgen commented 4 years ago

@boobin Funny to see you post this here. Sadly, it seems your site link is dead. Are you still working on this project? I started doing the same thing just this week, so I thought maybe you'd be open to discussing your progress so far. Please let me know here, or reach out on Discord if you use it: Barry#7827

boobin commented 4 years ago

@barthuijgen I took the site down a few months ago at the end of the AWS free trial period, as there was no traffic anymore. It was functional, but I lost interest when I realized I was no longer using public blueprint databases myself to play.

The code is public on https://gitlab.com/lbobinet/fuelforfactorio.