StractOrg / stract

web search done right
https://stract.com
GNU Affero General Public License v3.0
2.13k stars, 48 forks

I wonder how do you manage to store the data #128

Closed yuhong closed 7 months ago

yuhong commented 7 months ago

"By default, we do store some usage statistics in order to improve the search results. Specifically the following information is stored for each search:" I wonder how you manage to store the data (using spinning rust, for example).

mikkeldenker commented 7 months ago

I am not sure I know exactly what you mean by 'spinning rust example', but I will do my best to provide some insight here.

If the user hasn't disabled it, we store the query text that was used for the search, a timestamp rounded down to the nearest hour, and which result (if any) was clicked. We don't store anything that can tie the search back to you. All the data is also automatically deleted after 90 days. The data is stored in a Scylla database which runs on a 4U server in a basement here in Copenhagen:

[Image: 11zon_IMG_3063]
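
To make the description above concrete, here is a minimal Python sketch of a record with those three fields, the hour rounding, and a 90-day expiry check. It is not Stract's code: the names `SearchLogEntry`, `make_entry`, and `is_expired` are hypothetical, and only the fields, the rounding rule, and the 90-day figure come from the comment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)  # retention period stated in the comment


@dataclass(frozen=True)
class SearchLogEntry:
    """Hypothetical record shape: no user or session identifier is kept."""
    query: str
    hour: datetime                 # timestamp rounded down to the hour
    clicked_result: Optional[str]  # None if no result was clicked


def round_down_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds so only the hour remains."""
    return ts.replace(minute=0, second=0, microsecond=0)


def make_entry(query: str, ts: datetime,
               clicked_result: Optional[str] = None) -> SearchLogEntry:
    """Build a log entry, coarsening the timestamp before it is stored."""
    return SearchLogEntry(query, round_down_to_hour(ts), clicked_result)


def is_expired(entry: SearchLogEntry, now: datetime) -> bool:
    """True once the entry is older than the retention period."""
    return now - entry.hour > RETENTION


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 14, 37, 52, tzinfo=timezone.utc)
    entry = make_entry("rust search engine", ts)
    print(entry.hour.isoformat())  # 2024-03-05T14:00:00+00:00
```

In practice the expiry would more likely be handled by the database itself, for example with a per-row TTL, rather than by application code. Whether Stract does it that way is not stated in this thread.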

We also have some bare metal servers at Hetzner in Frankfurt which are used for a self-hosted S3, crawling, indexing and a bit of search. We will probably move more of the infrastructure to bare metal nodes at Hetzner in the future, including the Scylla database.

I'll close this issue here as there is no action for us to take, but if you have more questions please feel free to add them here.

yuhong commented 7 months ago

I mean, would you use spinning rust (HDDs) or SSDs for this data, for example? I can't imagine storing data on thousands of searches per second would be very practical without spinning rust (even if it was just for 90 days).

mikkeldenker commented 7 months ago

Oh! HDDs are fine. It's not something that's used live for each search, so it's okay that the speed is not as high. The search index needs to be stored on fast SSDs though.
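
For a rough sense of the volume being discussed, here is a back-of-the-envelope estimate in Python. Both inputs are assumptions picked for illustration, not Stract's figures: 1,000 searches per second (the low end of "thousands") and about 100 bytes per stored record. Only the 90-day retention comes from the thread.

```python
SECONDS_PER_DAY = 86_400


def retained_bytes(searches_per_second: int, bytes_per_record: int,
                   retention_days: int = 90) -> int:
    """Raw bytes held at steady state, before replication or DB overhead."""
    records = searches_per_second * SECONDS_PER_DAY * retention_days
    return records * bytes_per_record


if __name__ == "__main__":
    total = retained_bytes(1_000, 100)  # assumed rate and record size
    print(f"{total / 1e9:.1f} GB")      # 777.6 GB
```

Under these assumptions the raw data stays under a terabyte, and the estimate scales linearly with both the query rate and the record size.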

yuhong commented 7 months ago

Not the point though. You will notice Marginalia Search managed to run their servers without dealing with spinning rust. Keep in mind that spinning rust fails more often than SSDs.

yuhong commented 7 months ago

"Well, currently we don't. We are bootstrapped and trying to keep costs low. In the future we will have, clearly labelled, contextual ads based on your current search query and a subscription option without ads. Just to re-iterate; we will only use your current search to match ads and will never track you across searches." I hope you won't have to resort to the CPU cost of serving ads.

mikkeldenker commented 7 months ago

I answered your question in https://github.com/StractOrg/stract/issues/132