
boramalper / magnetico

Autonomous (self-hosted) BitTorrent DHT search engine suite.

http://labs.boramalper.org/magnetico/

GNU Affero General Public License v3.0

3.06k stars, 344 forks

Multiple Crawlers #106

Closed: schemen closed this issue 5 years ago

schemen commented 7 years ago

Hi!

Quick question: is it possible to run multiple crawlers that add to the same database on a shared network file system?

Scenario: I have it running via Docker on a Synology system, which is slow. I also have another server; could the two instances complement each other?

Cheers!

ad-m commented 7 years ago

The SQLite database used by magnetico* is not designed for network use (for example, sharing the database file over a network file system). It would be valuable to add support for a client-server database. See https://sqlite.org/whentouse.html .

Glandos commented 7 years ago

Maybe magneticod should stay as-is, using one database file per-instance. However, magneticow can be modified to query multiple databases. This would lead to some duplicates, but the persistence code in magneticod would stay simple.
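A minimal sketch of what querying several per-instance databases could look like, in Python with only the standard library. The `torrents` table layout, the `search_all` function and the plain `LIKE` search are assumptions for illustration, not magnetico's actual schema or code. Duplicates are dropped by info hash, since that is what identifies a torrent across crawlers.

```python
import sqlite3
from collections.abc import Iterable
from pathlib import Path


def search_all(db_paths: Iterable[str], keyword: str) -> list[tuple[bytes, str, int]]:
    """Query several per-instance databases and merge the results.

    Hypothetical helper: assumes each file has a table
    torrents(info_hash BLOB, name TEXT, total_size INTEGER).
    Rows whose info_hash was already returned by an earlier database
    are skipped, removing duplicates that independent crawlers produce.
    """
    seen: set[bytes] = set()
    merged: list[tuple[bytes, str, int]] = []
    for path in db_paths:
        # Open read-only via a URI so the web front end never writes to a crawler's file.
        uri = Path(path).resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute(
                "SELECT info_hash, name, total_size FROM torrents WHERE name LIKE ?",
                (f"%{keyword}%",),
            ).fetchall()
        finally:
            conn.close()
        for info_hash, name, size in rows:
            if info_hash not in seen:
                seen.add(info_hash)
                merged.append((info_hash, name, size))
    # Present a stable, name-ordered result regardless of which database answered first.
    merged.sort(key=lambda row: row[1])
    return merged
```

The deduplication happens at query time, so each magneticod keeps its own simple local file; the cost is that the web front end still has to reach every database file, wherever it lives.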

ad-m commented 7 years ago

I think an abstraction layer with two storage implementations would not complicate the code too much.
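One way to read this suggestion, sketched in Python with only the standard library: the crawler depends on a small storage interface, and each backend implements it. `TorrentStore`, `SQLiteStore`, `MemoryStore` and `on_metadata` are hypothetical names, not magnetico's persistence API. `MemoryStore` only stands in for a second backend such as a client-server database, whose driver would come from outside the standard library.

```python
import sqlite3
from abc import ABC, abstractmethod


class TorrentStore(ABC):
    """Hypothetical storage interface; the crawler talks only to this."""

    @abstractmethod
    def exists(self, info_hash: bytes) -> bool: ...

    @abstractmethod
    def add(self, info_hash: bytes, name: str, total_size: int) -> None: ...

    @abstractmethod
    def count(self) -> int: ...


class SQLiteStore(TorrentStore):
    """Backend for a single local SQLite file (or ':memory:')."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS torrents ("
            " info_hash BLOB PRIMARY KEY, name TEXT NOT NULL, total_size INTEGER NOT NULL)"
        )

    def exists(self, info_hash: bytes) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM torrents WHERE info_hash = ?", (info_hash,)
        ).fetchone()
        return row is not None

    def add(self, info_hash: bytes, name: str, total_size: int) -> None:
        # 'with conn' commits on success; INSERT OR IGNORE makes re-adding a no-op.
        with self._conn:
            self._conn.execute(
                "INSERT OR IGNORE INTO torrents VALUES (?, ?, ?)",
                (info_hash, name, total_size),
            )

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM torrents").fetchone()[0]


class MemoryStore(TorrentStore):
    """Stand-in for a second backend (e.g. a client-server database)."""

    def __init__(self) -> None:
        self._rows: dict[bytes, tuple[str, int]] = {}

    def exists(self, info_hash: bytes) -> bool:
        return info_hash in self._rows

    def add(self, info_hash: bytes, name: str, total_size: int) -> None:
        self._rows.setdefault(info_hash, (name, total_size))

    def count(self) -> int:
        return len(self._rows)


def on_metadata(store: TorrentStore, info_hash: bytes, name: str, total_size: int) -> bool:
    """Crawler-side logic: store a torrent if it is new; return True if it was added."""
    if store.exists(info_hash):
        return False
    store.add(info_hash, name, total_size)
    return True
```

Because `on_metadata` only sees the interface, adding a MySQL or PostgreSQL backend would mean writing one more subclass rather than changing the crawler logic.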

Glandos commented 7 years ago

Yes, of course, an abstraction layer for accessing an SQL database is quite easy nowadays. But I was also thinking that it could be a good feature for magneticow to have multiple sources.

ad-m commented 7 years ago

Why do you want to have multiple data sources? In magneticow or in magneticod? What are the obstacles to centralizing the database, which would keep things simple?

Glandos commented 7 years ago

I'm talking about multiple data sources in magneticow. The use case is when you have multiple magneticod instances running on hosts far away from each other. magneticod needs a fast connection to its database to be able to find duplicates quickly (as it does today), whereas magneticow can take more time to answer a user request.

ad-m commented 7 years ago

@Glandos , replication is a solution.

Glandos commented 7 years ago

Fortunately, there are often multiple solutions to a single problem in software engineering ;)

ad-m commented 7 years ago

Yes, but I think it is worth choosing solutions that keep magnetico* simple, so that it is usable without being an IT professor. Delegating such concerns to optional external components helps with this.

I wonder if relational databases are optimal for us anyway.

blimeybloke commented 7 years ago

MySQL support would be awesome :)

skobkin commented 6 years ago

> Maybe magneticod should stay as-is, using one database file per-instance. However, magneticow can be modified to query multiple databases

So you'll get data duplication across all databases. What are your initial goals? Speeding up the crawling process or data replication?

boramalper commented 5 years ago

The Go version supports multiple trawlers/crawlers, so this is no longer an issue. Also, database access is abstracted in the pkg/persistence module, so in the future we can have different database engines (such as MySQL or Postgres) that have better concurrency support. =)