SQLite, as used in magnetico*, is not optimized for network applications. It would be valuable to add support for databases with a client-server architecture. See https://sqlite.org/whentouse.html .
Maybe magneticod should stay as-is, using one database file per instance. However, magneticow could be modified to query multiple databases. This would lead to some duplicates, but the magneticod persistence code would stay simple.
I think that an abstraction and two storage implementations would not complicate the code too much.
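Something like this minimal sketch, say; all names here are hypothetical and illustrative, not magnetico's actual code:

```go
package persistence

// Database abstracts the storage backend so that magneticod and
// magneticow can be wired to either an embedded or a client-server
// engine. All names in this sketch are illustrative.
type Database interface {
	// AddTorrent persists a discovered torrent, ignoring duplicates.
	AddTorrent(infoHash []byte, name string, files []File) error
	// QueryTorrents returns torrents whose names match the query.
	QueryTorrents(query string, limit uint) ([]TorrentMetadata, error)
	Close() error
}

type File struct {
	Path string
	Size int64
}

type TorrentMetadata struct {
	InfoHash []byte
	Name     string
	Size     int64
}

// Two implementations would satisfy the same interface:
//   type sqlite3Database struct{ ... }   // one file per instance, as today
//   type postgresDatabase struct{ ... }  // shared client-server backend
```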
Yes, of course, an abstraction layer for accessing SQL databases is quite easy to build nowadays. But I was also thinking that it could be a good feature for magneticow to support multiple sources.
Why do you want to have multiple data sources? In magneticow or in magneticod? What are the obstacles to centralizing the database to keep things simple?
I'm talking about multiple data sources in magneticow. The use case is when you have multiple magneticod instances running on hosts far away from each other. magneticod needs a fast connection to its database to be able to find duplicates quickly (as it does today), whereas magneticow can take more time to answer a user request.
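To illustrate, magneticow could wrap its sources in a fan-out reader that merges results and drops duplicates by info-hash. A rough sketch, reusing the hypothetical `Database` and `TorrentMetadata` names from the sketch above:

```go
// multiDatabase fans a query out to several magneticod databases and
// merges the results, deduplicating by info-hash. Sketch only.
type multiDatabase struct {
	sources []Database
}

func (m *multiDatabase) QueryTorrents(query string, limit uint) ([]TorrentMetadata, error) {
	seen := make(map[string]bool)
	var merged []TorrentMetadata
	for _, src := range m.sources {
		results, err := src.QueryTorrents(query, limit)
		if err != nil {
			return nil, err // alternatively, skip an unreachable source
		}
		for _, t := range results {
			key := string(t.InfoHash)
			if seen[key] {
				continue // same torrent discovered by more than one magneticod
			}
			seen[key] = true
			merged = append(merged, t)
		}
	}
	if uint(len(merged)) > limit {
		merged = merged[:limit]
	}
	return merged, nil
}
```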
@Glandos, replication is a solution.
Fortunately, there are often multiple solutions to a single problem in software engineering ;)
Yes, but I think it is worth choosing solutions that keep magnetico* simple and usable without being an IT professor. Delegating such concerns to optional external components helps with this.
I wonder if relational databases are optimal for us anyway.
MySQL support would be awesome :)
> Maybe magneticod should stay as-is, using one database file per instance. However, magneticow could be modified to query multiple databases
So you'll get data duplication across all databases. What are your initial goals? Speeding up the crawling process or data replication?
The Go version supports multiple trawlers/crawlers, so this is no longer an issue. Also, database access is abstracted in the pkg/persistence module, so in the future we can have different database engines (such as MySQL, Postgres, etc.) with better concurrency support. =)
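As a rough sketch, such an abstraction could pick the engine from the database URL's scheme; the function name, schemes, and constructors below are assumptions for illustration, not the exact pkg/persistence API:

```go
package persistence

import (
	"errors"
	"fmt"
	"net/url"
)

// MakeDatabase picks a storage backend from the URL scheme, e.g.
// "sqlite3:///var/lib/magneticod/database.sqlite3" or
// "postgres://user:pass@host/magnetico". Illustrative only.
func MakeDatabase(rawURL string) (Database, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("invalid database URL: %w", err)
	}
	switch u.Scheme {
	case "sqlite3":
		return makeSqlite3Database(u) // embedded engine, one file per instance
	case "postgres", "mysql":
		return makeSQLDatabase(u) // client-server engine, better concurrency
	default:
		return nil, fmt.Errorf("unknown database scheme %q", u.Scheme)
	}
}

// Hypothetical constructors; real implementations would open the
// respective database drivers here.
func makeSqlite3Database(u *url.URL) (Database, error) { return nil, errors.New("sketch only") }
func makeSQLDatabase(u *url.URL) (Database, error)     { return nil, errors.New("sketch only") }
```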
Hi!
Quick question: is it possible to run multiple crawlers on a shared network file system, all extending the same database?
Scenario: I have it running via Docker on a Synology system, which is slow. I have another server; could the two extend each other's database?
Cheers!