Thanks @michaeldinzinger. It has been a while since I last looked at that part of the code. As it stands, the Ignite implementation overrides putURLs from the abstract class, which is where the multithreading is used, but it does call getURLs, so at least the reads would be multithreaded. Ideally we should check that adding the super call to DistributedFrontierService does not have a negative impact on ShardedRocksDBService, but I think all it would do is instantiate the executor services even though ShardedRocksDBService itself does not use them.
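To make the read/write split concrete, here is a minimal Java sketch. It assumes, hypothetically, that the abstract class owns a read pool and a write pool and uses them in its default putURLs and getURLs; the class names, pool sizes and the store/read hooks are illustrative only, not URL Frontier's actual API. A subclass that overrides putURLs writes on the caller's thread, while the inherited getURLs still fans reads out to the read pool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the abstract frontier service: it owns the thread
// pools and uses them in its default putURLs and getURLs implementations.
abstract class FrontierSketch {
    protected final ExecutorService readExecutorService = Executors.newFixedThreadPool(2);
    protected final ExecutorService writeExecutorService = Executors.newFixedThreadPool(2);

    // Default write path: each URL is stored by a task on the write pool.
    public void putURLs(List<String> urls) {
        List<Future<?>> pending = new ArrayList<>();
        for (String url : urls) {
            pending.add(writeExecutorService.submit(() -> store(url)));
        }
        for (Future<?> f : pending) await(f);
    }

    // Read path: each queue is read by a task on the read pool.
    public List<String> getURLs(List<String> queues) {
        List<Future<String>> pending = new ArrayList<>();
        for (String queue : queues) {
            pending.add(readExecutorService.submit(() -> read(queue)));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : pending) results.add(await(f));
        return results;
    }

    // Waits for a task and rethrows failures as unchecked exceptions.
    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    protected abstract void store(String url);

    protected abstract String read(String queue);

    public void close() {
        readExecutorService.shutdown();
        writeExecutorService.shutdown();
    }
}

// Hypothetical subclass in the position of the Ignite implementation: it overrides
// putURLs, so writes bypass the write pool, but it inherits the pooled getURLs.
class OverridingFrontierSketch extends FrontierSketch {
    final List<String> stored = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void putURLs(List<String> urls) {
        for (String url : urls) store(url); // sequential, on the caller's thread
    }

    @Override
    protected void store(String url) {
        stored.add(url);
    }

    // Reports which thread served the read, to show it ran on the read pool.
    @Override
    protected String read(String queue) {
        return queue + "@" + Thread.currentThread().getName();
    }
}
```

Calling getURLs on an OverridingFrontierSketch returns entries such as "q1@pool-1-thread-1", showing that reads still run on pool threads even though putURLs no longer does.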
I don't think that the added line super(configuration); in ShardedRocksDBService has a negative impact. However, what could still be improved is that the readExecutorService and the writeExecutorService are instantiated twice whenever ShardedRocksDBService is used (one extra time for the RocksDBService instance created at line 38, even though, as far as I can see, that instance never uses multithreading).
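The double instantiation can be reproduced with a simplified Java sketch. It assumes, as the discussion suggests, that the sharded service inherits (through DistributedFrontierService) from the class that creates the executor services and also constructs its own RocksDBService instance. The flattened hierarchy, the class names and the pool counter are hypothetical stand-ins.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class: every instance creates its own read and write pools.
abstract class BaseServiceSketch {
    static final AtomicInteger POOLS_CREATED = new AtomicInteger();

    protected final ExecutorService readExecutorService;
    protected final ExecutorService writeExecutorService;

    BaseServiceSketch(int readThreads, int writeThreads) {
        readExecutorService = Executors.newFixedThreadPool(readThreads);
        writeExecutorService = Executors.newFixedThreadPool(writeThreads);
        POOLS_CREATED.addAndGet(2);
    }

    void close() {
        readExecutorService.shutdown();
        writeExecutorService.shutdown();
    }
}

// Stand-in for the single-node RocksDB service: it inherits the pools
// but never submits any task to them.
class LocalServiceSketch extends BaseServiceSketch {
    LocalServiceSketch(int readThreads, int writeThreads) {
        super(readThreads, writeThreads);
    }
}

// Stand-in for the sharded service: its own constructor creates one pair of
// pools, and the local service it wraps creates a second, unused pair.
class ShardedServiceSketch extends BaseServiceSketch {
    final LocalServiceSketch local;

    ShardedServiceSketch(int readThreads, int writeThreads) {
        super(readThreads, writeThreads);
        local = new LocalServiceSketch(readThreads, writeThreads);
    }

    @Override
    void close() {
        local.close();
        super.close();
    }
}
```

Constructing one ShardedServiceSketch creates four pools, two of which belong to the wrapped local instance and sit idle. In this sketch, Executors.newFixedThreadPool starts its threads lazily on first submission, so the unused pools cost object allocations rather than running threads; whether the real service behaves the same depends on how it builds its executors.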
This is how it looked before:
michael@pc:~/Desktop/Git/url-frontier/service$ java -Xmx2G -cp target/urlfrontier-service-*.jar crawlercommons.urlfrontier.service.URLFrontierServer implementation=crawlercommons.urlfrontier.service.rocksdb.ShardedRocksDBService nodes=2 read.thread.num=2 write.thread.num=4
17:06:46.830 [main] INFO c.u.service.AbstractFrontierService - Available processor(s) 12
17:06:46.832 [main] INFO c.u.service.AbstractFrontierService - Using 3 threads for reading from queues
17:06:46.833 [main] INFO c.u.service.AbstractFrontierService - Using 3 threads for writing to queues
17:06:46.930 [main] INFO c.u.service.AbstractFrontierService - Available processor(s) 12
17:06:46.930 [main] INFO c.u.service.AbstractFrontierService - Using 2 threads for reading from queues
17:06:46.930 [main] INFO c.u.service.AbstractFrontierService - Using 4 threads for writing to queues
17:06:46.930 [main] INFO c.u.service.rocksdb.RocksDBService - RocksDB data stored in ./rocksdb
17:06:47.139 [main] INFO c.u.service.rocksdb.RocksDBService - RocksDB loaded in 207 msec
17:06:47.142 [main] INFO c.u.service.rocksdb.RocksDBService - readQueueInfos read stats for 0 queues in 3 msec
17:06:47.142 [main] INFO c.u.service.rocksdb.RocksDBService - Recovering queues from existing RocksDB
17:06:47.142 [main] INFO c.u.service.rocksdb.RocksDBService - 0 queues discovered in 3 msec
17:06:47.143 [main] INFO c.u.service.AbstractFrontierService - Node 0: 2
17:06:47.327 [main] INFO c.u.service.URLFrontierServer - Started URLFrontierServer [ShardedRocksDBService] on port 7071 as localhost:7071
This is how it looks with the modification. The only difference is in the number of reading and writing threads reported the first time the executor services are instantiated: 2 and 4, as configured, instead of 3 and 3.
michael@pc:~/Desktop/Git/url-frontier/service$ java -Xmx2G -cp target/urlfrontier-service-*.jar crawlercommons.urlfrontier.service.URLFrontierServer implementation=crawlercommons.urlfrontier.service.rocksdb.ShardedRocksDBService nodes=2 read.thread.num=2 write.thread.num=4
17:06:46.830 [main] INFO c.u.service.AbstractFrontierService - Available processor(s) 12
17:06:46.832 [main] INFO c.u.service.AbstractFrontierService - Using 2 threads for reading from queues
17:06:46.833 [main] INFO c.u.service.AbstractFrontierService - Using 4 threads for writing to queues
17:06:46.930 [main] INFO c.u.service.AbstractFrontierService - Available processor(s) 12
17:06:46.930 [main] INFO c.u.service.AbstractFrontierService - Using 2 threads for reading from queues
17:06:46.930 [main] INFO c.u.service.AbstractFrontierService - Using 4 threads for writing to queues
17:06:46.930 [main] INFO c.u.service.rocksdb.RocksDBService - RocksDB data stored in ./rocksdb
17:06:47.139 [main] INFO c.u.service.rocksdb.RocksDBService - RocksDB loaded in 207 msec
17:06:47.142 [main] INFO c.u.service.rocksdb.RocksDBService - readQueueInfos read stats for 0 queues in 3 msec
17:06:47.142 [main] INFO c.u.service.rocksdb.RocksDBService - Recovering queues from existing RocksDB
17:06:47.142 [main] INFO c.u.service.rocksdb.RocksDBService - 0 queues discovered in 3 msec
17:06:47.143 [main] INFO c.u.service.AbstractFrontierService - Node 0: 2
17:06:47.327 [main] INFO c.u.service.URLFrontierServer - Started URLFrontierServer [ShardedRocksDBService] on port 7071 as localhost:7071
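The thread counts in the two logs are consistent with the first instantiation previously receiving no configuration. Here is a minimal Java sketch of that mechanism, assuming the counts are read from a configuration map under the keys read.thread.num and write.thread.num (the keys used on the command line above) and fall back to a default otherwise. DEFAULT_THREADS = 3 is a hypothetical constant chosen only to mirror the log; how the real default is computed is not shown here.

```java
import java.util.Map;

// Hypothetical base class: thread counts come from the configuration map and
// fall back to a default when a key is missing.
class ConfiguredServiceSketch {
    // Hypothetical default, chosen only to mirror the "Using 3 threads" log lines.
    static final int DEFAULT_THREADS = 3;

    final int readThreads;
    final int writeThreads;

    // No-arg constructor: no configuration is available, so defaults apply.
    ConfiguredServiceSketch() {
        this(Map.of());
    }

    ConfiguredServiceSketch(Map<String, String> configuration) {
        readThreads = threads(configuration, "read.thread.num");
        writeThreads = threads(configuration, "write.thread.num");
    }

    private static int threads(Map<String, String> configuration, String key) {
        String value = configuration.get(key);
        return value == null ? DEFAULT_THREADS : Integer.parseInt(value.trim());
    }
}

// Without an explicit super(configuration), Java inserts an implicit super()
// call, so the configured thread counts are ignored.
class ImplicitSuperSketch extends ConfiguredServiceSketch {
    ImplicitSuperSketch(Map<String, String> configuration) {
        // implicit super(): configuration is never passed up
    }
}

// Passing the configuration up, like the added super(configuration); line.
class ExplicitSuperSketch extends ConfiguredServiceSketch {
    ExplicitSuperSketch(Map<String, String> configuration) {
        super(configuration);
    }
}
```

With read.thread.num=2 and write.thread.num=4, ImplicitSuperSketch ends up with 3 and 3 while ExplicitSuperSketch gets 2 and 4, matching the first pair of "Using N threads" lines before and after the change.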
One solution to get rid of the second instantiation of the executor services could be a useMultithreading flag in the constructor of AbstractFrontierService. This flag would be set when DistributedFrontierService calls the constructor and not set when RocksDBService does.
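A minimal Java sketch of the proposed useMultithreading flag follows. Only the flag name comes from the proposal; the class names and the constructor signature are hypothetical. When the flag is false, no executor services are created and the fields stay null.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical base class: pools are only created when the caller asks for them.
abstract class FlaggedBaseSketch {
    protected final ExecutorService readExecutorService; // null when multithreading is off
    protected final ExecutorService writeExecutorService; // null when multithreading is off

    FlaggedBaseSketch(int readThreads, int writeThreads, boolean useMultithreading) {
        if (useMultithreading) {
            readExecutorService = Executors.newFixedThreadPool(readThreads);
            writeExecutorService = Executors.newFixedThreadPool(writeThreads);
        } else {
            readExecutorService = null;
            writeExecutorService = null;
        }
    }

    boolean isMultithreaded() {
        return readExecutorService != null;
    }

    void close() {
        if (readExecutorService != null) readExecutorService.shutdown();
        if (writeExecutorService != null) writeExecutorService.shutdown();
    }
}

// Stand-in for DistributedFrontierService: sets the flag.
class DistributedSketch extends FlaggedBaseSketch {
    DistributedSketch(int readThreads, int writeThreads) {
        super(readThreads, writeThreads, true);
    }
}

// Stand-in for RocksDBService: leaves the flag unset, so no pools are created.
class LocalSketch extends FlaggedBaseSketch {
    LocalSketch(int readThreads, int writeThreads) {
        super(readThreads, writeThreads, false);
    }
}
```

Leaving the fields null keeps the sketch short, but every use of the pools then needs a null check; an alternative would be to move pool creation into the subclasses that actually submit work.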
Contained in the code changes in PR #79.
Signed-off-by: Michael Dinzinger michael.dinzinger@uni-passau.de
Developer Certificate of Origin
By contributing to URL Frontier, you accept and agree to the following terms and conditions (the Developer Certificate of Origin) for your present and future contributions submitted to URL Frontier. Please refer to the Developer Certificate of Origin section in CONTRIBUTING.md for details.