crawler-commons / url-frontier

API definition, resources and reference implementation of URL Frontiers
Apache License 2.0

Exception caught when deleting queue #55

Closed: jnioche closed this issue 2 years ago

jnioche commented 2 years ago

When injecting apple.com and apple.com.br and then deleting apple.com, the following error is logged:

10:22:01.745 [grpc-default-executor-2] ERROR c.u.service.rocksdb.RocksDBService - Exception caught when deleting ranges - DEFAULT_apple.com_ - DEFAULT_apple.com.br_
org.rocksdb.RocksDBException: end key comes before start key
    at org.rocksdb.RocksDB.deleteRange(Native Method)
    at org.rocksdb.RocksDB.deleteRange(RocksDB.java:1415)
    at crawlercommons.urlfrontier.service.rocksdb.RocksDBService.deleteRanges(RocksDBService.java:613)
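A minimal, self-contained Java sketch (not taken from the url-frontier code) reproduces the ordering problem without RocksDB. It builds the two key prefixes shown in the log and compares them byte-wise, which is how RocksDB's default comparator orders keys. The prefix helper is hypothetical and only mimics the DEFAULT_<queue>_ shape visible in the log message.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RangeOrderDemo {
    // Hypothetical helper: builds a key prefix in the "<crawlID>_<queue>_" shape seen in the log.
    static byte[] prefix(String crawlID, String queue) {
        return (crawlID + "_" + queue + "_").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] start = prefix("DEFAULT", "apple.com");
        byte[] end = prefix("DEFAULT", "apple.com.br");

        // As plain queue names, apple.com sorts first because it is a prefix of apple.com.br.
        System.out.println("apple.com".compareTo("apple.com.br") < 0); // true

        // With the separator appended, the prefixes diverge right after "apple.com":
        // '_' (0x5F) vs '.' (0x2E). Compared byte-wise, the "end" prefix sorts before "start",
        // which is the inverted range that deleteRange rejects.
        System.out.println(Arrays.compareUnsigned(start, end) > 0); // true
    }
}
```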

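The dot has the Unicode code point U+002E, whereas the underscore is U+005F. As queue names, apple.com sorts before apple.com.br, but once the _ separator is appended the two key prefixes differ at the character right after "apple.com", where the dot (0x2E) is lower than the underscore (0x5F). Compared byte by byte, DEFAULT_apple.com.br_ therefore comes before DEFAULT_apple.com_, so the range passed to deleteRange is inverted. One option would be to choose a different separator, with a byte value lower than any character that can appear in a queue name. Alternatively, when working out the ranges, the queue names could be sorted in a way that takes the separator into account.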
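The second option could look roughly like the sketch below. It is a hypothetical illustration, not the actual RocksDBService code: the class name, the endKey helper and its signature are assumptions. Instead of taking the next queue in name order, it picks as the end key the smallest prefix of another queue that sorts after the deleted queue's own prefix when compared byte-wise. It assumes queue names never contain the separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

class QueueRanges {
    static final String SEPARATOR = "_";

    // Hypothetical helper: key prefix for a queue, in the "<crawlID>_<queue>_" shape from the log.
    static byte[] prefix(String crawlID, String queue) {
        return (crawlID + SEPARATOR + queue + SEPARATOR).getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: end key for deleting `queue`, i.e. the smallest prefix of any queue
    // that is strictly greater than the queue's own prefix, using unsigned byte-wise order
    // (the order RocksDB's default comparator uses). Empty if no such prefix exists.
    // Assumes queue names never contain SEPARATOR.
    static Optional<byte[]> endKey(String crawlID, String queue, Collection<String> allQueues) {
        byte[] start = prefix(crawlID, queue);
        return allQueues.stream()
                .map(q -> prefix(crawlID, q))
                .filter(p -> Arrays.compareUnsigned(p, start) > 0)
                .min(Arrays::compareUnsigned);
    }

    public static void main(String[] args) {
        // Example queues; example.org is illustrative only.
        List<String> queues = List.of("apple.com", "apple.com.br", "example.org");

        // apple.com.br's prefix sorts before apple.com's, so it is skipped as an end key.
        byte[] end = endKey("DEFAULT", "apple.com", queues).get();
        System.out.println(new String(end, StandardCharsets.UTF_8)); // DEFAULT_example.org_

        end = endKey("DEFAULT", "apple.com.br", queues).get();
        System.out.println(new String(end, StandardCharsets.UTF_8)); // DEFAULT_apple.com_
    }
}
```

When the result is empty (the deleted queue has the highest prefix), the caller would need some other upper bound; how to choose it is left open here. Unlike switching to a different separator, this approach leaves the format of keys already stored unchanged.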