Arkoniak opened 3 years ago
Current roadmap:

- `update!` utilities and everything related to automated updates of MaxMind databases.
- The next version is 0.6, which should include a number of breaking changes. The main idea is to make the code type-stable and remove the `Dict{String, Any}` construction, which takes ~95% of the time.
  - `gz` files, since it makes no sense. (https://github.com/JuliaWeb/GeoIP.jl/pull/56)
  - `Location` to Geodesy.jl ( #43 )
  - `Locale` to a fixed type instead of `CSV.Row`
- Version 0.7 should target improvement of the initial load:
  - `load` function
  - `DB` in a binary form in order to provide a "lazy" load.
- Version 0.8 may include support for the MaxMind binary database format and conversion utilities. This one is questionable, since the MaxMind binary file is much slower than native Julia structures. Given that, it could instead be added as one of the features after 1.0.
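One possible way to get a "lazy" load for version 0.7 is sketched below, assuming Julia's standard `Serialization` library as the binary format: the database is written once with `serialize`, and a wrapper deserializes it only on first access. `MiniDB`, `LazyDB`, `save_db` and `getdb` are hypothetical names for illustration, not part of GeoIP.jl.

```julia
using Serialization

# Hypothetical database type; the fields are illustrative only.
struct MiniDB
    version::String
    starts::Vector{UInt32}   # sorted IPv4 range starts
    cities::Vector{String}   # city for each range
end

# Hypothetical lazy wrapper: the file is not read until the data is needed.
mutable struct LazyDB
    path::String
    db::Union{Nothing, MiniDB}
end

LazyDB(path::AbstractString) = LazyDB(String(path), nothing)

# Write the database once in Julia's binary serialization format.
function save_db(path::AbstractString, db::MiniDB)
    open(io -> serialize(io, db), path, "w")
    return path
end

# Deserialize on first access and cache the result for later calls.
function getdb(lazy::LazyDB)
    if lazy.db === nothing
        lazy.db = open(deserialize, lazy.path)::MiniDB
    end
    return lazy.db::MiniDB
end
```

Note that the `Serialization` format is not guaranteed to be readable by a different Julia version, so a cache written this way would need to be regenerated after an upgrade.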
Here I want to summarize the problems I see with the current implementation, along with some ideas on how to overcome them.
Behind the scenes data downloading
In the current implementation, data is loaded invisibly to the user. Moreover, it is not only loaded invisibly but also downloaded invisibly.
This leads to the following issues:

- `geolocate` call, it can take from milliseconds (actual lookup) to seconds or even minutes (when data is loaded).

The solution to all of these problems is the following set of methods, accessible to the user:
- `load`: it should accept various parameters and modes. The user can choose between local and internet data loading, between different database formats, and between localizations.
- `update!`: it should accept parameters similar to `load`, but it should validate the current state of the database and update it if a new version is available.
- `geolocate` should be changed to `geolocate(::DB, ::IP)`. For convenience, a `getindex` method can be added, so that `db[IP]` works as `geolocate`.
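A minimal sketch of the proposed explicit API, assuming IPv4-only data stored as sorted range starts (each range is assumed to run until the next start). The `DB` layout and the `load` signature here are hypothetical; only the `geolocate(::DB, ::IP)` shape and the `db[IP]` convenience come from the proposal above.

```julia
using Sockets  # provides IPv4 and the ip"..." string macro

# Hypothetical database type: sorted IPv4 range starts plus a parallel city column.
struct DB
    version::Int
    starts::Vector{UInt32}
    cities::Vector{String}
end

# Explicit loading: the caller decides where the data comes from.
# Here it is built from in-memory rows; a real `load` would read local files
# or download them only when explicitly asked to.
function load(rows::AbstractVector{<:Tuple{IPv4, AbstractString}}; version::Int = 1)
    sorted = sort(rows; by = r -> r[1].host)
    return DB(version, [r[1].host for r in sorted], [String(r[2]) for r in sorted])
end

# Lookup takes the database explicitly, so its cost is only the search.
function geolocate(db::DB, ip::IPv4)
    i = searchsortedlast(db.starts, ip.host)  # last range start <= ip
    return i == 0 ? nothing : db.cities[i]
end

# Convenience: db[ip] behaves like geolocate(db, ip).
Base.getindex(db::DB, ip::IPv4) = geolocate(db, ip)
```

Because `geolocate` receives the database as an argument, its cost no longer depends on whether data happens to be loaded already: all loading and downloading happens in the explicit `load` call.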
Loaded Data structure and results
In the current implementation, `DataFrame` is used as the storage format and `Dict{String, Any}` as the query return format. This leads to the following issues:

- `DataFrame` is type-unstable by construction, so improper use can lead to unnecessary allocations and overall slowness.
- `Row` construction is rather slow.

Possible solution:
- `StructArray` or `Vector` of `GeoResult` structs.
- `GeoResult`, which should be concretely typed and have a fixed number of fields. Use sentinel values instead of missing data.
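A sketch of a concretely typed `GeoResult` with sentinel values, assuming a handful of illustrative fields (`country`, `city`, `latitude`, `longitude`) with `""` and `NaN` as the sentinels. It uses a plain `Vector`; StructArrays.jl's `StructArray` could hold the same structs column-wise. The last two lines ask the compiler, via `Base.return_types`, what it infers for a field access on `GeoResult` versus a `Dict{String, Any}` lookup.

```julia
# Hypothetical concretely typed result; field names are illustrative.
struct GeoResult
    country::String
    city::String         # "" when unknown (sentinel)
    latitude::Float64    # NaN when unknown (sentinel)
    longitude::Float64   # NaN when unknown (sentinel)
end

# Build a result with only the country known, using sentinels instead of `missing`.
GeoResult(country::AbstractString) = GeoResult(country, "", NaN, NaN)

# Sentinel check for coordinates.
hascoords(r::GeoResult) = !isnan(r.latitude) && !isnan(r.longitude)

# A plain Vector with a concrete element type.
const results = GeoResult[
    GeoResult("DE", "Berlin", 52.52, 13.405),
    GeoResult("FR"),
]

# Accessors for comparing what the compiler can infer.
city(r::GeoResult) = r.city
dict_city(r::Dict{String, Any}) = r["city"]

Base.return_types(city, (GeoResult,))                # [String]
Base.return_types(dict_city, (Dict{String, Any},))   # [Any]
```

The trade-off of sentinels is that callers must know the convention, for example checking `isnan` on coordinates rather than `ismissing`.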