MASQ-Project / Node

MASQ combines the benefits of VPN and Tor technology to create a superior next-generation privacy software, where users are rewarded for supporting an uncensored global web. Users gain privacy and anonymity online, while helping promote Internet Freedom.
https://masqbrowser.com

Optimize ip_country initialization #492

Open dnwiebe opened 3 weeks ago

dnwiebe commented 3 weeks ago

Background

After downloading the country/IP data from the DBIP website, we generate a huge dbip_country.rs file containing two functions that return large vectors of compressed binary data: one for IPv4 addresses, one for IPv6 addresses.

Then, in country_finder.rs, we have a lazy_static block that initializes a CountryCodeFinder by calling the two functions from dbip_country.rs, decompressing their compressed data, and arranging it in an in-memory CountryCodeFinder structure that makes looking up country codes by IP address very quick.

Our problem is that all this data shuffling takes quite a bit of time on startup (on the order of six seconds), and everything else has to wait on it. Not only does this make things hard on the GUI folks, it means that our tests take a long time to run and need large timeouts to handle the delay.

Tasks

Background Initialization

Do the ip_country initialization, including creating and populating the CountryCodeFinder, in a background future or thread, so that the rest of the Node can be coming up at the same time. Many of our tests, and much of what the GUI project needs, can be satisfied just by having the UIGateway actor running and responsive. We don't actually need the CountryCodeFinder to be ready until we start Gossiping.

Suggestion: Add an origin (or sender, or source) field to the StartMessage actor message that kicks off Neighborhood operations. The ActorSystemFactory would send one StartMessage once all the actors are started (as happens now), and the background ip_country initialization task would send another when it finishes. Modify the Neighborhood so that it only begins operations when it has received StartMessages from both sources.
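
The two-source start could be sketched roughly as below. This is only an illustration of the gating logic, not the Node's actual code: the StartOrigin enum, the origin field, and the StartGate type are all hypothetical names, and the actor framework is left out so the example stands alone.

```rust
use std::collections::HashSet;

/// Hypothetical: identifies who sent a given StartMessage.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum StartOrigin {
    ActorSystemFactory,
    IpCountryInit,
}

/// Hypothetical shape of StartMessage with the proposed origin field.
#[derive(Debug, Clone, Copy)]
pub struct StartMessage {
    pub origin: StartOrigin,
}

/// Stand-in for the part of the Neighborhood that decides when to begin.
#[derive(Default)]
pub struct StartGate {
    seen: HashSet<StartOrigin>,
    started: bool,
}

impl StartGate {
    /// Records the message and returns true exactly once: on the message
    /// that completes the set of required origins. Duplicates are ignored.
    pub fn handle(&mut self, msg: StartMessage) -> bool {
        self.seen.insert(msg.origin);
        let ready = self.seen.contains(&StartOrigin::ActorSystemFactory)
            && self.seen.contains(&StartOrigin::IpCountryInit);
        if ready && !self.started {
            self.started = true;
            return true;
        }
        false
    }
}
```

In the real Neighborhood, the handler for StartMessage would call something like handle() and begin operations only when it returns true, so the order in which the two messages arrive would not matter.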

You might have to arrange something special to set the country code in the Neighborhood's root Node, since you won't know that information until after the ip_country initialization finishes, and right now the Neighborhood expects to get that information at startup.

For this new mode of initialization, lazy_static probably won't work. Instead, you might consider making the CountryCodeFinder an Arc<Mutex<Option<CountryCodeFinder>>> and accessing it through a small set of static functions that are visible everywhere. It would start out as None and be set later by the initialization future/thread; any attempt to read it before it had been initialized would panic the Node.
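
A minimal sketch of that idea follows, with some assumptions stated up front. The CountryCodeFinder here is a placeholder struct, the three function names are invented for illustration, and the layout is a slight variation on the one suggested above: a static Mutex<Option<Arc<CountryCodeFinder>>>, which can be declared as a plain static because Mutex::new is a const fn as of Rust 1.63. If the Node's toolchain is older than that, some other global holder would be needed.

```rust
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Placeholder for the real CountryCodeFinder; the field is illustrative only.
pub struct CountryCodeFinder {
    pub default_code: &'static str,
}

impl CountryCodeFinder {
    /// Placeholder lookup: the real one would search the IPv4/IPv6 tables.
    pub fn lookup(&self, _ip: IpAddr) -> &'static str {
        self.default_code
    }
}

// Starts out as None; filled in later by the background initialization.
static FINDER: Mutex<Option<Arc<CountryCodeFinder>>> = Mutex::new(None);

/// Called once by the background initialization thread or future.
pub fn install_country_code_finder(finder: CountryCodeFinder) {
    let mut slot = FINDER.lock().expect("CountryCodeFinder mutex poisoned");
    *slot = Some(Arc::new(finder));
}

/// Lets callers check readiness without risking a panic.
pub fn country_code_finder_is_ready() -> bool {
    FINDER
        .lock()
        .expect("CountryCodeFinder mutex poisoned")
        .is_some()
}

/// Returns the finder, panicking if initialization has not finished yet.
pub fn country_code_finder() -> Arc<CountryCodeFinder> {
    // Clone the Option and release the lock before a possible panic,
    // so that a premature read does not also poison the mutex.
    let maybe = FINDER
        .lock()
        .expect("CountryCodeFinder mutex poisoned")
        .clone();
    maybe.expect("CountryCodeFinder read before it was initialized")
}
```

Handing out an Arc means callers hold the lock only long enough to clone the pointer, so lookups do not serialize on the mutex once the finder is installed.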

Split IPv4 and IPv6

The IPv4 and IPv6 initializations are entirely independent of one another, so running them on different cores (that is, in different threads or futures) could save time. Arrange things so that the IPv4 and IPv6 parts of CountryCodeFinder are initialized on different cores (if available).
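
One way this might look, using scoped threads from the standard library (stable since Rust 1.63), is sketched below. The two build functions and the CountryTables struct are hypothetical stand-ins for the real decompress-and-arrange steps; only the threading pattern is the point.

```rust
use std::thread;

/// Hypothetical stand-in for decompressing and arranging the IPv4 data.
fn build_ipv4_table(compressed: &[u8]) -> Vec<u32> {
    compressed.iter().map(|b| u32::from(*b)).collect()
}

/// Hypothetical stand-in for decompressing and arranging the IPv6 data.
fn build_ipv6_table(compressed: &[u8]) -> Vec<u128> {
    compressed.iter().map(|b| u128::from(*b)).collect()
}

/// Hypothetical container for the two halves of the finder.
pub struct CountryTables {
    pub ipv4: Vec<u32>,
    pub ipv6: Vec<u128>,
}

pub fn build_tables_in_parallel(ipv4_data: &[u8], ipv6_data: &[u8]) -> CountryTables {
    thread::scope(|scope| {
        // IPv4 runs on a spawned thread...
        let ipv4_handle = scope.spawn(|| build_ipv4_table(ipv4_data));
        // ...while IPv6 runs on the calling thread at the same time.
        let ipv6 = build_ipv6_table(ipv6_data);
        let ipv4 = ipv4_handle
            .join()
            .expect("IPv4 initialization thread panicked");
        CountryTables { ipv4, ipv6 }
    })
}
```

Whether the two halves actually land on different cores is up to the operating system's scheduler; on a single-core machine this simply degrades to interleaved execution.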

Extra Credit

See if you can figure out a way to tell the Node, on the command line, that it's coming up in a test environment (for example, in multinode_integration_tests). Look carefully: there may already be such a mechanism. If there is already something like this, or you create one, you can fix things so that if the Node is being tested, it uses the short six-IP test_dbip_country version of the data rather than the big dbip_country one, so that test starts will be much faster.

kauri-hero commented 2 weeks ago

Related to optimising the time it takes for MASQ Node to boot up and initialize.