Closed davidhwyllie closed 2 years ago
Options:
Other options: build a periodic dump into catwalk (cf. https://nim-lang.org/docs/marshal.html), but this would block the server and is probably undesirable.
These options are not mutually exclusive.
Of these options: option 1 should be implemented. Option 4 should probably not be implemented, because of the risk of silently loading data that is not in the fn4 database. Either option 2 or option 3 would be satisfactory, but would still leave a multi-hour load time.
It would be possible to make the fn4 server restart rapidly for reading data (which doesn't need catwalk), and become available for inserting data later. However, this would need careful implementation.
Option 1 implemented.
fn4_shutdown.sh now has a new optional argument, --leave_catwalk_running.
If invoked with this option, fn4_shutdown.sh will not shut down catwalk.
When the server is subsequently restarted by fn4_startup.sh, no new catwalk instance will be started, and the existing one will be reused.
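
The reuse-or-start decision described above can be sketched as follows. This is only an illustration, not the code in fn4_startup.sh: it assumes catwalk can be detected by checking whether something is listening on a known TCP host and port, and the names `is_listening`, `ensure_catwalk` and `start_fn` are hypothetical.

```python
import socket
from typing import Callable


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def ensure_catwalk(host: str, port: int, start_fn: Callable[[], None]) -> str:
    """Reuse a running instance if one is listening, otherwise start one.

    start_fn is a hypothetical callable that launches a new catwalk process.
    Returns "reused" or "started" so the caller can log what happened.
    """
    if is_listening(host, port):
        # An instance survived the shutdown: keep it, with its loaded data.
        return "reused"
    start_fn()
    return "started"
```

A port check only shows that some process is listening. A real implementation would probably also want to confirm that the process is catwalk and that it holds the expected samples.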
The underlying problem remains: loading information from the database is slow for very large numbers of samples.
After merge of PR #124, restarting takes 30 minutes per million samples. This could probably be accelerated if multifasta files were generated (which catwalk reads very fast) as opposed to loading reference-compressed sequence data. However, we will consider this closed for now.
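
As a rough illustration of the multifasta idea, the sketch below writes already-reconstructed sequences to a multifasta stream. It is not part of findneighbour4: the function name `write_multifasta` is hypothetical, and it assumes that each sample's full sequence is available as a string and that one unwrapped sequence line per record is acceptable to the reader.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_multifasta(records: Iterable[Tuple[str, str]], handle: TextIO) -> int:
    """Write (sample_id, sequence) pairs as FASTA records to handle.

    Each record is a '>' header line followed by the sequence on one line.
    Returns the number of records written.
    """
    count = 0
    for sample_id, sequence in records:
        handle.write(f">{sample_id}\n")
        handle.write(f"{sequence}\n")
        count += 1
    return count


# Usage example with an in-memory buffer; a real dump would use a file.
buffer = io.StringIO()
written = write_multifasta([("sample1", "ACGT"), ("sample2", "NNAC")], buffer)
```

Whether this would actually be faster depends on the cost of reconstructing full sequences from the reference-compressed form, which has not been measured here.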
Restarting the findneighbour4 server is slow when large numbers of samples are present. This is due to repopulation of the catwalk component from the database. We need to identify ways to reduce this.