A user reported that DYNAMITE runs out of RAM when there are many concurrent weight-solving processes with the Python NNLS weight solver (specific case: 528 GB RAM, `ncpus_weights = 32`, orbit library size 14x7x7).
Diagnosis: one of the Python NNLS processes throws a "Cannot allocate memory" error in `LegacyOrbitLibrary.read_orbit_base()`, in the line that extracts the orbit library.
Before this PR:
The unzipped orbit library data is piped into a Python variable. This data is then written to a binary "FortranFile", which in turn is read by the remaining statements in `LegacyOrbitLibrary.read_orbit_base()`.
This means that for each of the `ncpus_weights` processes in the model iterator's processing pool, the (potentially very large) orbit library is held in memory twice.
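A minimal sketch of the pre-PR flow (the function name and paths are illustrative, not DYNAMITE's exact code): the decompressed data is first captured in a Python `bytes` object, then written out again, so it exists twice per process.

```python
import subprocess

def read_orbit_library_old(fileroot, tmpfname):
    """Sketch of the old flow: decompress into memory, then write to disk."""
    # The entire decompressed orbit library is piped into a Python variable
    # (first in-memory copy, on top of the data the reader will load later).
    result = subprocess.run(
        ["bunzip2", "-c", f"datfil/{fileroot}.dat.bz2"],
        stdout=subprocess.PIPE,
        check=True,
    )
    data = result.stdout  # potentially very large bytes object
    # Python then writes it back out as the binary file ("FortranFile")
    # that the rest of read_orbit_base() reads.
    with open(tmpfname, "wb") as f:
        f.write(data)
```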
Solution:
Instead of unzipping the orbit libraries into a Python variable, they are now extracted directly to disk, creating the FortranFile for further use.
The command-line tool `bunzip2` is still used for performance, and the zipped version is of course preserved.
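The fix can be sketched as follows (a simplified illustration, not the exact PR diff): `bunzip2 -c` writes straight to an open file handle, so the decompressed library never lives in a Python variable.

```python
import subprocess

def extract_orbit_library(fileroot, tmpfname):
    """Sketch of the new flow: decompress directly to disk."""
    # Pointing the subprocess's stdout at the destination file means the
    # decompressed data streams to disk without a second in-memory copy;
    # the original .bz2 file is left untouched.
    with open(tmpfname, "wb") as f:
        subprocess.run(
            ["bunzip2", "-c", f"datfil/{fileroot}.dat.bz2"],
            stdout=f,
            check=True,
        )
```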
Tested:
test_nnls.py and test_notebooks.sh
Also confirmed that the orbit library extracted via `bunzip2 -c datfil/{fileroot}.dat.bz2 > {tmpfname}` is binary-identical to the old solution's output (piping the `bunzip2` output into `aaa.stdout` and having Python write it as a binary file).
@prashjet: It is quite straightforward, but it would be great if you could have a glance at the code changes (there aren't many...).
Closes #393.