CalebBell / chemicals

chemicals: Chemical database of Chemical Engineering Design Library (ChEDL)
MIT License

Keep load speed fast #9

Closed CalebBell closed 4 years ago

CalebBell commented 4 years ago

I am opening an issue to track the load speed of chemicals. I had already forgotten from last weekend how I was measuring load speed, so documenting it seems like a good idea.

I put the following code in a file called load_one_library.py:

import cProfile
import os
import numpy as np
from scipy import special
from scipy import interpolate
from scipy import optimize
import pandas as pd
import sys
import json
import io
import datetime
from time import time
import fluids.constants
import fluids.numerics
import fluids
import ht
# Record what is already imported so only modules pulled in by chemicals are reported
original_modules = set(sys.modules.keys())

pr = cProfile.Profile()
t0 = time()
pr.enable()
import chemicals  # the import being measured
pr.disable()
print('Elapsed time: %f seconds' %(time() - t0))
pr.dump_stats('load_one_library.out')
after_modules = set(sys.modules.keys())
print('Loaded libraries')
print(after_modules.difference(original_modules))

Then I run that script with

python3 -OO load_one_library.py

Run it twice: the first run may have to recompile stale Python bytecode, so only the second run gives a representative time.
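As a rough illustration of why the second run matters: under -OO, Python looks for cached bytecode files carrying an `.opt-2` tag, and compiles and writes them when they are missing or stale. The sketch below only computes where such a file would live; the helper name `optimized_cache_path` and the relative path `chemicals/elements.py` are assumptions for illustration, not part of the library.

```python
import importlib.util


def optimized_cache_path(source_path):
    """Return the cached-bytecode path Python would use for source_path under -OO."""
    # optimization=2 corresponds to the -OO flag and produces an ".opt-2" tag
    return importlib.util.cache_from_source(source_path, optimization=2)


# Hypothetical relative path, used only to show the shape of the result
print(optimized_cache_path("chemicals/elements.py"))
```

If that file does not exist yet, or is older than its source, the first timed run also pays the compilation cost.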

Then I look at where the time is spent with

python3 -m snakeviz load_one_library.out

From that view I find the slowest file to load, which is currently elements.py.
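If snakeviz is not available, the same dump can probably be inspected from a terminal with the standard library's pstats module. This is a minimal sketch rather than the workflow used in this issue; `top_cumulative` is a hypothetical helper name.

```python
import pstats


def top_cumulative(stats_path, limit=15):
    """Print the `limit` entries with the highest cumulative time from a cProfile dump."""
    stats = pstats.Stats(stats_path)
    # Cumulative time includes time spent in callees, which is what matters for imports
    stats.sort_stats("cumulative").print_stats(limit)
    return stats
```

Calling `top_cumulative('load_one_library.out')` after running the script should list the module-level code of the slowest files near the top.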

Let's leave this issue open indefinitely for now and I'll update it with timings periodically - maybe get some development docs going and move this there at some point.

One side note: the -OO flag optimizes the compiled bytecode so that docstrings, assert statements, and a few other things are not loaded. The -OO timing is the number I am targeting; I am not interested in improving load speed by writing less documentation.

The -OO flag is typically used when building an actual application out of libraries, or on a server where processes start up and shut down often. Because asserts are stripped in that mode, assert statements should not be used for control flow; they should be development-only checks.
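The point about asserts can be checked directly. The sketch below runs the same one-line program in a fresh interpreter with and without -OO; `SNIPPET` and `run_snippet` are names made up for this illustration.

```python
import os
import subprocess
import sys

# A one-line program whose assert always fails when asserts are enabled
SNIPPET = "assert False, 'validation failed'; print('reached')"


def run_snippet(*flags):
    """Run SNIPPET in a fresh interpreter with the given flags."""
    # Drop PYTHONOPTIMIZE so only the explicit flags control optimization
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True, text=True, env=env,
    )
    return result.returncode, result.stdout.strip()
```

Without flags the interpreter exits with an AssertionError; with `run_snippet("-OO")` the assert is compiled away and the print is reached. Any check that must always run should raise an explicit exception instead.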

The rest of the script above outputs something like this:

Elapsed time: 0.005867 seconds
Loaded libraries
{'chemicals.solubility', 'chemicals.acentric', 'chemicals.dippr', 'chemicals.elements', 'chemicals.miscdata', 'chemicals', 'chemicals.dipole', 'chemicals.temperature', 'chemicals.critical', 'chemicals.utils', 'chemicals.refractivity', 'chemicals.exceptions', 'chemicals.vapor_pressure', 'chemicals.data_reader', 'chemicals.environment', 'chemicals.virial', 'chemicals.triple', 'chemicals.lennard_jones', 'chemicals.phase_change'}
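The timing and module-diff part of the script could be wrapped into a small reusable helper, sketched below. `time_import` is a hypothetical name, and `perf_counter` is swapped in for `time` here because it is a monotonic, high-resolution clock; the profiler is left out to keep the sketch short.

```python
import importlib
import sys
from time import perf_counter


def time_import(module_name):
    """Import module_name and return (elapsed seconds, set of newly loaded module names)."""
    before = set(sys.modules)
    start = perf_counter()
    importlib.import_module(module_name)
    elapsed = perf_counter() - start
    # Anything present now but not before was pulled in by this import
    new_modules = set(sys.modules) - before
    return elapsed, new_modules
```

As in the original script, dependencies that should not count toward the measurement need to be imported before calling `time_import('chemicals')`.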