dkirkby closed this issue 6 years ago
I strongly suspect the underlying problem is due to a recent change in numpy that has already been flagged as a serious performance hit: numpy/numpy#6467. Hopefully this is fixed by numpy/numpy#6208 and makes it into 1.10.2.
Update bossdata.meta.create_meta_full to check the numpy version and print a warning if it is 1.10.0 or 1.10.1. Also update the install doc page to warn against using these versions of numpy.
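A minimal sketch of what such a version check could look like. The function names here are hypothetical and this is not the actual bossdata code; it only assumes that the affected releases are exactly 1.10.0 and 1.10.1, as stated above.

```python
import re
import warnings

# Releases reported in this thread as having the performance regression.
SLOW_NUMPY_VERSIONS = {(1, 10, 0), (1, 10, 1)}

def numpy_version_is_slow(version):
    """Return True if a version string such as '1.10.1' is an affected release."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        # Unrecognized version format: do not warn.
        return False
    return tuple(int(part) for part in match.groups()) in SLOW_NUMPY_VERSIONS

def warn_if_slow_numpy(version):
    """Emit a warning (hypothetical wording) for an affected numpy release."""
    if numpy_version_is_slow(version):
        warnings.warn(
            "numpy {0} is known to make the db build very slow; "
            "use 1.9.3 or 1.10.2+ instead.".format(version)
        )
```

In practice the check would be called as `warn_if_slow_numpy(numpy.__version__)` before starting the conversion.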
Try reverting to numpy 1.9.3 using:
conda install numpy=1.9.3
conda install astropy=1.0.4
python setup.py develop
The astropy downgrade is necessary since the current astropy 1.0.5 has numpy 1.10 as a dependency.
The following test now takes a few minutes to build the db, rather than being too slow to measure (many hours?):
bossquery --what PLATE,MJD,FIBER,PLUG_RA,PLUG_DEC,Z --where 'OBJTYPE="QSO"' --verbose
@NobleKennamer reports that numpy 1.10.2 is out and fixes our problem:
https://github.com/numpy/numpy/compare/v1.10.2...master
Please add any test results with this new version here...
Numpy 1.10.2 is now available via conda so the following will install it:
conda update numpy
After doing this, I renamed my spAll-v5_7_0.db so it would need to be re-generated, and ran a bossquery. Both the lite (~3.5 mins) and full (~25 mins) db build times are back to normal!
Creation of the full sqlite db from the downloaded FITS file is running very slowly now. I thought this was fixed by #33, where I benchmarked the conversion at 25 minutes for the DR12 spAll. I am seeing this problem with the eBOSS v5_8_0 spAll, which is much smaller than the DR12 spAll. It looks like the time is being spent in this python loop to convert each row from FITS to SQL:
Has this code changed since #33, or why is it taking so much longer now? Are there new columns in the eBOSS spAll that could explain this?
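To help narrow down where the time goes, the SQL insert step can be timed on its own. The following is only a sketch under stated assumptions: it is not the bossdata loop, the helper name is hypothetical, and the rows are made-up values rather than data read from a FITS file. If inserting plain tuples is fast, the slowdown is more likely in reading each row from the FITS table than in sqlite.

```python
import sqlite3
import time

def build_table(rows, columns, table="meta"):
    """Insert rows into an in-memory sqlite table; return (connection, seconds)."""
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join("{0} {1}".format(name, sql_type) for name, sql_type in columns)
    conn.execute("CREATE TABLE {0} ({1})".format(table, col_defs))
    placeholders = ", ".join("?" for _ in columns)
    start = time.perf_counter()
    # One executemany call instead of one execute call per row.
    conn.executemany("INSERT INTO {0} VALUES ({1})".format(table, placeholders), rows)
    conn.commit()
    return conn, time.perf_counter() - start

# Made-up rows standing in for values already converted from a FITS table.
columns = [("PLATE", "INTEGER"), ("MJD", "INTEGER"), ("FIBER", "INTEGER"), ("Z", "REAL")]
rows = [(3586 + i % 10, 55181, 1 + i % 1000, 0.001 * i) for i in range(10000)]

conn, elapsed = build_table(rows, columns)
count = conn.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
print("inserted {0} rows in {1:.3f} s".format(count, elapsed))
```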