Closed alexji closed 7 years ago
Can I add:
Actually I think that extra functionality should be another issue (making now). That way this issue can be closed once the branch is functional.
Just for a record, there is a choice to be made about how to save the atomic data for the transitions.
Overall, serialization results in very simple code for saving/loading and compartmentalizes well. For this reason, this is what I've implemented now. However fits is probably more efficient and future-proof.
I did a quick test with an old linelist to see how things scale.
N lines = 618
pkl save: 0.211/20 = 0.011
pkl load: 0.934/20 = 0.047
fits save: 0.675/20 = 0.034
fits load: 0.525/20 = 0.026
File size: pkl=344516, fits=135360
N lines = 1
pkl save: 0.021/20 = 0.001
pkl load: 0.147/20 = 0.007
fits save: 0.620/20 = 0.031
fits load: 0.493/20 = 0.025
File size: pkl=3160, fits=11520
For a small number of lines serialization is better, but for a large number of lines fits is better. Since we'll typically be in the >1000 lines range as soon as a single synthesis is added, fits seems better.
A potentially bigger concern with serialization is that smh.LineList
subclasses astropy.table.Table
, which means as the astropy Table code changes the serialization may not work and potentially corrupt save files or result in down-the-road errors. I'm not sure exactly how this works out.
I think definitely fits is the way to go in part because if we have it as a serialised object, then those objects will need to reference certain parts of the code when de-serialising (e.g., functions, namespaces). If the namespace changes in the future, then suddenly those previously serialised objects may not load correctly. I've had this in the past when pickling objects for The Cannon.
So I think for that reason alone, fits is better.
astropy.table.Table
won't have any massive API changes in the foreseeable future, so I think it is probably still OK to sub-class it. If we're concerned though, we can just create a custom object and then have our functions point to astropy table functions, eg:
class LineList(object):
@classmethod
def read(self, path, **kwargs):
self._table = astropy.table.Table(path, **kwargs)
return self._table
Would it be okay to serialize it as an astropy Table (or its underlying data structure) and then convert it to a LineList?
In particular to use Table.as_array
in __getstate__
for the serialization. There's no way numpy structured arrays will change, and it's (inefficient but) fast to convert to/from LineList.
I did a check and this does not significantly slow down saving/loading, and it actually results in a slightly smaller file size when saving.
Yeah, saving as a numpy structured array would be OK. (And saving on overheads due to not needing the serialised LineList
class structure will help)
(And helps for compatibility with any external codes, since it only relies on numpy).
OK. Everything seems to be working okay when creating sessions etc from scripts and tests. I am going to work on refactoring the gui on a separate branch before trying to fix the linelist manager to work with these updates.
Looks to all be working and can import lines as well too! (The importing GUI needs work but that can be a separate thing.)
Fixed as of 961585a54675935607b093015dbe9a8ec0209971
@andycasey and I have decided that maintaining a master line list in SMHR is way too much trouble. So we have decided to get rid of that and store atomic data individually with each spectral model/measurement.
This should greatly speed up loading and saving, remove the unnecessary code for ensuring unique lines, reduce the size of the saved line lists, and allow e.g. editing loggf for individual lines in a synthesis.
It sacrifices line consistency, but this is not something that we can robustly do on the fly, and it costs too much in code complexity.
Bundle all profile models into one list(will just serialize one-row LineList)Save each synthesis linelist individually(whenever loading a line list will always copy it to the working directory and save it along with; the session itself will serialize the transitions)