andycasey / smhr

Spectroscopy Made Hard(er)
14 stars 7 forks source link

Remove master line list #242

Closed alexji closed 7 years ago

alexji commented 7 years ago

@andycasey and I have decided that maintaining a master line list in SMHR is way too much trouble. So we have decided to get rid of that and store atomic data individually with each spectral model/measurement.

This should greatly speed up loading and saving, remove the unnecessary code for ensuring unique lines, reduce the size of the saved line lists, and allow e.g. editing loggf for individual lines in a synthesis.

It sacrifices line consistency, but this is not something that we can robustly do on the fly, and it costs too much in code complexity.

andycasey commented 7 years ago

Can I add:

alexji commented 7 years ago

Actually I think that extra functionality should be another issue (making now). That way this issue can be closed once the branch is functional.

alexji commented 7 years ago

Just for a record, there is a choice to be made about how to save the atomic data for the transitions.

Overall, serialization results in very simple code for saving/loading and compartmentalizes well. For this reason, this is what I've implemented now. However fits is probably more efficient and future-proof.

I did a quick test with an old linelist to see how things scale.

N lines = 618
pkl save: 0.211/20 = 0.011
pkl load: 0.934/20 = 0.047
fits save: 0.675/20 = 0.034
fits load: 0.525/20 = 0.026
File size: pkl=344516, fits=135360

N lines = 1
pkl save: 0.021/20 = 0.001
pkl load: 0.147/20 = 0.007
fits save: 0.620/20 = 0.031
fits load: 0.493/20 = 0.025
File size: pkl=3160, fits=11520

For a small number of lines serialization is better, but for a large number of lines fits is better. Since we'll typically be in the >1000 lines range as soon as a single synthesis is added, fits seems better.

A potentially bigger concern with serialization is that smh.LineList subclasses astropy.table.Table, which means as the astropy Table code changes the serialization may not work and potentially corrupt save files or result in down-the-road errors. I'm not sure exactly how this works out.

andycasey commented 7 years ago

I think definitely fits is the way to go in part because if we have it as a serialised object, then those objects will need to reference certain parts of the code when de-serialising (e.g., functions, namespaces). If the namespace changes in the future, then suddenly those previously serialised objects may not load correctly. I've had this in the past when pickling objects for The Cannon.

So I think for that reason alone, fits is better.

astropy.table.Table won't have any massive API changes in the foreseeable future, so I think it is probably still OK to sub-class it. If we're concerned though, we can just create a custom object and then have our functions point to astropy table functions, eg:

class LineList(object):
    @classmethod
    def read(self, path, **kwargs):
        self._table = astropy.table.Table(path, **kwargs)
        return self._table
alexji commented 7 years ago

Would it be okay to serialize it as an astropy Table (or its underlying data structure) and then convert it to a LineList?

alexji commented 7 years ago

In particular to use Table.as_array in __getstate__ for the serialization. There's no way numpy structured arrays will change, and it's (inefficient but) fast to convert to/from LineList.

I did a check and this does not significantly slow down saving/loading, and it actually results in a slightly smaller file size when saving.

andycasey commented 7 years ago

Yeah, saving as a numpy structured array would be OK. (And saving on overheads due to not needing the serialised LineList class structure will help)

andycasey commented 7 years ago

(And helps for compatibility with any external codes, since it only relies on numpy).

alexji commented 7 years ago

OK. Everything seems to be working okay when creating sessions etc from scripts and tests. I am going to work on refactoring the gui on a separate branch before trying to fix the linelist manager to work with these updates.

alexji commented 7 years ago

Looks to all be working and can import lines as well too! (The importing GUI needs work but that can be a separate thing.)

Fixed as of 961585a54675935607b093015dbe9a8ec0209971