Refactor `EISFitTemplate` and test data

wtbarnes commented 3 years ago

This PR accomplishes a few things related to the fitting templates

Refactor EISFitTemplate
- read_template is now a class method,
- funcinfo is now a property (with an additional static method get_funcinfo)
- print_parinfo is now just a repr on the class, i.e. doing print(template) shows you the table of fit parameters
- The checks on filename inputs have been removed. These are not needed because hdf5 catches all of these anyway and raises exceptions
- The fit template tests now test all of the fit template files in the package
- Any template path is now accessible via the function eispac.data.get_fit_template_filepath
- A list of all available templates is now accessible via the function eispac.data.fit_template_filenames
Template files now live under eispac.data.templates (This is why there are so many files changed)
Refactor the test data to use fixtures in conftest.py.
- These fixtures for the data, head, fit, and template files can be used anywhere in the package without repeatedly having to reconstruct these paths manually
- Any test data filename is accessible via eispac.data.test.get_test_filepath

wtbarnes commented 3 years ago

Diff with the template changes filtered out: https://github.com/USNavalResearchLaboratory/eispac/pull/24/files?file-filters%5B%5D=.cfg&file-filters%5B%5D=.ipynb&file-filters%5B%5D=.py

MJWeberg commented 3 years ago

Alright, I finally had the chance to sit down and review all of the code changes. Sorry for the long delay! Thanks for doing this, Will! It looks like a lot of work and really improves the testing suite and data organization. I also like giving EISFitTemplate a proper repr function. Unfortunately, however, I think there are a few of the changes that are a little mismatched with the rest of the eispac: 1) I think we should still have a separate eispac.read_template() function, even if it is just a simple wrapper around the class method. This would be consistent with the other read functions included in eispac and helps keep the most common function calls compact. 2) The base template files are more or less complete and are unlikely to change in the future. Therefore, it is probably unnecessary to test all 119 templates each time we push changes to the repo. Nevertheless, it is a good idea to have the option check and validate the files if there is a concern or need. Is there somewhere we could move that test so we can run it on-demand, rather than automatically? 3) Promoting funcinfo to a static property is fine for now (this is still just a placeholder for future functionality), however we will need to revisit it later when we add options for user-generated templates with arbitrary component functions and a choice of fitting routines. 4) Offloading the file validation checks to h5py is fine, but we should then use a try-except block to print the exception and return None rather than just stopping the entire program (this was an intentional design choice). I realize you raised an issue on this topic two weeks ago (#20), but I have not followed up yet. In order to keep the discussion threads separate, I will try to explain our rational in the open issue.

wtbarnes commented 3 years ago

Alright, I finally had the chance to sit down and review all of the code changes. Sorry for the long delay! Thanks for doing this, Will! It looks like a lot of work and really improves the testing suite and data organization. I also like giving EISFitTemplate a proper repr function.

No worries at all. I realized that I packed a lot of proposed changes into this pull request. Code review is extremely time consuming and I appreciate you giving this a thorough and thoughtful review. I've responded to each of your points below.

I think we should still have a separate eispac.read_template() function, even if it is just a simple wrapper around the class method. This would be consistent with the other read functions included in eispac and helps keep the most common function calls compact.

Fair enough! I can see how this would be less confusing from a user perspective. I can just add an alias the class method as a new function read_template.

The base template files are more or less complete and are unlikely to change in the future. Therefore, it is probably unnecessary to test all 119 templates each time we push changes to the repo. Nevertheless, it is a good idea to have the option check and validate the files if there is a concern or need. Is there somewhere we could move that test so we can run it on-demand, rather than automatically?

I would honestly not worry much about the time the CI spends running the tests or how many tests are run (unless of course it becomes prohibitively long!). Currently, running every test in the repo (including all of the template checks) takes ~10 s (for reference, running all of the tests in the sunpy repo can take many 10s of minutes and these are run on every single check in and PR). CI cycles are free and checking every single template ensures that they are always compliant with the reader function as implemented. Additionally, it seems very plausible to me that the templates could change at some point, either in format or in actual content (e.g. due to the addition of new or updated atomic data in the relevant wavelength ranges). pytest does have a series of decorators that you can add to tests to skip them by default, but I'd discourage against that at this point.

Promoting funcinfo to a static property is fine for now (this is still just a placeholder for future functionality), however we will need to revisit it later when we add options for user-generated templates with arbitrary component functions and a choice of fitting routines.

Interesting. Perhaps this should be spun off as a separate issue then? I also added get_funcinfo as a class method so that one can extract the funcinfo without needing to first instantiate a template class.

Offloading the file validation checks to h5py is fine, but we should then use a try-except block to print the exception and return None rather than just stopping the entire program (this was an intentional design choice). I realize you raised an issue on this topic two weeks ago (Raise exception rather than returning 0/calling sys.exit #20), but I have not followed up yet. In order to keep the discussion threads separate, I will try to explain our rational in the open issue.

I assumed that this was a choice due to the need to run the code in a sort of "pipeline" mode for fitting many pixels in a raster. I have a lot of thoughts on this, but perhaps it is best to revert these changes here and move the discussion to #20.

wtbarnes commented 3 years ago

Ok I've pushed some changes to address points 1 and 4 and added an additional test for returning None for invalid template filenames.

MJWeberg commented 3 years ago

Awesome! Thanks, Will!

USNavalResearchLaboratory / eispac

Refactor `EISFitTemplate` and test data #24