ConaireD / TolimanWIP

WIP Toliman codes go here before being added to the main TOLIMAN github
Other
0 stars 0 forks source link

Sharing array components. #1

Closed Jordan-Dennis closed 1 year ago

Jordan-Dennis commented 1 year ago

Hi all, The array components are not currently stored in this repository because they are large. However, we do need to be able to share them. We can just bite the bullet and commit them before untracking them, or we can share the scripts that build them. There is some advantage to sharing the scripts as it means that the user can freely control the resolution. I'm starting this issue to discuss how we want to do this, Regards Jordan.

ConaireD commented 1 year ago

I agree to uploading the scripts to generate the arrays. The optical design isn't 100% finalised yet so it would be handy to be able to update the arrays quickly without having to upload them. However, we should check that it is ok to upload such things - might be IP issues?

Jordan-Dennis commented 1 year ago

Finding the mask on install is a tricky problem that I don't have the solution to.

Jordan-Dennis commented 1 year ago

So I think it might be best if we store the code and the assets separately. All of the assets except for the mask are generated programatically, so this is a third option. For now maybe we just upload a mask.zip to a separate repo.

Jordan-Dennis commented 1 year ago

I have decided that programmatic generation is the way to go and I will add this to the __init__ file along with helpfull error messages and progress bars ect.

Jordan-Dennis commented 1 year ago

As we can see in my discussion around #2 I am planning to have the setup done as automatically as possible, with the mask file remaining the only one that needs to be transported regularly (atm). This means that I can move some of the code that is in toliman/toliman.py into __init__.py and flesh it out. In particular should I make the a class for setting up the spectral information for Alpha Centauri as this will involve webscrapping and other complex things. I do not like the idea of this been a class, but I don't want to break it into a separate file.

Jordan-Dennis commented 1 year ago

So after setting up PyPi :tada:, I discovered that this actually will not work because __init__.py is invoked on import not on build. However, I should be able to get around this using poetry which has a build-system field.

Jordan-Dennis commented 1 year ago

Now that I have started setting this up I think that it will be really nice. This comment is to remind me to look into alive-progress so that the build provides useful and pretty information as it goes.

Jordan-Dennis commented 1 year ago

In the interest of speed with the install I should also make this multiprocessed using multiprocess, but that is part of the standard library so it does not count as a dependency.

Jordan-Dennis commented 1 year ago

So I have been thinking about how to share mask.npy in such a way that it is not commited to main. I think by far the best way to do this will be to create a branch with different history to main called mask that literally only contains the mask.npy file. This can then be downloaded using build.py.

Jordan-Dennis commented 1 year ago

I will use an orphaned branch to achieve the sharing of the mask.npy.

Jordan-Dennis commented 1 year ago

I need to implement tests for build.py. This should cover the scenario that it is already installed. Indeed this may actually require that I use pytest-workflow (see #43), but it has the problem that it is not cross platform. This means that instead I should use fixtures to create the files and tear them down, but fixtures cache the results so that may not work as well as I want it to. In the very least I can just make functions, which may be the way to go (see #33). pytest-diff may also be needed here so I should check out #19.

Jordan-Dennis commented 1 year ago

So I am not going to use pytest-workflow (see #43) and are using the os module in the python standard library. I also do not know id this will work with fixtures (see #33) so for now I am not going to use them and are just using normal functions.

Jordan-Dennis commented 1 year ago

To use mark.parametrise in the way I want I will have to resport to if statements. This is not so bad.

Jordan-Dennis commented 1 year ago

So I think that I will have to make build a module that the user must invoke before using the package. This will probably be easiest, since poetry does not install the dependencies before running build.py using the other method (see #2).

Jordan-Dennis commented 1 year ago

I think that I only want to install the phoenix once for the tests as it takes minutes to complete. This violates the F of F.I.R.S.T (see #13).

Jordan-Dennis commented 1 year ago

So the phoenix installation should be multi-threaded using multiprocess. This will buy me a little time. I think I should turn this into a submodule perhaps, although I am not sure how to do so with poetry (see #20).

Jordan-Dennis commented 1 year ago

I think that I will move the build code into a submodule, since it cannot automatically occur at build time.

Jordan-Dennis commented 1 year ago

I just timed the phoenix download and it took more than 10 minutes to complete. This will need to be multithreaded, and as a result the progress bars will probably have to go.

Jordan-Dennis commented 1 year ago

OK, I think that for testing purposes I should avoid doing the actuall downloads. I think this just means that I need to rearrange the API so that it is easier to test and I can check that the url exists and that will be enough. Then I need to work out how to do a fake download and only transfer a small subset of the data. This I think will be the trick, but I do not want that functionality to face the user (see #2).

Jordan-Dennis commented 1 year ago

So I believe that I can improve the performance of the tests by using stream = True, since this does not fetch the entire .fits. This means that I can validate the connection and have a setting to actually go through with the download. I can then use the first validation step in the testing to write only a small piece of the .fits (see #13).

Jordan-Dennis commented 1 year ago

Not sure about the PEP conventions for early exit from a for loop iteration using continue. This is a reminder to look it up when I am done with the refactor.

Jordan-Dennis commented 1 year ago

For better readability I am considering making a PhoenixSpectra type so that it is easy to index the correct point for the wavelengths and fluxes ect.

Jordan-Dennis commented 1 year ago

@benjaminpope has alerted me to the existence of git-lfs, which is custom built for sharing large files. This might make life much easier althought the build code for phoenix will still be required.

Jordan-Dennis commented 1 year ago

A comparison of client frameworks for python shows that requests is very slow. We may be able to save some time using pycurl instead but this is low priority.

Jordan-Dennis commented 1 year ago

Great article on parametrizing fixtures (see #33).

Jordan-Dennis commented 1 year ago

I need to make sure that the root directory for the assets is configuratble through an external constant.

Jordan-Dennis commented 1 year ago

So there is a plugin pytest-mock that let's you assign a function as another function (more or less) so that connecting isn't required. I don't think we can make it work for us but it is an interesting concept.

Jordan-Dennis commented 1 year ago

The repository is growing much faster than I would like. I really need to go back through and do some pruning of the git history.

Jordan-Dennis commented 1 year ago

Note to self: I do not have to create an accurate spectra for the tests of the clip and resample as these are shape based operations. I can just make some random arrays that have the correct properties. This will be much faster satisfying the F from the F.I.R.S.T principles (see #13).

Jordan-Dennis commented 1 year ago

A nice way to test functions involving mean calculations is using the mean value theorem. I don't think that is the right name actually, but basically, it cannot be greater than the largest value or smaller than the smallest.

Jordan-Dennis commented 1 year ago

It is tempting to generalise a lot of the build functionality so that it is more functional and relies less on constants. I wonder if I can use monads to solve some of the indexing via constants problems/vulnerabilities.

Jordan-Dennis commented 1 year ago

The build module is comming along nicely. Some comments on the design that I will add to the documentation: a) each submodule is equipped with an is_..._installed and install_... making a nice consistent interface; and b), this could be enscapulated in a object oriented structure with an class Installer. I have chosen not to implement such a structure because the object would be entirely static, (not having any state) and python does not have the best support for static classes.

Jordan-Dennis commented 1 year ago

I cannot use pytest-xdist (see #15) for any tests that have file operations as the state is shared across the threads.

Jordan-Dennis commented 1 year ago

os.makedirs is the same as make_dir_and_parents. I should remove the latter in favor of the former. This will reduce the number of tests but will require further refactoring.

Jordan-Dennis commented 1 year ago

I should mark the tests in groups for xdist and give each group a temporary directory specific to its process/thread. This would address the F in F.I.R.S.T (see #15).

Jordan-Dennis commented 1 year ago

With the completion of the build module in #51 and #52 this issue has been solved.