Read csv - Githubissues

alisonpchase commented 2 years ago

@cisaacstern Ok, I'm gradually figuring the work flow out. Attempting to read a csv file in the load_data function of spectra.py. Where would the path actually get defined? In spectra.py or somewhere else? Should I be putting these questions as comments in the code?

cisaacstern commented 2 years ago

@alisonpchase, awesome progress! Seems like you're definitely getting the hang of the PR workflow!

Should I be putting these questions as comments in the code?

General questions like the one about filepaths below are great for the PR discussion thread. In the future, if you ever have really detailed questions relating to specific lines of code, it might be convenient to put them in code comments. In general, it's good to keep comments sparse in the code, but I guess I'm thinking if you're ever are hitting a wall in the code editor, it might be easier to just jot down your questions in VS Code in real time, rather than having to translate them into the PR thread.

Where would the path actually get defined?

The actual path to your data probably won't be defined anywhere in this package. Because that data is specific to your application of this algorithm, but this package is a general purpose tool for anyone to use with any data. So when you actually run this algorithm on your data, you will probably do that within a separate analysis directory which just has your data files and a Jupyter notebook in it for running the algorithm. You would then open a Jupyter session backed by a conda environment in which you have installed the pigments-from-rrs package, and in the Jupyter notebook write

import pigments_from_rrs

my_data = pigments_from_rrs.load_data("path-to-my-data.csv")

Of course, this raises the question of how we will test the algorithm as we go along. That is what my last commit https://github.com/alisonpchase/pigments-from-rrs/pull/8/commits/d191b0e6c2c59d5ecda664d94226740f46755062 addresses. If you click through and look at the changes I made, you will see I added two pytest.fixtures. In pytest, a fixture is any object we might need to test our code against. So in this case, I've created a temporary_directory fixture and an example_dataframe fixture. Then, in test_load_data I write that example dataframe to a CSV in the temporary directory, load it back in using pigments_from_rrs.load_data, and use pandas' built-in testing to ensure the data didn't get corrupted in any way by pigments_from_rrs.load_data (which it does not!).

(Note that this temporary directory is, as the name implies, temporary. It is created and destroyed by the testing code each time the tests are run.)

Right now we are just testing to make sure that any dataframe can be loaded, but in the future we'll actually want to use that dataframe to test parts of the algorithm. At that point, we can change the data dictionary in the dataframe fixture so that it has column names and values in it which can actually be run through the algorithm.

cisaacstern commented 2 years ago

IMHO, this PR is now ready to merge 🎉

We can now read CSVs using pigments_from_rrs.load_data. Of course, the example_dataframe fixture is not useful for testing any parts of the actual algorithm, but that's beyond the scope of this PR.

If you agree and decide to merge, my next question is: what do you think our next narrowly-scoped incremental PR should be focused on doing?

alisonpchase commented 2 years ago

@cisaacstern I'm diving back in, and looking at the next step in the data processing for our next incremental PR. A key step in the inversion of Rrs to get information (in our case, pigments) is defining the absorption and scattering spectra of water itself, so that we can account for and remove that signal from the measured light reflectance spectra.

This is a standard activity in optical oceanography computing, so I did a search and found some other GitHub repos where people have worked on this. In particular, there is this one: https://github.com/tadz-io/hydropt which is a full inversion model. In their case, they define a basis vector for all phytoplankton absorption, and where my method differs is that I separate out the absorption by different pigments. Anywho, they do have some code that defines the "IOPs" (Inherent Optical Properties, i.e. the absorption and scattering spectra) for "clear_nat_water". My matlab code uses the temperature and salinity of seawater to adjust the IOP spectra, so it may be different, but in case it's useful we could look at how they did the inversion of Rrs.

So I think our next PR should be to define the IOPs of water, which are known, but vary as a function of temperature and salinity, so maybe we can pass the temperature and salinity info in somewhere.

alisonpchase / pigments-from-rrs

Read csv #8