alisonpchase / pigments-from-rrs

MIT License

Seabass to pandas #6

Closed. cisaacstern closed this 3 years ago

cisaacstern commented 3 years ago

@alisonpchase, I downloaded one of the NAAMES SeaBASS files and played around with it a bit. When we spoke, I left off by suggesting Joel's readSB object might be the best in-memory container to hold your data for this project. I'm now convinced that a pandas DataFrame is definitely the way to go.

The directory pigments_from_rss/seabass/ added by this PR contains a copy of Joel's module, which I've named reader.py, as well as a new module, to_pandas.py, which provides a function that turns a readSB object into a pandas DataFrame.
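As a rough illustration of the idea (not the actual contents of to_pandas.py), here is a minimal sketch of such a conversion. It assumes the reader object exposes a `data` attribute that maps field names to equal-length lists of values; the function name `sb_to_dataframe`, the `missing` parameter, the stand-in object, and the numbers are all hypothetical.

```python
from types import SimpleNamespace

import pandas as pd


def sb_to_dataframe(sb, missing=None):
    """Build a DataFrame from a reader object with a `data` mapping (assumed layout)."""
    # One column per field name, one row per record.
    df = pd.DataFrame(dict(sb.data))
    if missing is not None:
        # Swap the file's missing-value sentinel for NaN so pandas can skip it.
        df = df.replace(missing, float("nan"))
    return df


# Stand-in for a parsed .sb file; the values are made up for illustration.
fake_sb = SimpleNamespace(
    data={"wavelength": [400.0, 410.0, 420.0], "rrs": [0.0031, -9999.0, 0.0042]}
)
df = sb_to_dataframe(fake_sb, missing=-9999.0)
print(df)
```

The notebook linked below shows the real usage with an actual .sb file.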

An example of how to use the new module to create a DataFrame from a .sb file is provided here:

https://github.com/cisaacstern/pigments-from-rrs/blob/seabass-to-pandas/seabass_to_pandas.ipynb

Making a pandas DataFrame the core data structure for this project also makes it easy to support users who don't have .sb files. This PR provides a way to create DataFrames from .sb files, but you can create them even more easily from regular .csv files.
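For instance, loading a CSV into a DataFrame takes a single call to `pd.read_csv`. The sketch below reads from an in-memory string so that it runs without any file on disk; the column names and values are made-up placeholders, not data from this project.

```python
import io

import pandas as pd

# Placeholder CSV content; in practice you would pass a file path to read_csv.
csv_text = "wavelength,rrs\n400,0.0031\n410,0.0038\n420,0.0042\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

Either way, downstream functions only need to know about the DataFrame, not about where the data came from.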

cisaacstern commented 3 years ago

Edit: Sorry, this might've been the wrong link! If you choose to merge this PR into the main branch of this project, you can sync these changes to your local copy of the repo like this: https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/syncing-a-fork#syncing-a-fork-from-the-command-line

alisonpchase commented 3 years ago

Thanks @cisaacstern, this is great! I'm stuck on merging this PR, though. When I follow the instructions in the link, it leads me to "configuring a remote for a fork", and in those instructions I'm not sure what to use for "YOUR_USERNAME" and "YOUR_FORK". My username shows up when I run git remote -v, but should it be your username and fork name?

cisaacstern commented 3 years ago

To merge, click the green "Merge pull request" button at the bottom of this thread (below this comment).

After you do that, https://github.com/alisonpchase/pigments-from-rrs should reflect the changes from this PR.

Then, go to your command line, cd into your local copy of the repo, and run:

git remote -v

Look for the name which corresponds to the repo URL. So you might see something like:

origin  https://github.com/alisonpchase/pigments-from-rrs.git (fetch)
origin  https://github.com/alisonpchase/pigments-from-rrs.git (push)

...or perhaps the name is not origin but alisonpchase or something else. Assuming it's origin, run:

git fetch origin

And then:

git merge origin/main

Then your local repo should be synced with the changes in this PR. And just let me know if you have any difficulty!

alisonpchase commented 3 years ago

I think it worked! I will play around with it today and get the Rrs data files in SeaBASS format organized.

cisaacstern commented 3 years ago

🌟 Nice!

Once you get these changes synced to your local copy of the repo, I recommend the following working strategy:

  1. Use the Jupyter Lab Launcher to start a new notebook.
  2. Follow the example of seabass_to_pandas.ipynb (added by this PR) to load some data into the new notebook.
  3. Work on your first data transformation function in the notebook. You can define a function in a notebook cell, and then apply it to your DataFrame in the next cell. This will be a much easier place to iterate than writing the functions directly in a .py file.
  4. Once a function does what you want it to, move it out of the notebook into a .py file. Then import it into the notebook and run it again, to make sure it returns the expected result.
  5. Repeat until you have all the functions you want.
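To make steps 3 and 4 concrete, here is a minimal sketch of what one notebook iteration might look like. The function `drop_negative_rrs`, the column names, and the values are hypothetical placeholders, not part of this PR.

```python
import pandas as pd


# Cell 1: define the transformation (hypothetical example function).
def drop_negative_rrs(df, column="rrs"):
    """Return a copy of df without rows where `column` is negative."""
    return df[df[column] >= 0].reset_index(drop=True)


# Cell 2: apply it to a DataFrame and inspect the result.
df = pd.DataFrame(
    {"wavelength": [400, 410, 420], "rrs": [0.0031, -0.0002, 0.0042]}
)
cleaned = drop_negative_rrs(df)
print(cleaned)

# Cell 3: a quick check that stays useful after the function moves to a .py file.
assert len(cleaned) == 2
```

Once the function lives in a .py file, the definition cell is replaced by an import, and the remaining cells can be re-run unchanged to confirm the result is the same.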

alisonpchase commented 3 years ago

Great, thanks, will give it a go