Break out core functionality into separate python modules

smindinvern commented 6 years ago

There's no need to have all of the project's code in notebooks. IMO, those should be used for interactive experimentation and as a presentation tool, not for showing all of the behind-the-scenes detail.

This branch is an effort to move in that direction.

smindinvern commented 6 years ago

Also I based this on the brown branch since it seemed to have the most up-to-date data files, as well as the hourly usage computation.

brownworth commented 6 years ago

I'm ok with the consensus however we move forward with consolidating code. My original thinking with regard to separate notebooks, and separate branches for that matter, is to isolate and test code until we have some sort of final product.

I know that Patti was looking to develop functions for repeatable functionality, so the utils.py makes sense.

brownworth commented 6 years ago

And as far as whether or not we should be putting code into notebooks, I agree that this probably would not be the way it would be handled in production in the real world. However, it is essential that we have a mechanism that demonstrates to Dr. Mohanty that there is some executable code with accompanying output, and conveys this as easily as possible.

smindinvern commented 6 years ago

> However, it is essential that we have a mechanism that demonstrates to Dr. Mohanty that there is some executable code with accompanying output, and conveys this as easily as possible.

I agree, which is why I'd suggest using the notebooks for the presentation layer. I asked Mohanty about this in class today--his response was that he'd be totally fine with this approach.

> I agree that this probably would not be the way it would be handled in production in the real world.

With good reason: it makes effective collaboration much more difficult. However we do things, collaboration would be greatly eased if we explicitly laid out the workflow that we're all going to abide by. Yes, this is a thinly veiled attempt on my part to suggest that my ability to collaborate would be greatly improved by doing so ;-) I suspect that there would be benefits for everyone, though.

To make things concrete, rather than just blowing so much hot air, here's a formal proposal for such a workflow:

1) master is strictly for what we want to show to Dr. Mohanty at any given point in time. If we don't want him evaluating us based on something, it shouldn't be in master.
2) develop is essentially master-next. Nothing should be put into develop that we don't want to end up in master. Occasionally develop will be merged into master to update the snapshot of development that is visible to Dr. Mohanty.
3) All non-trivial development efforts should be done in a separate branch. No non-trivial changes should enter develop except via the pull-request process.
4) Within the master, develop, and feature branches IPython notebooks should be used for presentation only. This means documentation of hypotheses, methods, data and charts, findings, conclusions, etc. Any of the data processing, analysis, or other unrelated code should be kept in Python files and imported into the relevant notebooks.
5) Everyone should keep their own personal branch, used for experimentation, and as a staging area for development work that might make its way into a feature branch. This is a place where others can check in on what you're doing, and commits in these branches would be appropriate for linking to from issues, e.g. 'Hey, check out what I've been working on over here. What do you guys think about X/Y?'

The rationale behind each of these proposed rules is as follows:

1) As Dr. Mohanty said in class today, he wants us to keep everything that we want him to evaluate in a single branch. As he will only be checking in to evaluate us periodically, I don't think that we need to worry about the status of the master branch except at those times when it needs to be updated in preparation for grading.
2) In order to effectively highlight the work that we've done, and make it easier for Dr. Mohanty to give us good grades, the development history that he sees should be as clean as possible, i.e. the signal-to-noise ratio of the commit history should be as high as possible. Commits that deal with going back and fixing typos or small bugs clutter the history, and don't contribute to highlighting the work that we've done. Keeping the development history clean will make merging into master a simple fast-forward merge, again keeping the history simple.
3) Directing all non-trivial changes through a PR helps to facilitate discussion throughout the development process and to encourage code review. Since anything going into develop will reflect on the group, this allows everyone to give feedback on these changes. In the interest of keeping the development history clean and simple, merges can be squashed to eliminate the extra noise mentioned in (2).
4) There are three major reasons for minimizing the use of IPython notebooks:
   - It's very difficult to identify the changes made to an IPython notebook in a commit. For example, if I want to follow what Brown is doing in his branch, I currently have two options: I can look at the commit history, and I can open the notebooks he has at the tip of his branch. The commit history may or may not give an adequate view of the changes he's made, based on what those changes are, and also based on how descriptive the commit messages are. Looking at the notebook files in the branch shows me what his code does right now, but it doesn't clearly illustrate exactly why this is important, i.e. exactly how this is different from what's in develop.
   - For the same reason, it can be very tedious to merge a branch containing changes to a notebook. This is counter-productive. We should be spending our time focusing on content, not resolving merge conflicts. Pulling some feature developed in a different branch to make use of it in my own branch, or for merging into develop, should be a trivial operation.
   - Having code stashed in IPython notebooks means that it can't be easily pulled into another notebook for use somewhere else. So if we have e.g. GSOWeather.ipynb and LibraryData.ipynb and we want to add a new notebook looking at the correlation between the two, Correlations.ipynb, what are our options? Do we copy all of the code for parsing and cleaning the data files from the other two notebooks? If that code were in separate python files we could pull it in with a simple import statement. IPython is an awesome tool for interactive development, since it's really just a wrapper around the Python REPL, and it's great for documenting and presenting our processes, results, etc., but using it for everything feels a lot like a square-peg-round-hole situation.
5) Given the points in (4), I think it's important to have an area for experimentation unfettered by the rest of these proposed rules. Coming back to the square-peg-round-hole analogy, interactive development is an area where IPython shines, so of course we should be using it!! But once someone has some code that's more or less ready to go, just cut-and-paste it into one of the existing (or a new) library files, put that into a feature branch, and initiate a PR. At that point rules (1) through (4) are there to make it easier to see what your feature does and how it does it.
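
To illustrate the "simple import statement" point, here is a minimal sketch of what a shared module might look like. The module name `library_utils.py`, the function `hourly_usage`, and the `timestamp`/`machine` columns are all assumptions made up for this example; they are not the project's actual files or data schema.

```python
# library_utils.py -- hypothetical module name; the column names below are
# assumptions for illustration, not the project's real data schema.
import csv
import io
from collections import Counter
from datetime import datetime


def hourly_usage(csv_text):
    """Count rows per hour from CSV text that has a 'timestamp' column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        stamp = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour so rows bucket together.
        counts[stamp.replace(minute=0, second=0)] += 1
    return dict(counts)


# In a notebook, the same function would be pulled in with:
#     from library_utils import hourly_usage
sample = (
    "timestamp,machine\n"
    "2017-10-10 09:05:00,A\n"
    "2017-10-10 09:40:00,B\n"
    "2017-10-10 10:15:00,A\n"
)
usage = hourly_usage(sample)
```

With something along these lines, a notebook such as the Correlations.ipynb mentioned above would only need the import line plus the presentation cells, and changes to the parsing logic would show up as ordinary line-by-line diffs.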

Having said all of that, I am aware that things that are second-nature to me aren't so for everyone. Things that are a big deal to me probably seem like much ado about nothing to others. So I'd really like to hear everyone's thoughts about the current status. Do you guys think everything is working out OK so far? Something that could be better? Concerns about the above proposal?

PatriciaTanzer commented 6 years ago

The rules you've listed seem reasonable to me, though I'm not sure we'll have any fewer problems with merging python files than with notebooks. Still, if python files are simpler than notebooks, then there's no particular reason not to use them.

brownworth commented 6 years ago

(original comments truncated for brevity)

Proposals:

1) master is strictly for what we want to...

Sounds great. This makes a lot of sense.

2) develop is essentially master-next...

Sounds great. This makes a lot of sense.

3) All non-trivial development efforts should be done in a separate branch...

Subject to interpretation on 'non-trivial'. With a few exceptions, I am frequently going to be on the side of increasing communication over decreasing it.

4) Within the master, develop, and feature branches IPython notebooks should be used for presentation only...

Agree and disagree. I respect your opinion on notebooks, and they can be cumbersome in some regards. But, I am also of the mind that they are very effective in conveying intent and functionality to teammates as well as Dr. Mohanty.

5) Everyone should keep their own personal branch...

Agree. Seems about the way we have been doing it.

Rules:

1) [Dr. Mohanty] wants us to keep everything that we want him to evaluate in a single branch...

This fits with your first proposal and his email. I agree. Anything that makes his job easier is ultimately better.

2) The development history that he sees should be as clean as possible...

Yes, but I am going to commit whenever I have to walk away from the code. If it really becomes an issue, commits can be squashed. I understand that he wants to see clean code, but he has also said in class that he wants to see our experimentation and learning process, by way of frequent and regular commits. This is what I see as one of the distinctions between a coding production environment and a classroom project.

3) Directing all non-trivial changes through a PR helps to facilitate discussion throughout the development process and to encourage code review...

All for it. Let's keep this going.

4a) It's very difficult to identify the changes made to an IPython notebook in a commit...

I agree. And that's why I thought we were also committing flattened python code with our commits. I didn't think this had changed.

4b) We should be spending our time focusing on content, not resolving merge conflicts...

This might be my main point of disagreement. Where you see notebooks as increasing time resolving merge conflicts, I see the absence of comments/Markdown as places where I spend more time trying to figure out what something does. I am of the strong opinion that the onus of conveying vs. interpreting intent is on the coder, not the reader. I'm willing to export to .py files for the purpose of comparing diffs, and I am committed to ensuring that they are the same as the notebooks. As far as I am concerned (and I think you agree here), looking at the diffs on two notebooks can be frustrating. Looking at the diffs on .py files is not. In that vein, I'm going to continue to do both.

4c) Having code stashed in IPython notebooks means that it can't be easily pulled into another notebook for use somewhere else...

We have talked about creating functions with specific functionality, as I believe @patriciatanzer (please correct me if I am wrong here) has indicated in her notebooks. I am not opposed to the idea of creating imports, and I am not opposed to creating code that requires me to copy/paste. I'm comfortable doing either. If we go the import route, however, I am concerned with names of functions overriding existing functionality. If we choose to go this way, I would ask that people who name imported functions/methods be responsible for making sure that there are no conflicts.

5) Having said all of that, I am aware that things that are second-nature to me aren't so for everyone...

And that's the intrinsic nature of a group project. We all have strengths we can share, and we all have places where we need to learn. If we didn't have the latter, we wouldn't be in classes.

I guess what I would like to propose with all of this is that any functionality we have, we use notebooks to demonstrate functionality and output. Once we have the functionality at the level we want it, we leave it in the notebooks and indicate that this is how it exists in python code/imports in a different location, with the necessary comments/markdown to indicate where it does this. Does that work?
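
On the concern about imported function names overriding existing functionality, a small sketch may help. It uses two standard-library modules that both define `sqrt` as a stand-in for two project modules that happen to share a function name; the project's own module names are not known here, so none are assumed.

```python
import math
import cmath

# Importing bare names: the later import silently replaces the earlier one.
from math import sqrt
from cmath import sqrt  # now `sqrt` refers to cmath.sqrt, not math.sqrt

shadowed = sqrt(4)  # a complex number, because cmath.sqrt won

# Qualified access keeps both functions reachable without any conflict.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-1)
```

If modules are imported by name (`import module` or `import module as alias`) and functions are called as `module.function`, two modules can define the same function name without either one overriding the other, which may reduce how much coordination is needed when naming things.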

PatriciaTanzer commented 6 years ago

I have no problem with comparing python files. I'll make sure to save the code to a .py file from now on, if it isn't already doing so automatically. I'm still going to use the notebooks though, because I find it easier to work with.

With regards to commits, that's a point where I'm going to disagree. I tend to work in short, irregular periods, so the commits are very important for keeping track of whatever I was doing. So I'm going to commit to my personal branch as often as necessary for that reason, and then run a single, larger commit to develop whenever I finish a major section.

With regards to imports, I'm less in favor of that, since then there are multiple files to keep track of, with their own possibly conflicting functions. However, I can see your point, so I'll agree with what Brown said that whoever adds an import needs to make sure there are no conflicts with what is already there.

With personal branches, can I add a request that we each have only one personal branch? We have 10 branches right now, which is a lot to keep track of.

On a completely unrelated note, is anyone working on visualization for library data? I'd like to get into more obvious looking graphs, and I don't see anyone working on that atm (Correct me if I'm wrong).

brownworth commented 6 years ago

@PatriciaTanzer I'll start looking into visualization. And I believe I have eliminated all of my extraneous branches. I'll stick with my named one.

smindinvern commented 6 years ago

Ok, points taken. I find the situation much less objectionable if everyone is exporting the notebooks to .py files with each commit, so that would be greatly appreciated.
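
For reference, exporting a notebook to a script is normally done with `jupyter nbconvert --to script <notebook>.ipynb`. The sketch below only illustrates the idea with the standard library, assuming the nbformat 4 JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`). The function name `notebook_to_script` is made up for this example, and unlike a real export it simply drops markdown cells.

```python
import json


def notebook_to_script(notebook_json):
    """Return the code cells of an nbformat-4 style notebook as one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells in this simplified sketch
        source = cell.get("source", "")
        # "source" may be stored as a list of lines or as a single string.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# Tiny made-up notebook, just to exercise the function.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "source": "print(y)"},
    ]
})
script = notebook_to_script(example)
```

Because the result is plain text with one statement per line, committing it next to the notebook gives reviewers a readable diff even when the `.ipynb` JSON itself changes in noisy ways.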

PatriciaTanzer commented 6 years ago

It looks like this pull request got pretty badly out of date - if there are still changes we need to make, is it possible to make a new one so we aren't trying to merge with old files? We definitely had a miscommunication here.

smindinvern commented 6 years ago

Hmm, I guess before doing anything with this PR it would be good to decide what, if any, code should be broken out. I'd propose that be essentially just the functions for importing and massaging each data set: machine usage, gate count, and weather. That would encompass one (or both, I suppose) of the implementations discussed in issue #19 (which still needs to be resolved, I think), as well as probably Michael's work on the weather data parsing.

Those are the only things right now that I think would be useful to be able to pull into multiple notebooks. Once it's decided if/what code should be broken out, I'll take care of putting together a new PR if necessary, or fixing this one.

smindinvern commented 6 years ago

I'm going to close this out and open an issue where we can have a more philosophical discussion about this rather than considering a particular pull request.

UNCG-CSE / Library-Computer-Usage-Analysis

Break out core functionality into separate python modules #35