Closed gregcaporaso closed 9 years ago
A thing that i constantly see done in different ways and would be cool if we could standardize is function definitions:
# seems fair enough
def get_scrooge_mc_duck_with_multiple_options(options, retrieve_hats=False,
retrieve_sweaters=True):
"""Retrieve a duck object with multiple options
.....
"""
# respects four tab indentation for the retrieve_sweaters argument
def get_scrooge_mc_duck_with_multiple_options(options, retrieve_hats=False,
retrieve_sweaters=True):
"""Retrieve a duck object with multiple options
.....
"""
# does not respect four spaces indentation for retrieve_sweaters but aligns
def get_scrooge_mc_duck_with_multiple_options(options, retrieve_hats=False,
retrieve_sweaters=True):
"""Retrieve a duck object with multiple options
.....
"""
# does not respect four spaces indentation but aligns "oddly"
def get_scrooge_mc_duck_with_multiple_options(options,
retrieve_hats=False,
retrieve_sweaters=True):
"""Retrieve a duck object with multiple options
.....
"""
I like this:
def get_scrooge_mc_duck_with_multiple_options(options,
retrieve_hats=False,
retrieve_sweaters=True):
best of those.
Good point Yoshiki. I'm always having trouble in how to align this type of definitions, and I ended up not being consistent. In my opinion, option 1 is the best, since if the number of arguments is larger, it does not force you to make more and more lines (like option 4). Option 3 seems reasonably if it is only one ore two arguments, but we should be consistent, and I prefer to have two lines like in option 1 rather than 5 lines in option 3...
Something similar happens with the import statements:
# My 1st option
from some.library.with.large.name import (it_really_exists, another_function,
extra_function)
# I don't like this, but I've seen it a lot
from some.library.with.large.name import (it_really_exists, another_function,
extra_function)
I like:
from some.library.with.large.name import (it_really_exists, another_function, extra_function)
My preference would be to use number one, however as long as this is clarified in the guidelines I'm happy with whatever the decision is.
My preference is also for #1.
On Wed, Nov 6, 2013 at 11:42 AM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:
My preference would be to use number one, however as long as this is clarified in the guidelines I'm happy with whatever the decision is.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-27901065 .
I don't like aligning to parens because you have to change the spacing if you change the function name, and it pushes some argument lists too far to the right.
The following two styles are compliant with PEP8 ( http://www.python.org/dev/peps/pep-0008/#indentation):
foo = long_function_name(var_one, var_two, var_three, var_four)
def long_function_name( var_one, var_two, var_three, var_four): print(var_one)
I prefer the second, or any other style that is aligned to the grid and moves args towards the left. --Kyle
On Wed, Nov 6, 2013 at 2:43 PM, Will Van Treuren notifications@github.comwrote:
My preference is also for #1.
On Wed, Nov 6, 2013 at 11:42 AM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:
My preference would be to use number one, however as long as this is clarified in the guidelines I'm happy with whatever the decision is.
— Reply to this email directly or view it on GitHub< https://github.com/qiime/qiime/issues/1181#issuecomment-27901065> .
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-27906483 .
I like either of the two PEP8-compliant styles that @kylebittinger mentioned. The formatting got messed up in the previous comment, so reproducing what is shown in PEP8:
# Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
var_three, var_four)
# More indentation included to distinguish this from the rest.
def long_function_name(
var_one, var_two, var_three,
var_four):
print(var_one)
I don't like @ElDeveloper's option 1 because it makes it hard to see the difference between the end of the argument list and the start of the function's body. Option 4 isn't too terrible, but it can make function definitions very long because each argument is on its own line.
For long imports, I think we should match whatever style we end up choosing for function definitions.
Good point Jai, even if we like the other one it is not PEP8-compliant.
Found this interesting rant: http://www.joelonsoftware.com/articles/Wrong.html
On Fri, Nov 8, 2013 at 8:23 AM, Jai Ram Rideout notifications@github.comwrote:
I like either of the two PEP8-compliant styles that @kylebittingerhttps://github.com/kylebittingermentioned. The formatting got messed up in the previous comment, so reproducing what is shown in PEP8:
Aligned with opening delimiter
foo = long_function_name(var_one, var_two, var_three, var_four)
More indentation included to distinguish this from the rest.
def long_function_name( var_one, var_two, var_three, var_four): print(var_one)
I don't like @ElDeveloper https://github.com/ElDeveloper's option 1 because it makes it hard to see the difference between the end of the argument list and the start of the function's body. Option 4 isn't too terrible, but it can make function definitions very long because each argument is on its own line.
For long imports, I think we should match whatever style we end up choosing for function definitions.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-28070285 .
Antonio
Agree with @jrrideout that option 1 makes it hard to see the difference between the end of the argument list and the start of the function's body. I like the option 2 pointed by @kylebittinger (and corrected by @jrrideout ), since it makes clear the function definition and it does not make it so long.
Just a small correction to Jai's comment:
Imports should match long function calls and long list literals: break on the open paren, then 4 space indent. Like the fourth example in http://www.python.org/dev/peps/pep-0008/#indentation with my_list and result.
Imports do not need extra spaces to distinguish them from a function body, so they should not follow the special rule for function definitions.
--Kyle
On Fri, Nov 8, 2013 at 1:47 PM, josenavas notifications@github.com wrote:
Agree with @jrrideout https://github.com/jrrideout that option 1 makes it hard to see the difference between the end of the argument list and the start of the function's body. I like the option 2 pointed by @kylebittinger https://github.com/kylebittinger (and corrected by @jrrideout https://github.com/jrrideout ), since it makes clear the function definition and it does not make it so long.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-28087104 .
:+1: thanks @kylebittinger!
Found this interesting rant: http://www.joelonsoftware.com/articles/Wrong.html
An ode to code: The exceptional beauty of Doom's 3 source code.
For long imports, I think we should match whatever style we end up choosing for function definitions.
Agreed.
I don't understand the problem with
def get_scrooge_mc_duck_with_multiple_options(options, retrieve_hats=False,
retrieve_sweaters=True):
'''Documentation
Blah blah
'''
I don't know how it makes it harder to see where the argument list ends and the function begins:
Start each method, class and function with a docstring using triple double quotes (“””). The docstring should start with a 1-line description that makes sense by itself (many automated formatting tools, and the IDE, use this). This should be followed by a blank line, followed by descriptions of the parameters (if any).
Shouldn't the docstring be the clear indication of where the argument list ends and where the code (documentation) begins?
The only advantages to this approach are that it reduces the number of blank spaces, and it allows long variable names to be used without weird line breaking or overlap. These are probably of dubious value (as is probably everything in this style discussion) so I am happy with whatever.
I do think that if we are going to have a lot of guidelines than we need to more strictly enforce them. As the kerfuffle at #1177 showed, we have a lot of specific guidelines that may not be exactly followed. It seems silly to have rules that are unenforced, so we should either drastically reduce the number of rules, or be a lot more strict about them for everybody.
@wdwvt1 The point of the discussion in #1177 is that there are a lot of stylish rules that are not written anywhere, and we don't have a good code guidelines that we can give to a new developer. (In my previous lab, I get a 15-page document with coding guidelines...)
So the idea is to write a document with good coding guidelines and then enforce them in future PR's
@josenavas I understand and agree that we need to codify the unwritten rules. I am just saying empirically we don't even follow the rules we have (take a look, for instance, at how much code ignores the 80 character line break rule) so if we codify a bunch of new rules and don't follow them either, whats the point?
On Fri, Nov 8, 2013 at 1:26 PM, josenavas notifications@github.com wrote:
@wdwvt1 https://github.com/wdwvt1 The point of the discussion in #1177https://github.com/qiime/qiime/pull/1177is that there are a lot of stylish rules that are not written anywhere, and we don't have a good code guidelines that we can give to a new developer. (In my previous lab, I get a 15-page document with coding guidelines...)
So the idea is to write a document with good coding guidelines and then enforce them in future PR's
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-28095160 .
Agree, so the point is to do both. But having the document written will make easier to enforce them. So during the code reviews of the PR we should pay attention to style issues, too. For example, I noticed that @gregcaporaso and @jrrideout almost always pay attention to this issues; and the discussion in #1177 was mainly to reflect the necessity of a document with the coding guidelines, so it will simplify the coding and the reviews.
Not to single them out, but if you look at qiime/workflow/core_diverersity_analyses.py there are a large number of lines over 80 characters and functions that have no documentation strings at all. @gregcaporaso and @jrrideout are some of our top developers and they write good code, but even they don't follow the guidelines.
I make a lot of mistakes in my code, and I don't follow the rules in many instances, so don't get the impression that I am saying I am somehow better or somehow criticizing them for something that I don't also do. I just think that we need to be realistic about how important standardization is, and need to be very sure that its applied across the board if we make it a rule.
On Fri, Nov 8, 2013 at 1:39 PM, josenavas notifications@github.com wrote:
Agree, so the point is to do both. But having the document written will make easier to enforce them. So during the code reviews of the PR we should pay attention to style issues, too. For example, I noticed that @gregcaporaso https://github.com/gregcaporaso and @jrrideouthttps://github.com/jrrideoutalmost always pay attention to this issues; and the discussion in
1177 https://github.com/qiime/qiime/pull/1177 was mainly to reflect
the necessity of a document with the coding guidelines, so it will simplify the coding and the reviews.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-28096127 .
I'm sorry if I gave the impression that you are criticizing other developers.
My code is also very poor in following the code guidelines. I just noticed that lately some developers are paying more attention to style issue (although even them are not following the rules in some cases). During my work on the Galaxy integration, I have to modify a lot of scripts, and I have to say that most of them didn't follow the 80 characters rule (and I tried to fix them).
The QIIME developers team is constantly growing and I'm agree that we have reach a point where standardization is a key point. So, at this point, we shouldn't accept any PR that does not follow the coding guidelines. Doesn't matter if a release date is arriving. It is much better to have a release one day late than have an inconsistent code base, making it harder to maintain.
We really need to re-visit the QIIME code-base and enforce the code guidelines. During our port to pyqi, is a great opportunity to fix the code base and, of course, all the developers should be aware of such guidelines and be really strict about it.
This hasn't been written anywhere (I think it was only mentioned at the svn->git transition last year), but we should really enforce having at least two people (maybe three) review a pull request i. e. if two core developers* have not read the code and added a :+1: as a comment to the pull request, a new feature branch should not be merged. This would benefit the code-base and the contributions.
*core developers: any of the people as listed in the qiime-developers-team.
I like the double-indent that has been mentioned, as in
def long_function_name(arg1, ..., arg999,
arg1000, arg1001):
"""This is my doc string!
And further explanation goes here
"""
code = True
@ElDeveloper Totally agree. The core developers should be more concerned about reviewing code (and I include myself here).
Also worth mentioning: the style that aligns lines to the delimiter, e.g. @ElDeveloper options 3 and 4 can require spacing changes, as @kylebittinger noted. This is annoying for code review because it will mark those lines as changed, even if there were otherwise no substantive changes.
Hypothetical original code
def function_name(arg1,
arg2):
"""docstring
explanation""""
return foo
Then the function name changes
def new_function_name(arg1,
arg2): # this line will annoyingly be marked as changed
"""docstring
explanation""""
return foo
I like 1. I think long function names should rarely exist as they imply a difficult to describe method, which is indicative of methods doing too much and are possibly overly complex. Also not a fan of a large number of arguments, kwargs, descriptive objects and simple methods if possible are easier to test and to read
On Friday, November 8, 2013, adamrp wrote:
Also worth mentioning: the style that aligns lines to the delimiter, e.g. @ElDeveloper https://github.com/ElDeveloper options 3 and 4 can require spacing changes, as @kylebittinger https://github.com/kylebittingernoted. This is annoying for code review because it will mark those lines as changed, even if there were otherwise no substantive changes.
Hypothetical original code
def function_name(arg1, arg2): """docstring explanation"""" return foo
Then the function name changes
def new_function_name(arg1, arg2): # this line will annoyingly be marked as changed """docstring explanation"""" return foo
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-28098625 .
@gregcaporaso @rob-knight, can I please be involved in the new guidelines discussion?
Yes of course.
On Nov 10, 2013, at 12:15 PM, "Daniel McDonald" notifications@github.com<mailto:notifications@github.com> wrote:
@gregcaporasohttps://github.com/gregcaporaso @rob-knighthttps://github.com/rob-knight, can I please be involved in the new guidelines discussion?
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-28157924.
Yes.
I just want to bring up another topic that I'm facing constantly. When writing unit tests, the current guidelines says that we should write a test function for each case that we want to test in a given function. I think this guideline is not currently followed widely...
I feel that we need to change it a bit. I propose to create a TestCase class for each function/class that we want to test, and within this function create all the test functions that we want to test. I think this will improve readability of the tests and it can help us for QIIME 2.0 if we end up by breaking up the tests that we want to run...
During last week's code review, the fact that using asserts to check user input can be dangerous came up. It seems that misusing asserts is a common problem because on of the Python developers recently wrote a longish post on when to use them.
To summarize it, it strongly warns against using them to check argument values for two reasons: i) they raise the wrong exception (a wrong value should raise a ValueError, a wrong type a TypeError...) and ii) they get fully stripped away if python is ever started with the optimization flags.
Asserts should be used for things that are not supposed to fail, not as a shortcut to avoid writing if cond: raise RightException
. Quoting the post author, "Your users should never see an AssertionError; if they do, it's a bug to be fixed".
Grepping the code base shows that many many argument checking is done using asserts, but I think we should try not to add more to validate input. It can be argued that this is cosmetic because we're not really using -O/OO at the moment. Nonetheless, I still feel that raising meaningful errors is important (and that's why we also subclass our own exceptions!).
If possible it would be worth including in this document guidelines for commit messages. GIT has a fairly "strict" format that is sometimes assumed all along the package.
Useful links:
http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
https://wiki.openstack.org/wiki/GitCommitMessages
Note that if you are a VIM user (I think only @cleme and @rnaer do not use VIM consistently) and an OSX user copying the file in /usr/share/vim/vim73/vimrc_example.vim
to ~/.vimrc
will get you some nice formatting for your commit messages when using VIM to write this text i. e. calling git commit
without the -m
option.
:+1:
Nice links! A habit that I like and I've seen on several projects is to start git commits with an acronym that summarises the type of commit. I think it helps make better commits that are easier to review because it discourages monolithic commits that change some white space, fix something and then might even add new stuff.
Following numpy's convention, a bug fix commit would be "BUG: ...", a new feature would be "ENH: ...", etc. By the way, that page also has some tips on git workflows.
http://pycogent.org/coding_guidelines.html#how-should-i-format-my-code should be clarified in regards to what operators should have parenthesis and what shouldn't.
For instance:
exp_freqs = zeros(a)+(n/float(a)) #f_i_hat vals
and
exp_freqs = zeros(a)+n/float(a) #f_i_hat vals
seem equivalent to me. @wasade suggests that the first is better (and it certainly makes the order of operations clearer). we should just add a little bit more text/examples to this documentation section.
PEP8 has a section on this http://www.python.org/dev/peps/pep-0008/#other-recommendations
On Wed, Dec 4, 2013 at 2:57 PM, Will Van Treuren notifications@github.comwrote:
http://pycogent.org/coding_guidelines.html#how-should-i-format-my-codeshould be clarified in regards to what operators should have parenthesis and what shouldn't.
For instance:
exp_freqs = zeros(a)+(n/float(a)) #f_i_hat valsandexp_freqs = zeros(a)+n/float(a) #f_i_hat vals
seem equivalent to me. @wasade https://github.com/wasade suggests that the first is better (and it certainly makes the order of operations clearer). we should just add a little bit more text/examples to this documentation section.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-29839330 .
it is equivalent, but this way you don't have to think about order of operations
On Wed, Dec 4, 2013 at 12:57 PM, Will Van Treuren notifications@github.comwrote:
http://pycogent.org/coding_guidelines.html#how-should-i-format-my-codeshould be clarified in regards to what operators should have parenthesis and what shouldn't.
For instance:
exp_freqs = zeros(a)+(n/float(a)) #f_i_hat valsandexp_freqs = zeros(a)+n/float(a) #f_i_hat vals
seem equivalent to me. @wasade https://github.com/wasade suggests that the first is better (and it certainly makes the order of operations clearer). we should just add a little bit more text/examples to this documentation section.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-29839330 .
By the way, there hasn't been any discussion about docstrings. Since the description in pycogent's guidelines is rather short, I'd like to point out that there exists a docstring format in the scientific python community (numpy, scipy, scikits use it, and matplotlib recently adopted it). Some good things about it are the fact that tools to handle it are already there (the numpydoc sphinx extension) and it suggests clear sections with good support for references.
Regarding parentheses, I personally don't like them in this case (but don't really mind and tend to use them liberally) because i) it's a very simple expression and ii) +
and /
have both very clear priorities. On the other hand, even if nowadays PEP8 is quite flexible regarding spaces around operators (it used to requiere spaces around all operators), this expression could be written as exp_freqs = zeros(a) + n/float(a)
following the spirit of the third example in @kylebittinger's link.
In general, I'd hand all stylistic issues over to PEP8 and focus on issues specific to QIIME in the guidelines document, maybe with an emphasis on the suggestions we're not really following now, like raising exceptions using the syntax raise Error("Error msg")
instead of the old raise Error, "Error msg"
which is invalid syntax in python3 and increases the already high usage of line continuation characters. IMHO, the more that document deviates from standards, the less useful preexistent tools like pep8
are to automate tasks like style checking.
Agree on using PEP 8 as baseline, and add to our guidelines only whatever deviates/is not covered by it.
We should adopt PEP8. The reason we didn't is that the start of coding preceded it, and enough time has elapsed that we should standardize with what the community has adopted as best practices.
On Dec 11, 2013, at 6:56 AM, "Greg Caporaso" notifications@github.com<mailto:notifications@github.com> wrote:
Hi all, I created a PRhttps://github.com/qiime/qiime/pull/1279 where I am starting to work on this. A lot of the discussion we're having here is over things that are covered in PEP 8http://www.python.org/dev/peps/pep-0008/, so I propose that we adopt PEP 8 as our style guide and build from it as necessary. Any reasons why we wouldn't want to adopt PEP 8?
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-30321741.
Agree
On Wednesday, December 11, 2013, Rob Knight wrote:
We should adopt PEP8. The reason we didn't is that the start of coding preceded it, and enough time has elapsed that we should standardize with what the community has adopted as best practices.
On Dec 11, 2013, at 6:56 AM, "Greg Caporaso" <notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.com');> <mailto:notifications@github.com <javascript:_e({}, 'cvml', 'notifications@github.com');>>> wrote:
Hi all, I created a PRhttps://github.com/qiime/qiime/pull/1279 where I am starting to work on this. A lot of the discussion we're having here is over things that are covered in PEP 8< http://www.python.org/dev/peps/pep-0008/>, so I propose that we adopt PEP 8 as our style guide and build from it as necessary. Any reasons why we wouldn't want to adopt PEP 8?
— Reply to this email directly or view it on GitHub< https://github.com/qiime/qiime/issues/1181#issuecomment-30321741>.
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-30323520 .
In cases where PEP8 is ambiguous (i.e., where multiple styles are listed as acceptable), are we going to accept any of those styles, or are we going to choose one in particular that we like best? On Dec 11, 2013 7:28 AM, "Daniel McDonald" notifications@github.com wrote:
Agree
On Wednesday, December 11, 2013, Rob Knight wrote:
We should adopt PEP8. The reason we didn't is that the start of coding preceded it, and enough time has elapsed that we should standardize with what the community has adopted as best practices.
On Dec 11, 2013, at 6:56 AM, "Greg Caporaso" <notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.com');> <mailto:notifications@github.com <javascript:_e({}, 'cvml', 'notifications@github.com');>>> wrote:
Hi all, I created a PRhttps://github.com/qiime/qiime/pull/1279 where I am starting to work on this. A lot of the discussion we're having here is over things that are covered in PEP 8< http://www.python.org/dev/peps/pep-0008/>, so I propose that we adopt PEP 8 as our style guide and build from it as necessary. Any reasons why we wouldn't want to adopt PEP 8?
— Reply to this email directly or view it on GitHub< https://github.com/qiime/qiime/issues/1181#issuecomment-30321741>.
— Reply to this email directly or view it on GitHub< https://github.com/qiime/qiime/issues/1181#issuecomment-30323520> .
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-30324045 .
We will choose one.
On Dec 11, 2013, at 10:02 AM, adamrp notifications@github.com<mailto:notifications@github.com> wrote:
In cases where PEP8 is ambiguous (i.e., where multiple styles are listed as acceptable), are we going to accept any of those styles, or are we going to choose one in particular that we like best? On Dec 11, 2013 7:28 AM, "Daniel McDonald" notifications@github.com<mailto:notifications@github.com> wrote:
Agree
On Wednesday, December 11, 2013, Rob Knight wrote:
We should adopt PEP8. The reason we didn't is that the start of coding preceded it, and enough time has elapsed that we should standardize with what the community has adopted as best practices.
On Dec 11, 2013, at 6:56 AM, "Greg Caporaso" notifications@github.com<mailto:notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.commailto:notifications@github.com');> <mailto:notifications@github.com <javascript:_e({}, 'cvml', 'notifications@github.commailto:notifications@github.com');>>> wrote:
Hi all, I created a PRhttps://github.com/qiime/qiime/pull/1279 where I am starting to work on this. A lot of the discussion we're having here is over things that are covered in PEP 8< http://www.python.org/dev/peps/pep-0008/>, so I propose that we adopt PEP 8 as our style guide and build from it as necessary. Any reasons why we wouldn't want to adopt PEP 8?
— Reply to this email directly or view it on GitHub< https://github.com/qiime/qiime/issues/1181#issuecomment-30321741>.
— Reply to this email directly or view it on GitHub< https://github.com/qiime/qiime/issues/1181#issuecomment-30323520> .
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-30324045 .
— Reply to this email directly or view it on GitHubhttps://github.com/qiime/qiime/issues/1181#issuecomment-30339349.
Should we consider this essentially resolved as #1279 was merged earlier today?
We need updated coding guidelines, as well as a reference page for new developers. This is partially in response to discussion happening in #1177, but is something that is I've been meaning to work on for a while. I'm going to draft this with input from @rob-knight, and all other developers. I'd like to present a complete draft of this document in a special meeting sometime between Dec 11-13 when I visit Boulder.
This will build from the PyCogent coding guidelines and scripting guidelines, and PEP 8.