Use dict.get to retrieve values of the persons data base.

Alberth289346 commented 1 month ago

A few code changes:

dict.get is useful for retrieving values from a dictionary without knowing it they exist.
Not sure if the advice is still valid for Python 3, but the advice was to use join for creating strings from several pieces. The reason is that strings cannot be modified in-place. So, unless the interpreter is smart enough to recognize that it can break the string rules safely in the code, every += makes a full copy of the already created string.
Don't append a final \n in __repr__ (or __str__ for that matter). Let the caller decide where to break the line. This may break code somewhere, not sure where it is used. Note that print() appends a \n as well in the output.
Add a docstring for the Persons class. It gives room to describe the goal of the class, as well as the main class fields. In general, functions/methods should also be documented. While it can be done as a mass-addition, it's boring work, so it also possible to do this as needed.
I somewhat randomly used a document style. Likely a choice should be made for something common.
- I briefly looked for class descriptions at https://realpython.com/documenting-python-code/ which hardly touches classes.
- https://peps.python.org/pep-0257/ exists but doesn't touch classes at all. It does have the "--" idea with an additional line above it.
- https://stackoverflow.com/questions/3051241/how-to-document-class-attributes-in-python (self.x variables are named class attributes, apparently).
- https://stackoverflow.com/questions/6060813/how-to-document-fields-and-properties-in-python states (1st answer) that the answer is in pep 257 and gives some examples. Not sure I like that, but it's your call.

davbeek commented 1 month ago

Thanks Albert for your improvements and added docstring. I once tried to add a docstring, but Copilot showed me three formats that I had to choose from, probably the same three that you found.

Problem was that I could not immediately see what the consequences and advantages/disadvantages were of the three formats, so I could not make a choice. Will need some time now, to make a proper choice.

Alberth289346 commented 1 month ago

Problem was that I could not immediately see what the consequences and advantages/disadvantages were of the three formats, so I could not make a choice. Will need some time now, to make a proper choice.

The main purpose is to describe intentions, goals, and relevant rules of classes and functions. Basically it's an independent, high-level description of what the code is supposed to be doing.

By having a high level description, you can read in 4 lines of text what 80 lines of code is doing at a much lower level of detail. That makes it much quicker to find the function you're looking for.
The description should be precise with respect to input/output. From the text, you should be able to decide what arguments must be given to the call. It should then also be clear what answers you can expect.

The text and code should match with each other. This is a second use of the documentation. Instead of just code, the documentation provides an independent second source of truth. That means you can compare them and possibly find discrepancies.
When they don't match, something is wrong, and usually the code should be fixed. The reason for that is that calls are generally created by reading the documentation rather than by checking the code.

As for the format:

mostly all formats contain the same information so that isn't very relevant.
layout is usually a little different.
a popular use of the documentation is reading the information by machine and then generate html or pdf or ... pages of it, eg https://docs.python.org/3/library/index.html I think it's mostly useful for published libraries or frameworks, and not much for applications.
processing the documentation by a machine is also done while editing the code. The editor then pops up the description when you put the cursor on a function name. This can be useful, depending on how much you know about the code. Usually, editors have specific documentation format rules. So this might be the main driving force to decide what documentation layout you want.

Python has defined a standard for it, and it is likely that common editors will at least support that format.

I never looked into that in detail though. Our approach was to simply open the source code at the right spot. However, at that time, editors that provided assistance like we have today, didn't exist at all :)

davbeek commented 1 month ago

I am still figuring out the GitHub UI for pull requests and which browser to use for compatibility. Never dealt with pull request so far in GitHub.

Alberth289346 commented 1 month ago

I am still figuring out the GitHub UI for pull requests and which browser to use for compatibility. Never dealt with pull request so far in GitHub.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/reviewing-proposed-changes-in-a-pull-request ?

I never do a review as described at point 10, and always use the more informal "add a comment" instead.

davbeek commented 1 month ago

Actually I am beginning to wonder if docstrings can do anything that normal comments can't do better. I know you often see docstrings in code, but compared to normal comments they just seem to add duplication. Parameter declarations with type hints are already there in functions, so why not add the comments there.

When I want to know how a method works, I usually just press F12, which opens a new tab on the method definition. The hover functionality which generates info is usually not very helpful. Also the docs generated by sphinx using docstrings that I have read so far were not very helpful.

Alberth289346 commented 1 month ago

I know you often see docstrings in code, but compared to normal comments they just seem to add duplication.

They have different purposes.

Docstrings are about the interface of the function. What does it do, what goes in, what comes out.

Comments are for explaining code details, in particular anything that is not immediately clear from the code itself. Eg the additional " " in the log call. The code shows it's done, but that doesn't explain why it must be there. A comment about multiple formats then clarifies that.

Similarly for the additional "" in the Windows 'open file' subprocess.run call. I didn't know it inserted an argument for the window title, so the comment helped me in understanding that the code is correct.

You may have noticed I also add comment "header lines" above blocks of code. That is not about code details but about navigation in the code instead. It describes in 1 line what the code block below it is doing.
The reason for having that is finding code again.

You have a function that achieves something, and you want to know in detail how it does a particular step in that. The "something" implies the function must have that step, but where in the 200 lines of code is it?

Without comment headers, you must read all code from the top of the function one line at a time, and decide for each line whether it could relate to "the step". That is a lot of work especially when you don't have knowledge about the variables and functions that it uses. You must do it carefully or you fail to find the step and must do it all again but even more carefully.

By adding "header comment" lines, you can read in one line of text what the entire block of code below it is doing. It's easier to decide whether the code block touches "the step" (1 line of high level text beats knowing everything about the variables and functions that the code block uses).
If the text makes clear the code block is not about the step I am looking for, I can move down to the next comment header, etc. In that way I can globally understand how the entire function works in a handful of text lines, and I can find the code of the step I am interested in, in a fraction of the time I'd need otherwise.

When I arrived at the code block I am interested in, I then start reading the code in full detail until I find what I was looking for.

Alberth289346 commented 1 month ago

When I want to know how a method works, I usually just press F12, ...

This doesn't scale and invites for abusing implementation details.

We can handle about 7 different things at the same time. A sufficiently complicated piece of code is often more than that, so we need all our brain capacity in order to avoid making errors in the code that we are writing.

Opening functions that you just need to use, and reading how they work with F12 eats capacity for solving the above coding problem. Depending on how complicated these functions are and how many you have, the less brain capacity you use for the new code. If you have more than about 7 such functions, you're essentially constantly task swapping, and must thus also read the code you just wrote to remember what to do next.

By reading how the F12 function works, you inevitably pick up how it behaves in edge cases. Because you know, it's tempting to use that knowledge in the new code as well. That however makes hard connections between the current implementation of an already written function and the code you're writing now.
As a result, if you ever change the implementation of the F12 function, other code may die because it relied on implementation details in the former function.

A good docstring should be about the interface only, and describe what it does rather than how.
If you use that instead for writing the new code, it's much less likely that implementation exploits sneak into the new code simply because the doc string doesn't tell they exist.
That also makes changing an existing function much less hazardous. The docstring defines the promises to the callers. The implementation of the function may do anything except break those published promises.
In other words, changing an existing function should never cause trouble elsewhere as long as you check that the promises in the doc string are still valid after your changes.

davbeek commented 1 month ago

First, we agree on your so called "comment header lines". The only question that remains is whether the comment header lines are docstrings or normal comments.

Second, the problem I have is with the input and output interface describing the parameters. I don't see why these would need to be duplicated in docstrings, when they are already there in the function definition. So the parameter declaration part of the function definition would probably be my preferred place of commenting the interface, if needed.

Third, defining good docstrings or comments takes a lot of time. Time that can also be spent on improving code quality. So in theory, you are completely right. In practice, though, I have only limited time to spend and need to be careful about where I spend my precious time. Especially when code is not stable and still changes a lot, spending much time on documentation that a few weeks later can be thrown in the bin because of code refactoring can be a big waste of time.

I would rather spend time on improving code quality or working on improving the other tools of the BOOST project instead of writing beautiful documentation on non-optimal code.

davbeek / gitinspectorgui

Use dict.get to retrieve values of the persons data base. #1