Closed efiring closed 6 months ago
I agree that combining tabs and spaces is problematic. I'm guessing that the tabs are there because someone used indent
on the code.
Since I was confused on the single- vs multiple-file system, I definitely think it ought to be one or the other, but not both. Probably the most conventional approach is to have separate files, so I'd vote for that scheme.
One more thing -- I am guessing that the code has tabs in some places and spaces in others because parts of the code were passed through indent
. I've not used indent
routinely in many years, but possibly it might be useful in getting all the code to be stylistically similar, if that's desired. (I am not suggesting it's a big factor however -- I'd prefer that git blame
would tell me that a line of code is 5 years old, instead of 5 weeks because it got re-indented recently with new settings.)
Actually, the most likely cause of tabs + spaces is use of an editor with 8-space hard tabs, and wanting to have some 4-space indentations. I think it is mostly tabs, but I haven't looked carefully. I detest hard tabs. I agree that whitespace changes are a royal pain when it comes to using git blame, so mainly I am looking for a policy for blocks of code that are substantially changed. Maybe there won't be many for a long time, if ever.
Regarding point 2, I think that ditching the single file might be more work than one might expect, but I haven't looked at it carefully. One motivation for doing so anyway is that there are just a few parts of the C code that cannot be compiled with MSVC. Increasingly, Python extension code for Windows must be compiled with MSVC, unfortunately, so that it will work with the rest of the Python system, which usually is compiled with MSVC on Windows. This meant that I had to convert those chunks to C++ for GSW-Python, which in turn means that the entire toolbox is in two versions, one for Windows and one for everything else. Without the concatenated toolbox file I could maintain the few C++ files in GSW-C itself, and the library build machinery could use the C or the C++ versions as appropriate. It would make that codebase a little messier, but there might be other users who would benefit from being able to build the library with MSVC. (I don't use Windows myself, though, so working on the Windows-specific aspects is a pain.)
Interesting, re MSVC. Well, for GSW-R, it's a bit cleaner to have just a single .c
file, I suppose, so I'm pretty neutral on that.
FYI we can se the msys2
toolchain
on Windows and build with a single .c
file via mingw
in the same fashion as R does, but that is frowned upon for reason unclear to me. I'll investigate it and maybe we can remove that MSVC as a constraint here.
I'm pretty sure that mingw still does not work satisfactorily for all Windows and Python versions. There was an effort to fix that, but I don't think it ever went to completion.
I think the issue is only for Fortran extension. I'm looking into it.
I decided to take a crack at this since I'm working on libraries/code that will make use of the python version and would like to see missing (matlab) functions added to the upstream library. I'm also fresh off a long cruise... So here is a branch: https://github.com/DocOtak/GSW-C/tree/restructure
Important things to know:
I tried to find a "minimal example" that could demonstrate how we might restructure the project to make it easier to work with. I chose gsw_z_from_p
and its dependency gsw_enthalpy_sso_0
. I also added a function not in the existing c toolbox gsw_o2sol_sp_pt
.
The restructure I implemented was the following:
/src
for the *.c files, basically the contents of /toolbox
/include
for the *.h files, I decided that each .c file should have its own .h and the gswteos-10.h
#include
all the others for the "external API"./data
for the lookup tables, right now it only has the netCDF file in there but probably could contain the other "large" derivative data files./legacy
, this isn't part of the proposed restructure.Other notes:
/lib
subdir for the library build output, but I haven't implemented this yet.gsw_z_from_p.c
file to #include
the dependent function gsw_enthalpy_sso_0
. I personally like this as it gives clarity as to what methods depend on others.I intend this to be a starting point for discussion but I'm willing to restructure the entire library, however I would need guidance so I don't mess up folks downstream. Please let me know what you think.
Hi @DocOtak and the others. This is the sort of thing where an internet "meet up" would be great, I think.
The C code is used within the GSW-R project (which builds the R package named "gsw", in turn used by the more general R package named "oce"). Because of this, I thought it would make sense to comment here. First, my perspective: I've been programming in C since the 1980s, and have picked up some opinions along the way. Everything I offer below is coming from a perspective of having been bitten by code!! And -- here is where video meetup would be useful -- I really want to emphasize that all of what I'm about to say is meant respectfully.
Moving from the present context to much more general feelings, I am against refactoring code that works. This is why, although I share with @efiring a dislike for some of the indentation in the C code, I also share his view (https://github.com/TEOS-10/GSW-C/issues/21#issuecomment-323816586) that changes to indentation should be reserved for new code fragments. (My concern is mainly that we lose git-blame tracking of changes.)
Early in this thread, it was mentioned that the matlab source is divided up on a per-function basis. I suppose that could be a motivation for making the C code be divided up similarly, if we have a documented and open-source toolchain that converts between the two languages. (I am hazy on the history, and think that the chain was actually matlab -> fortran -> C, and if that is right, then I think the C ought to mimic the fortran in terms of file setup.)
Apart from the goal of connecting the C clearly to the matlab, I don't see any real advantage in breaking the C up in a per-function basis. A good text editor makes it very easy to find functions within a single file.
As for breaking the .h file up into smaller units, that is not consistent with typical C practice, and will be quite impractical in the common use case of a C program that works with a built-up library (and that benefits from a single .h file). For example, the GSW-R code has a wrapper to interface from the built-up C library to R functions, and it has a single
#include "gswteos-10.h"
line to set up the compiler to "know about" the arg/return types for each of about 200 GSW functions. If we get a per-function breakdown of .h files, the line I quote above would have to be replaced by 200 or so #include
directives. Obviously, I don't mind doing that in the code (it's a simple ls -c1 ... | sed
sort of operation to create the lines to be inserted into the C code) but it would look quite odd, in the C way of expressing methods.
Given the above, I suggest that efforts not be directed to refactoring the C code. I think it makes more sense to work with the larger team to get a clear toolchain set up to translate the matlab code into Fortran and C. After that, i.e. once there are built-up libraries, inclusion of algorithms into python, C, and the rest will be easy, likely only requiring minor adjustments over time, as the underlying matlab algorithms (and, importantly, the check values) change.
Let me end by emphasizing that I mean all of the above respectfully. In person, this whole mess of typing would have been a few words at the pub, followed by plans for work along other lines, and by whatever else goes on at pubs! And all with laughs and fellowship. I hope my remarks are taken in that context.
Dan.
Cross-ref: https://github.com/TEOS-10/GSW-Python/pull/28.
Hi @dankelley Thank you for your feedback and rest assured that it was received as if over a beverage of choice at a pub.
It seems the desire now is to have a single file given #23, I'll wait for action on this. If there are some agree upon code/library structure for this library, it might be useful to add a CONTRIBUTING file somewhere (root?). For example, it appears that ported Matlab functions should appear in alphabetical order by function name, and the code style appears to follow the GNU style guide.
With #23 merged, the biggest change along these lines has been made. Some smaller changes remain to be considered.
This issue is quite old and most of the points here were addressed. Let's close and open new one for the smaller changes that remain to be considered.
Working with the C code is more of a pain than it needs to be for at least 2 stylistic reasons:
1) The indentation is a mix of tabs and spaces, which causes Python people like me to shudder.
2) Having both the single-file, which is compiled into the library, and the separate files for each function, which are concatenated to generate the single file, makes code modification and additions unwieldy; compilation errors and warnings refer to the single file, but all editing must be done on the individual files.
Is there support for fixing either or both of these aspects? I would be inclined to keep the separate files and ditch the concatenation.