TEOS-10 / GSW-C

C implementation of the Thermodynamic Equation Of Seawater - 2010 (TEOS-10)
Other
18 stars 17 forks source link

Modify the library structure and style? #21

Closed efiring closed 6 months ago

efiring commented 7 years ago

Working with the C code is more of a pain than it needs to be for at least 2 stylistic reasons:

1) The indentation is a mix of tabs and spaces, which causes Python people like me to shudder.

2) Having both the single-file, which is compiled into the library, and the separate files for each function, which are concatenated to generate the single file, makes code modification and additions unwieldy; compilation errors and warnings refer to the single file, but all editing must be done on the individual files.

Is there support for fixing either or both of these aspects? I would be inclined to keep the separate files and ditch the concatenation.

dankelley commented 7 years ago
  1. I agree that combining tabs and spaces is problematic. I'm guessing that the tabs are there because someone used indent on the code.

  2. Since I was confused on the single- vs multiple-file system, I definitely think it ought to be one or the other, but not both. Probably the most conventional approach is to have separate files, so I'd vote for that scheme.

dankelley commented 7 years ago

One more thing -- I am guessing that the code has tabs in some places and spaces in others because parts of the code were passed through indent. I've not used indent routinely in many years, but possibly it might be useful in getting all the code to be stylistically similar, if that's desired. (I am not suggesting it's a big factor however -- I'd prefer that git blame would tell me that a line of code is 5 years old, instead of 5 weeks because it got re-indented recently with new settings.)

efiring commented 7 years ago

Actually, the most likely cause of tabs + spaces is use of an editor with 8-space hard tabs, and wanting to have some 4-space indentations. I think it is mostly tabs, but I haven't looked carefully. I detest hard tabs. I agree that whitespace changes are a royal pain when it comes to using git blame, so mainly I am looking for a policy for blocks of code that are substantially changed. Maybe there won't be many for a long time, if ever.

efiring commented 7 years ago

Regarding point 2, I think that ditching the single file might be more work than one might expect, but I haven't looked at it carefully. One motivation for doing so anyway is that there are just a few parts of the C code that cannot be compiled with MSVC. Increasingly, Python extension code for Windows must be compiled with MSVC, unfortunately, so that it will work with the rest of the Python system, which usually is compiled with MSVC on Windows. This meant that I had to convert those chunks to C++ for GSW-Python, which in turn means that the entire toolbox is in two versions, one for Windows and one for everything else. Without the concatenated toolbox file I could maintain the few C++ files in GSW-C itself, and the library build machinery could use the C or the C++ versions as appropriate. It would make that codebase a little messier, but there might be other users who would benefit from being able to build the library with MSVC. (I don't use Windows myself, though, so working on the Windows-specific aspects is a pain.)

dankelley commented 7 years ago

Interesting, re MSVC. Well, for GSW-R, it's a bit cleaner to have just a single .c file, I suppose, so I'm pretty neutral on that.

ocefpaf commented 7 years ago

FYI we can se the msys2 toolchain on Windows and build with a single .c file via mingw in the same fashion as R does, but that is frowned upon for reason unclear to me. I'll investigate it and maybe we can remove that MSVC as a constraint here.

efiring commented 7 years ago

I'm pretty sure that mingw still does not work satisfactorily for all Windows and Python versions. There was an effort to fix that, but I don't think it ever went to completion.

ocefpaf commented 7 years ago

I think the issue is only for Fortran extension. I'm looking into it.

DocOtak commented 6 years ago

I decided to take a crack at this since I'm working on libraries/code that will make use of the python version and would like to see missing (matlab) functions added to the upstream library. I'm also fresh off a long cruise... So here is a branch: https://github.com/DocOtak/GSW-C/tree/restructure

Important things to know:

I tried to find a "minimal example" that could demonstrate how we might restructure the project to make it easier to work with. I chose gsw_z_from_p and its dependency gsw_enthalpy_sso_0. I also added a function not in the existing c toolbox gsw_o2sol_sp_pt.

The restructure I implemented was the following:

Other notes:

I intend this to be a starting point for discussion but I'm willing to restructure the entire library, however I would need guidance so I don't mess up folks downstream. Please let me know what you think.

dankelley commented 6 years ago

Hi @DocOtak and the others. This is the sort of thing where an internet "meet up" would be great, I think.

The C code is used within the GSW-R project (which builds the R package named "gsw", in turn used by the more general R package named "oce"). Because of this, I thought it would make sense to comment here. First, my perspective: I've been programming in C since the 1980s, and have picked up some opinions along the way. Everything I offer below is coming from a perspective of having been bitten by code!! And -- here is where video meetup would be useful -- I really want to emphasize that all of what I'm about to say is meant respectfully.

Moving from the present context to much more general feelings, I am against refactoring code that works. This is why, although I share with @efiring a dislike for some of the indentation in the C code, I also share his view (https://github.com/TEOS-10/GSW-C/issues/21#issuecomment-323816586) that changes to indentation should be reserved for new code fragments. (My concern is mainly that we lose git-blame tracking of changes.)

Early in this thread, it was mentioned that the matlab source is divided up on a per-function basis. I suppose that could be a motivation for making the C code be divided up similarly, if we have a documented and open-source toolchain that converts between the two languages. (I am hazy on the history, and think that the chain was actually matlab -> fortran -> C, and if that is right, then I think the C ought to mimic the fortran in terms of file setup.)

Apart from the goal of connecting the C clearly to the matlab, I don't see any real advantage in breaking the C up in a per-function basis. A good text editor makes it very easy to find functions within a single file.

As for breaking the .h file up into smaller units, that is not consistent with typical C practice, and will be quite impractical in the common use case of a C program that works with a built-up library (and that benefits from a single .h file). For example, the GSW-R code has a wrapper to interface from the built-up C library to R functions, and it has a single

#include "gswteos-10.h"

line to set up the compiler to "know about" the arg/return types for each of about 200 GSW functions. If we get a per-function breakdown of .h files, the line I quote above would have to be replaced by 200 or so #include directives. Obviously, I don't mind doing that in the code (it's a simple ls -c1 ... | sed sort of operation to create the lines to be inserted into the C code) but it would look quite odd, in the C way of expressing methods.

Given the above, I suggest that efforts not be directed to refactoring the C code. I think it makes more sense to work with the larger team to get a clear toolchain set up to translate the matlab code into Fortran and C. After that, i.e. once there are built-up libraries, inclusion of algorithms into python, C, and the rest will be easy, likely only requiring minor adjustments over time, as the underlying matlab algorithms (and, importantly, the check values) change.

Let me end by emphasizing that I mean all of the above respectfully. In person, this whole mess of typing would have been a few words at the pub, followed by plans for work along other lines, and by whatever else goes on at pubs! And all with laughs and fellowship. I hope my remarks are taken in that context.

Dan.

efiring commented 6 years ago

Cross-ref: https://github.com/TEOS-10/GSW-Python/pull/28.

efiring commented 6 years ago

23 eliminates the toolbox directory.

DocOtak commented 6 years ago

Hi @dankelley Thank you for your feedback and rest assured that it was received as if over a beverage of choice at a pub.

It seems the desire now is to have a single file given #23, I'll wait for action on this. If there are some agree upon code/library structure for this library, it might be useful to add a CONTRIBUTING file somewhere (root?). For example, it appears that ported Matlab functions should appear in alphabetical order by function name, and the code style appears to follow the GNU style guide.

efiring commented 6 years ago

With #23 merged, the biggest change along these lines has been made. Some smaller changes remain to be considered.

ocefpaf commented 6 months ago

This issue is quite old and most of the points here were addressed. Let's close and open new one for the smaller changes that remain to be considered.