SuperDARN / rst

Radar Software Toolkit (RST)
https://superdarn.github.io/rst/
GNU General Public License v3.0
22 stars 17 forks source link

[Bug]: Compilation failure due to RSTLite remaining in PATH #155

Closed asreimer closed 6 years ago

asreimer commented 6 years ago

edit: I'm using the master branch. edit2: Same problem on both master and develop branches. edit3: See comment: https://github.com/SuperDARN/rst/issues/155#issuecomment-388658524 and onward.

Summary

I have found 2 bugs that affect compiling rst on systems that care about case sensitivity in header file names: 1) A very old bug in makeall.c that prevents the code from finding the latest version of a library for some directory structures. This resulted in the header for astalg.1.5 to be copied into rst/include and for astalg.1.5 to be compiled instead of astalg.2.0... Since Fedora 26 is case sensitive on header files names and astalg.1.5 header is AstAlg.h compared to 2.0 which has astalg.h, a compilation error was thrown. 2) A bug introduced by @sshepherd when they tried to remove .rst directories for passing version numbers to the build system. Something else, beyond the changes to get.version, seem to need to be done because every library in rst/lib has a x.x version number...

These bugs are sort of intertwined, so I've reported them together. I can make separate issues for each one, but at this point, I don't know the exact cause of both of them and need some help debugging.

You likely won't be able to reproduce the first bug on your own system if you don't start from a fresh clone of rst. You might not even be able to reproduce this outside of Fedora. You should be able to reproduce bug 2 though.

Symptoms

On Fedora 26, the make.code installation step seems to fail due to case sensitivity for header file names in the rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0/src/igrflib.c file. I say seems here for reasons that will be explained.

Here's the error message I received when the compilation fails:

================================================================================
Compiling Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0
================================================================================
make clean
rm -f *.o
make
make.version /home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0/src/..
make.hdr /home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0/src/../include /home/ashtonsethreimer/rst/include/analysis 
/home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0/src/../include/igrflib.h
'/home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0/src/../include/igrflib.h' -> '/home/ashtonsethreimer/rst/include/analysis/igrflib.h'
cc -fPIC -Wall -O3 -ansi -D_GNU_SOURCE -D_LINUX -I/home/ashtonsethreimer/rst/include/base -I/home/ashtonsethreimer/rst/include/analysis -c igrflib.c    
igrflib.c:8:10: fatal error: astalg.h: No such file or directory
 #include "astalg.h"
          ^~~~~~~~~~
compilation terminated.
make: *** [/home/ashtonsethreimer/rst/build/make/makelib.linux:19: igrflib.o] Error 1
Compilation Aborted.

Case sensitivity is dependent on system configuration (see: https://stackoverflow.com/questions/1951951/when-including-header-files-is-the-path-case-sensitive/37228442#37228442).

If you just stopped here, you would think that the issue is just that we need to rename the rst/include/analysis/AstAlg.h file to astalg.h or something like that. But of course the whole rst/include directories are generated by make.code (specifically, when it calls make.hdr), so this begs the question, "why did make.code not find the right astalg header?"

We must go deeper

There's actually no meaningful difference between the astalg.1.5 and astalg.2.0 header files except for some spaces... Here's the diff of the two:

1c1
< /* Astalg.h
---
> /* astalg.h
16,18c16,18
<      4201 Wilson Blvd,
<      Arlington, VA 22230
<      email: kbaker@nsf.gov
---
>      4201 Wilson Blvd,
>      Arlington, VA 22230
>      email: kbaker@nsf.gov
80,85c80,85
<            int *year,
<            int *month,
<            int *day,
<            int *hour,
<            int *minute,
<            int *second);
---
>        int *year,
>        int *month,
>        int *day,
>        int *hour,
>        int *minute,
>        int *second);

So this might explain why no one else has encountered the installation bug until now. The AstAlg.h is from version 1.5 of the astalg library and astalg.h is from version 2.0, but there is no meaningful different between the files so a case-insensitive compile won't throw an error.

Potential Cause

I think the problem here is a very old bug in the makeall.c code. I found that by creating an astalg_v2 directory and putting the astalg.2.0 directory into it, the make.code (which calls makeall) was able to find it and compile it just fine. Digging in to this, I think it's a long-standing bug in makeall.c. Specifically something to do with how makeall.c detects libraries and determines the latest version to compile. For example, the first few lines of makeall -vb -t hdr ~/rst/codebase/analysis/src.lib:

makeall
Package Source Directory:/home/ashtonsethreimer/rst/codebase/analysis/src.lib
Abort on failure:yes
Compile most recent version of code only:yes
================================================================================
Located Source Code:
1.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/aacgm/aacgm.1.15
2.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/aacgm_v2/aacgm.1.0
3.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/astalg/astalg.1.5
4.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/cdf/rcdf.1.5
5.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/geopack/geopack.1.4
6.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/idl/idlsave.1.2
7.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf/igrf.1.13
8.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/igrf_v2/igrf.1.0
9.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/mlt/mlt.1.4
10.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/mlt_v2/mlt.1.0
11.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/mpfit/mpfit.1.2

You can see that it incorrectly finds that astalg.1.5 is the more recent version of astalg, when 2.0 > 1.5...

Next you can try running makeall -a -vb -t hdr ~/rst/codebase/analysis/src.lib and you'll see something like:

3.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/astalg/astalg.1.5
4.Located Library:/home/ashtonsethreimer/rst/codebase/analysis/src.lib/astalg/astalg.1.5

which is funny. It's detecting astalg.1.5 and astalg.2.0 but returning 1.5 twice... Thanks makeall. Great job.

Proposed Solutions

From what I can see, there are 3 ways to solve the problem with AstAlg.h vs astalg.h A) Figure out what the underlying issue in the build system is where it cannot properly find both astalg.1.5 and astalg.2.0... I think this amounts to fixing makeall.

B) Leave the bug in makeall and instead just remove astalg1.5 entirely. I did a simple grep searching for AstAlg.h and didn't find any code expect astalg1.5 that requires it. This doesn't mean there isn't software that links to the libastalg1 library though. Is there? Can we just use astalg.2.0?

C) Leave the bug in makeall and instead do the same thing for astalg as was done for mlt, aacgm, igrf, etc, specifically: 1) make a rst/codebase/analysis/src.lib/astalg_v2 2) move rst/codebase/analysis/src.lib/astalg/astalg.2.0 into rst/codebase/analysis/src.lib/astalg_v2

Just to get rst working for me for now, I've implemented C) locally. I would vote for option A) as a proper fix though.

Another bug I found along the way...

After implementing solution C), as I said above, I encountered another problem.

@sshepherd made a change to the rst/build/scripts/get.version file in this commit. The intent of this commit was to remove the need for the .rst/version files that the make scripts all use to determine the version of the library that is being make. However, the changes made don't go far enough because the make file only makes a library with version 'x.x', unless I add the .rst/version file with contents version.2.0 in it into the rst/codebase/analysis/src.lib/astalg_v2/ directory.

If I don't add the .rst/version file, then I get the following compile error:

================================================================================
Compiling Binary:/home/ashtonsethreimer/rst/codebase/analysis/src.bin/aacgm/aacgmeval.1.2
================================================================================
make clean
rm -f *.o
rm -f version.h
rm -f errstr.h
rm -f hlpstr.h
rm -f aacgmeval
make
make.version /home/ashtonsethreimer/rst/codebase/analysis/src.bin/aacgm/aacgmeval.1.2
make.help
./doc/aacgmeval.doc.xml
cc -fPIC -Wall -O3 -ansi -D_GNU_SOURCE -D_LINUX -D_SVGLIB_ -I/home/ashtonsethreimer/rst/include/base -I/home/ashtonsethreimer/rst/include/general -I/home/ashtonsethreimer/rst/include/analysis -c aacgm.c    
aacgm.c: In function ‘main’:
aacgm.c:90:7: warning: variable ‘arg’ set but not used [-Wunused-but-set-variable]
   int arg;
       ^~~
mkdir -p /home/ashtonsethreimer/rst/bin
cc -L/home/ashtonsethreimer/rst/lib -o /home/ashtonsethreimer/rst/bin/aacgmeval aacgm.o -Wl,-Bstatic \
                           -laacgm.1 -lopt.1 -lrtime.1 -laacgm_v2.1 -ligrf_v2.1 -lastalg.1  -Wl,-Bdynamic \
                           -lm   
/home/ashtonsethreimer/rst/lib/libastalg.1.a(astalglib.o): In function `AstAlg_dday':
astalglib.c:(.text+0x0): multiple definition of `AstAlg_dday'
/home/ashtonsethreimer/rst/lib/libastalg.1.a(AstAlg_dday.o):AstAlg_dday.c:(.text+0x0): first defined here
/home/ashtonsethreimer/rst/lib/libastalg.1.a(astalglib.o): In function `AstAlg_jde':
astalglib.c:(.text+0x1b0): multiple definition of `AstAlg_jde'
/home/ashtonsethreimer/rst/lib/libastalg.1.a(AstAlg_jde.o):AstAlg_jde.c:(.text+0x0): first defined here
/home/ashtonsethreimer/rst/lib/libastalg.1.a(astalglib.o): In function `AstAlg_solar_declination':
astalglib.c:(.text+0xaf0): multiple definition of `AstAlg_solar_declination'
/home/ashtonsethreimer/rst/lib/libastalg.1.a(AstAlg_solar_declination.o):AstAlg_solar_declination.c:(.text+0x0): first defined here
/home/ashtonsethreimer/rst/lib/libastalg.1.a(astalglib.o): In function `AstAlg_solar_right_ascension':
astalglib.c:(.text+0xb80): multiple definition of `AstAlg_solar_right_ascension'
/home/ashtonsethreimer/rst/lib/libastalg.1.a(AstAlg_solar_right_ascension.o):AstAlg_solar_right_ascension.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
make: *** [/home/ashtonsethreimer/rst/build/make/makebin.linux:43: aacgmeval] Error 1
egthomas commented 6 years ago

@asreimer - Thanks for the detailed description of the issues you've found. Do you think you could provide a list of RST installation dependencies for the version of Fedora you're using (similar to the readme or .travis.yml files) to help us with the debugging? That way we could also add them to the readme to help future users on Fedora systems.

Just to be clear on your second bug, in the rst/lib directory you're seeing files like libaacgm.x.x.a and libaacgm.x.x.so? Or were you simply using x.x as generic placeholders? The screenshot below is a partial view of the lib directory on an Ubuntu installation of RST:

screenshot 2018-05-13 at 4 07 31 pm

Are the version.h files storing the correct MAJOR_VERSION and MINOR_VERSION values for each library/binary? Or do those also say X and X?

egthomas commented 6 years ago

Of course now that I look at my own screenshot, something screwy happened with the libfitacf... files.

asreimer commented 6 years ago

@egthomas is your screenshot from after running with a fresh clone of the git repo? rst/lib and rst/include are not cleaned up by make.code before building. Additionally, existing version.h are not cleaned up either. Can you try deleting/renaming a version.h for one of the libs and see if you still get the correct libname.x.x.so that you expect?

Here's a screenshot of what I get: image As you can see, I was not using x.x as placeholders. I literally get nothing but x.x for version numbers... This is because the version.h files are all storing X and X for the major and minor numbers. The version.h files are not being generated correctly.

I see you have both astalg1 and astalg2 in the list of files you posted. I'm curious to see if you delete rst/lib and then run make.code if they both show up or not. I suspect they will not. This will confirm whether or not this issue is Fedora specific or not. I might beat you to is by spinning up an Ubuntu VM...

I don't have a fresh install of Fedora running right now so I don't have a full list of dependencies. I was working on my machine where I also have RSTLite installed, so most of the dependencies were already present. All I have is: 1) ncurses-devel 2) netcdf-devel 3) and of course the latest copy of the cdf library installed in /usr/local/cdf

I can work on getting a full list, but that's very low priority for me right now with the SuperDARN workshop coming up.

asreimer commented 6 years ago

Ah, I wasn't clear earlier: I'm using the master branch.

edit: develop yields the same results on Fedora.

asreimer commented 6 years ago

I did some more reading this morning. Case sensitivity of header files is pretty much a Linux wide phenomenon due to the fact that most Linux OSes use case sensitive file systems... So this should happen on Ubuntu too.

But it doesn't. I just tried installing rst on my Ubuntu 14.04 machine and it correctly figures out the version numbers for libraries. It even correctly determines that astalg.2.0 is the most recent astalg and only tries to compile that one.

So there's something different about Fedora 26 and Ubuntu 14.04 (@egthomas was it 16.04 you tried?). I'm pretty sure that something different is central to the makeall.c code.

asreimer commented 6 years ago

Maybe this is related to a change in gcc?

This works with gcc 4.8.4-2ubuntu1~14.04.3, but not with gcc 7.2.1 20170915 (Red Hat 7.2.1-2).

What gcc are you using @egthomas

egthomas commented 6 years ago

@asreimer I'm using Ubuntu 16.04 and gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609.

On a fresh install of the master branch, I changed the name of the rprmidl.1.6 library to rprmidl.y.z and found the following (expected) library files in rst/lib: librprmidl.y.z.a, librprmidl.y.z.so, librprmidl.1.a, and librprmidl.1.so (the latter two of which are just symbolic links to the first two files).

Can you try deleting lines 65-82 of get.version? That portion of the code shouldn't be used anymore, but at the moment that's the only place I'm finding the x.x version numbers being set as a default.

asreimer commented 6 years ago

Ok, this is hilarious:

image

So it is failing because the get.version from my RSTLite installation is being used. I guess I didn't clean up all of the environment variables that I thought I did...

egthomas commented 6 years ago

You're also sure you don't have any relics from RSTLite in your environment variables / settings / paths, right?

edit: just saw your comment!

asreimer commented 6 years ago

It's interesting that we both converged on the exact same realization at the almost the same time.

egthomas commented 6 years ago

Indeed! So does that clear up your compilation issues? Or are some problems still remaining?

About the installation dependencies, that's nothing urgent, just something that would be useful to include in the readme at some point.

asreimer commented 6 years ago

So this is interesting. Despite removing all references to RSTLite in my .bashrc and .bash_profile (contrary to opinion, I'm not a complete idiot, I did this long before trying to install rst) and yet RSTLite is still on my PATH...

It is safe to say that this is not a bug with rst nor does it have anything to do with Fedora vs. Ubuntu.

Thanks @egthomas for responding. Your comments helped me figure out what was going on.

When I get some time, I'll get the Fedora dependencies sorted and make a pull request for the README.md file.