Open surak opened 2 years ago
@surak I fully agree that we should do a lot better here. We should make an honest attempt to fish out the actual error message from the log, rather than only showing the first 300 characters, since that's indeed usually useless.
I didn't actually realize that the location of the log file isn't printed at the end at all, that's definitely an easy change to make.
For me it's not a problem because I have the use of eb --last-log
deeply engraved into my muscle memory (see also https://easybuilders.github.io/easybuild-tutorial/2021-isc21/troubleshooting/#last-log).
I have an ebl
alias for this to save more keystrokes:
alias ebl='vim $(eb --last-log | grep -v ">>")'
The grep -v
part is only there because I always have $EB_VERBOSE
set.
The Results ...
line should actually include the path to the application build log, see https://github.com/easybuilders/easybuild-framework/blob/0c1a642848681a71709c2dffefdc305a86cb789b/easybuild/framework/easyblock.py#L4186-L4187
For some reason it's not for you though? So the glob
that defines the logs
variable comes back with an empty list somehow?
Can you check the temporary directory created by EasyBuild, are the logs actually still there?
Also, can you share the output of eb --show-config
?
for me the log files shown at the beginning and at the end are not the same, which is a bit confusing. at the beginning:
== Temporary log file in case of crash /dev/shm/vsc10009/eb-e_w78y/easybuild-6J_zLt.log
at the end (after a crash):
== Results of the build can be found in the log file(s) /dev/shm/vsc10009/eb-e_w78y/easybuild-GROMACS-2021.4-20211222.085224.wZvkK.log
@smoors That's because there are two types of log files: a "session" log file, and a separate log file per installation being performed (which gets copied to the installation directory on completion); see also https://docs.easybuild.io/en/latest/Logfiles.html
The
Results ...
line should actually include the path to the application build log, seeFor some reason it's not for you though? So the
glob
that defines thelogs
variable comes back with an empty list somehow?Can you check the temporary directory created by EasyBuild, are the logs actually still there?
Also, can you share the output of
eb --show-config
?
@surak Can you clarify?
I think that the issue description is clear enough, no? There's no pointer at the actual error, neither where to find it.
What I have been doing on the supercomputers for years is to run everything with -l
- as most errors tend to show close to the end, with a bit of a scroll at least.
On other environments, -l
seems to print much more than before, or so was my impression. When updating the timer, does it do some ascii commands on the timer line or does it print everything again?
In any case, if something fails, the last thing I want to see on the screen is either:
Currently, I have neither.
== installing extension rgl 0.107.14 (386/1038)...
>> defining build environment for gcccoremkl/11.2.0-2021.4.0 toolchain
>> running command:
[started at: 2022-01-24 22:18:12]
[working dir: /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl]
[output logged in /tmp/eb-uw4uv1em/easybuild-run_cmd-ytni5pln.log]
tar xzf /p/project/ccstao/cstao05/easybuild/juwelsbooster/sources/r/R/extensions/rgl_0.107.14.tar.gz
>> command completed: exit 0, ran in < 1s
>> running command:
[started at: 2022-01-24 22:18:12]
[working dir: /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl]
[output logged in /tmp/eb-uw4uv1em/easybuild-run_cmd-eh_rwk__.log]
R CMD INSTALL /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl
--library=/p/project/ccstao/cstao05/easybuild/juwelsbooster/software/R/4.1.2-gcccoremkl-11.2.0-2021.4.0/lib64/R/library --no-clean-on-error
>> command completed: exit 1, ran in 00h01m25s
== ... (took 1 min 26 secs)
== ... (took 3 hours 8 mins 51 secs)
== FAILED: Installation ended unsuccessfully (build directory: /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0): build failed (first 300
chars): cmd " R CMD INSTALL /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl
--library=/p/project/ccstao/cstao05/easybuild/juwelsbooster/software/R/4.1.2-gcccoremkl-11.2.0-2021.4.0/lib64/R/library --no-clean-on-error " exited with exit
code 1 and output:
* installing *source* pack (took 3 hours 36 mins 9 secs)
== Results of the build can be found in the log file(s) /tmp/eb-uw4uv1em/easybuild-R-4.1.2-20220124.184330.qfPqj.log
Installing 'rgl' extension (386/1038) ━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━ 3:08:15
ERROR: Build of /p/project/cjsc/a.strube/easybuild-repository/Golden_Repo/r/R/R-4.1.2-gcccoremkl-11.2.0-2021.4.0.eb failed (err: 'build failed (first 300 chars): cmd " R CMD INSTALL /dev/shm/strube1/juwelsbooster/R/4.1.2/gcccoremkl-11.2.0-2021.4.0/rgl/rgl --library=/p/project/ccstao/cstao05/easybuild/juwelsbooster/software/R/4.1.2-gcccoremkl-11.2.0-2021.4.0/lib64/R/library --no-clean-on-error " exited with exit code 1 and output:\n* installing *source* pack')
It's like... what happened? I don't know. And the error log is not the last thing I see: Rather, it's completely useless info. I have to fish for the log file on the letter soup here.
The worst part of Easybuild, when it fails, is tracking down where things went wrong.
While rebuilding a LOONG toolchain, one ends up with something like this:
No idea. The output itself is useless.
No idea - it was written at the beginning, but it's not written here at the end. The buffer scrolled so much that there's no location of the logfile here anymore. Time to fish
/tmp
again... Plus, the file at the beginning gets deleted, although the directory is kept.Suggestions:
== Results of the build can be found in the log file(s)
.== Results of the build can be found in the log file at XXXXX