Open wjnicol opened 2 years ago
Hi,
I do not have an exact solution to the core dumped problem, but could you make your versions match those listed on the TensorFlow website? https://www.tensorflow.org/install/source#gpu
I recommend trying a Python virtual environment, such as Anaconda.
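As a rough illustration of what "making the versions match" means, here is a minimal sketch that compares an installed TensorFlow/CUDA/cuDNN trio against a few rows of the tested build configurations table on the page linked above. The rows were transcribed by hand and should be checked against that page; `TESTED_GPU_BUILDS`, `major_minor` and `check_versions` are hypothetical names for this example, not part of TensorFlow or IsoNet.

```python
# Rows transcribed by hand from the "Tested build configurations" GPU table
# (an assumption to verify against the page; only three releases are listed).
TESTED_GPU_BUILDS = {
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
    "2.6": {"cuda": "11.2", "cudnn": "8.1"},
}


def major_minor(version):
    """Reduce a version such as '8.2.4' to its 'major.minor' prefix."""
    return ".".join(version.split(".")[:2])


def check_versions(tensorflow, cuda, cudnn):
    """Return a list of mismatch messages; an empty list means the trio matches."""
    row = TESTED_GPU_BUILDS.get(major_minor(tensorflow))
    if row is None:
        return [f"TensorFlow {tensorflow} is not in this small table"]
    problems = []
    for name, installed in (("cuda", cuda), ("cudnn", cudnn)):
        if major_minor(installed) != row[name]:
            problems.append(
                f"{name} {installed} installed, but TensorFlow {tensorflow} "
                f"was tested with {row[name]}"
            )
    return problems


if __name__ == "__main__":
    # Versions taken from the original report in this thread.
    for message in check_versions("2.4.0", "11.4", "8.2.4"):
        print(message)
```

With the versions from the original report (TensorFlow 2.4.0, CUDA 11.4, cuDNN 8.2.4), this flags both CUDA and cuDNN as differing from the tested combination.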
Hello,
What exactly do you mean by creating a python virtual environment? Similar to how EMAN2 is installed?
I will look into the versions, but I cannot find a combination that fits my specs.
So I installed the most recent tensorflow instead, 2.6.0, and I have made progress in the sense that I now get a bunch of error messages:
11-05 14:34:20, INFO
11-05 14:34:21, ERROR Traceback (most recent call last):
  File "/home/wjnicol/Repo/IsoNet/bin/refine.py", line 25, in run
    run_whole(args)
  File "/home/wjnicol/Repo/IsoNet/bin/refine.py", line 106, in run_whole
    from IsoNet.training.predict import predict
  File "/home/wjnicol/Repo/IsoNet/training/predict.py", line 4, in
I think I did not install tensorflow properly. I followed the instructions you provided (pip install tensorflow-gpu==2.6.0), but when I read how to install tensorflow on the page you linked for checking compatibility, it involves many more steps. Should I do a proper installation of tensorflow, or is the command you provided enough?
Thanks,
Hi,
I am sorry that you have to deal with these problems. We did encounter a lot of problems when the versions did not match those shown on the website.
You can either download the packages from https://developer.nvidia.com/cuda-toolkit and https://developer.nvidia.com/cudnn and install them,
or download and use Anaconda: https://www.anaconda.com/
Here are commands for my recent installation:
    conda create --name tf2.5
    conda activate tf2.5
    conda install python=3.6
    conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
    pip install tensorflow==2.5
    pip install fire mrcfile tqdm scipy scikit-image
    export HDF5_USE_FILE_LOCKING=FALSE
    export PATH=/home/lytao/software/IsoNet/bin:$PATH
    export PYTHONPATH=/home/lytao/software:$PYTHONPATH
Hope that would help.
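After an installation like the one above, a quick way to see whether TensorFlow can find the GPU is to list the physical devices from inside the activated environment. This is only a sketch: `visible_gpus` is a hypothetical helper name, and it returns `None` when TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in the active environment.
        return None
    # list_physical_devices is available in TensorFlow 2.1 and later.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not importable in this environment")
    elif not gpus:
        print("TensorFlow imported, but no GPU is visible")
    else:
        print("Visible GPUs:", ", ".join(gpus))
```

An empty list usually points to a CUDA/cuDNN mismatch or a driver problem rather than to IsoNet itself.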
OK, I am trying this right now. After this, should I launch isonet.py gui from the tf2.5 environment?
OK, so this works (I didn't do the last two exports because I had already done them). I did, however, need to run pip install PyQt5 after your commands. From there, isonet.py gui works fine and refining works! However, it seems to use all 16 cores at 100%, and then my computer suddenly crashes and reboots. By crashing I mean a sudden black screen followed by a reboot. It's really weird. I tried it twice.
Additional information: I am trying this on 3 bin4 tomograms, ~1k each...
Thank you for reporting this. There is a parameter that specifies how many CPUs to use in the preprocessing step.
I suggest you start with the tutorial dataset to observe the behavior of the program.
Even when I use 8 threads, with either the sample data or my data, it does the same thing: the computer turns off. My CPU has 8 double-threaded cores. Am I asking for too much even when I request 8 CPUs? I will try with one. Do you know of a log file in Linux that reports crashes and hardware issues? I'm wondering if your software is just too demanding for my computer.
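On the question of how many CPUs to request: Python's `os.cpu_count()` reports logical CPUs, so a machine with 8 double-threaded cores shows 16. Below is a small sketch of one conservative way to pick a worker count from that number. `suggest_workers` and its defaults are assumptions for illustration, not an IsoNet setting, and a high CPU load would not by itself explain a hard reboot.

```python
import os


def suggest_workers(logical_cpus, threads_per_core=2, reserve=1):
    """Pick a worker count: one per physical core, minus a few kept free for the system."""
    physical_cores = max(1, logical_cpus // threads_per_core)
    return max(1, physical_cores - reserve)


if __name__ == "__main__":
    logical = os.cpu_count() or 1  # cpu_count() can return None
    print(f"{logical} logical CPUs, suggested workers: {suggest_workers(logical)}")
```

For 16 logical CPUs with two threads per core, this suggests 7 workers, which leaves one physical core for the desktop and other processes.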
I think it's making my system crash:
    wjnicol@caliban:~$ last -x | head | tac
    wjnicol  :1           :1               Fri Nov 5 15:55 - crash (00:07)
    reboot   system boot  5.11.0-37-generi Fri Nov 5 16:02 still running
    runlevel (to lvl 5)   5.11.0-37-generi Fri Nov 5 16:03 - 16:25 (00:21)
    wjnicol  :1           :1               Fri Nov 5 16:03 - crash (00:21)
    reboot   system boot  5.11.0-37-generi Fri Nov 5 16:24 still running
    runlevel (to lvl 5)   5.11.0-37-generi Fri Nov 5 16:25 - 16:46 (00:21)
    wjnicol  :1           :1               Fri Nov 5 16:25 - crash (00:20)
    reboot   system boot  5.11.0-37-generi Fri Nov 5 16:46 still running
    runlevel (to lvl 5)   5.11.0-37-generi Fri Nov 5 16:46 still running
    wjnicol  :1           :1               Fri Nov 5 16:46 still logged in
Sorry for bombarding you with messages, but I will be away from my workstation for 2 weeks and am trying to give you as much info as possible.
From this page, https://unix.stackexchange.com/questions/9819/how-to-find-out-from-the-logs-what-caused-system-shutdown , I found a way to get logs on why my computer shuts down:
    wjnicol@caliban:~$ grep -iv ': starting\|kernel: .*: Power Button\|watching system buttons\|Stopped Cleaning Up\|Started Crash recovery kernel' \
        /var/log/messages /var/log/syslog /var/log/apcupsd* \
        | grep -iw 'recover[a-z]*\|power[a-z]*\|shut[a-z ]*down\|rsyslogd\|ups'
    grep: /var/log/messages: No such file or directory
    /var/log/syslog:Nov 5 15:54:51 caliban apparmor.systemd[1012]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
    /var/log/syslog:Nov 5 15:54:51 caliban systemd[1]: Finished Update UTMP about System Boot/Shutdown.
    /var/log/syslog:Nov 5 15:54:51 caliban systemd[1]: Finished Restore /etc/resolv.conf if the system crashed before the ppp link was shut down.
    /var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd. [v8.2001.0]
    /var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: rsyslogd's groupid changed to 110
    /var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: rsyslogd's userid changed to 104
    /var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1063" x-info="https://www.rsyslog.com"] start
    /var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 0.585685] pci 0000:05:00.1: D0 power state depends on 0000:05:00.0
    /var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 8.840032] EXT4-fs (nvme0n1): recovery complete
    /var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 10.038314] EXT4-fs (sdc): recovery complete
    /var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 11.837374] EXT4-fs (sdb1): recovery complete
    /var/log/syslog:Nov 5 15:54:51 caliban dbus-daemon[1043]: dbus[1043]: Unknown group "power" in message bus configuration file
    /var/log/syslog:Nov 5 15:54:51 caliban thermald[1075]: Need Linux PowerCap sysfs
    /var/log/syslog:Nov 5 15:54:51 caliban NetworkManager[1044]: [1636152891.6834] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-dns-resolved.conf, 20-connectivity-ubuntu.conf, no-mac-addr-change.conf) (run: 10-globally-managed-devices.conf) (etc: default-wifi-powersave-on.conf)
    /var/log/syslog:Nov 5 15:54:51 caliban systemd[1]: Started Unattended Upgrades Shutdown.
    /var/log/syslog:Nov 5 15:54:55 caliban systemd[1]: Started Daemon for power management.
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) Using input driver 'libinput' for 'Power Button'
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: always reports core events
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device removed
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) config/udev: Adding input device Power Button (/dev/input/event0)
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) Using input driver 'libinput' for 'Power Button'
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: always reports core events
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device removed
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 7)
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:55:07 caliban kernel: [ 27.621060] systemd-journald[411]: File /var/log/journal/6af7e9060f66425b8aafcb55c60d336b/user-2011.journal corrupted or uncleanly shut down, renaming and replacing.
    /var/log/syslog:Nov 5 15:55:07 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device removed
    /var/log/syslog:Nov 5 15:55:07 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device removed
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) Using input driver 'libinput' for 'Power Button'
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: always reports core events
    grep: /var/log/apcupsd*: No such file or directory
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: device removed
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) config/udev: Adding input device Power Button (/dev/input/event0)
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) Using input driver 'libinput' for 'Power Button'
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: always reports core events
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: device removed
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 7)
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: is tagged by udev as: Keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: device is a keyboard
    /var/log/syslog:Nov 5 15:55:08 caliban systemd[1759]: gnome-session-pre.target: Requested dependency OnFailure=gnome-session-shutdown.target ignored (target units cannot fail).
    /var/log/syslog:Nov 5 15:55:08 caliban systemd[1759]: gnome-session-initialized.target: Requested dependency OnFailure=gnome-session-shutdown.target ignored (target units cannot fail).
    /var/log/syslog:Nov 5 15:55:08 caliban systemd[1759]: gnome-session-failed.target: Requested dependency OnFailure=gnome-session-shutdown.target ignored (target units cannot fail).
    /var/log/syslog:Nov 5 15:55:10 caliban systemd[1759]: Started GNOME Power management handling.
    /var/log/syslog:Nov 5 15:55:10 caliban systemd[1759]: Reached target GNOME Power management handling.
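For anyone who prefers to do this kind of filtering in Python, here is a minimal sketch of the same two-stage idea: drop routine lines first, then keep lines that mention recovery, power, shutdown, rsyslogd or a UPS. The patterns are modeled on the Stack Exchange answer linked above and are an approximation of `grep -iw`, not an exact equivalent; `shutdown_related` is a hypothetical helper name.

```python
import re

# Lines matching any of these are treated as routine noise and dropped first.
EXCLUDE = re.compile(
    r": starting|kernel: .*: Power Button|watching system buttons"
    r"|Stopped Cleaning Up|Started Crash recovery kernel",
    re.IGNORECASE,
)

# Whole-word keywords that tend to appear around shutdowns and recoveries.
INCLUDE = re.compile(
    r"\b(recover[a-z]*|power[a-z]*|shut[a-z ]*down|rsyslogd|ups)\b",
    re.IGNORECASE,
)


def shutdown_related(lines):
    """Keep log lines that pass the exclude filter and contain a keyword."""
    return [
        line for line in lines
        if not EXCLUDE.search(line) and INCLUDE.search(line)
    ]


if __name__ == "__main__":
    # /var/log/syslog is the file that existed on the machine in this thread.
    with open("/var/log/syslog", errors="replace") as handle:
        for line in shutdown_related(handle):
            print(line, end="")
```

Adding something like `gdm-x-session` to the exclude pattern would remove most of the "Power Button" input-device lines, which describe device registration rather than a shutdown.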
For the tutorial dataset at least, we often use 20 CPUs and four 1080 Ti GPUs, and we have not observed such an error or crash. I think you can test with a much smaller dataset, e.g. 20 subtomograms.
I do not know how to interpret those logs. I will let you know when I have some idea.
If you can, please let me know the commands you use to run IsoNet. If you are using the GUI, please click "print command".
Hello,
I installed IsoNet without problems and can run all the preparation steps fine, either with the GUI or from the command line.
When I try to start the refining step through the GUI, nothing happens. When I try through the command line, I get an "Illegal Instruction (core dumped)" error (picture attached). From googling the error, it seems to be a CPU issue.
NVIDIA GeForce GTX 1080 running with NVIDIA drivers 470.63.01
Intel Xeon CPU E5-2687W 3.10 GHz x 16
Ubuntu 20.04
Python 3.8.10
cuDNN v8.2.4 for CUDA 11.4
GCC 9.3.0
CUDA 11.4
tensorflow 2.4.0
Thank you for your help,
Best,
William J Nicolas
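One way to narrow down an "Illegal Instruction (core dumped)" error is to check whether simply importing TensorFlow triggers it, independently of IsoNet. The sketch below runs the import in a child process so that a crash shows up as a return code instead of killing the calling script. `probe_import` is a hypothetical helper, and the negative return code convention for signals applies to POSIX systems such as Linux.

```python
import signal
import subprocess
import sys


def probe_import(module="tensorflow"):
    """Import a module in a child process and describe how the attempt ended."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a child killed by a signal has returncode == -signal_number.
    if proc.returncode == -signal.SIGILL:
        return "illegal instruction"
    return f"failed (exit code {proc.returncode})"


if __name__ == "__main__":
    print("tensorflow import:", probe_import("tensorflow"))
```

If the import alone reports "illegal instruction", the problem lies with the installed TensorFlow binary and the machine rather than with the refine step.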