aclex / pytorch-ebuild

Ebuild infrastructure files for PyTorch and some related projects
GNU General Public License v2.0

Error during compile phase of torchvision #12

Closed Jimmy2027 closed 3 years ago

Jimmy2027 commented 3 years ago

Hi, thanks a lot for the ebuild of torchvision! When trying to emerge, I am getting this error during the compile phase:

/var/tmp/portage/sci-libs/torchvision-0.8.1/temp/environment: line 2310:   332 Segmentation fault      "${@}"
 * ERROR: sci-libs/torchvision-0.8.1::aclex-pytorch failed (compile phase):
 *   (no error message)
 * 
 * Call stack:
 *     ebuild.sh, line  125:  Called src_compile
 *   environment, line 3936:  Called distutils-r1_src_compile
 *   environment, line 1839:  Called _distutils-r1_run_foreach_impl 'distutils-r1_python_compile'
 *   environment, line  652:  Called python_foreach_impl 'distutils-r1_run_phase' 'distutils-r1_python_compile'
 *   environment, line 3481:  Called multibuild_foreach_variant '_python_multibuild_wrapper' 'distutils-r1_run_phase' 'distutils-r1_python_compile'
 *   environment, line 2947:  Called _multibuild_run '_python_multibuild_wrapper' 'distutils-r1_run_phase' 'distutils-r1_python_compile'
 *   environment, line 2945:  Called _python_multibuild_wrapper 'distutils-r1_run_phase' 'distutils-r1_python_compile'
 *   environment, line 1069:  Called distutils-r1_run_phase 'distutils-r1_python_compile'
 *   environment, line 1830:  Called distutils-r1_python_compile
 *   environment, line 1699:  Called esetup.py 'build' '-j' '1'
 *   environment, line 2317:  Called die
 * The specific snippet of code:
 *       "${@}" || die "${die_args[@]}";

The full log can be found here: build.log

aclex commented 3 years ago

Thank you for the report! Quite a strange crash indeed. Could you please mention the PyTorch version you have on your setup and the default Python version you use? The only inconsistency I see is that the torchvision project captures Python-3.9, according to the configuration log, while it's being built against Python-3.7 and set up by Python-3.8. As it has a C/C++ backend underneath, there's a small probability of such a crash due to some incompatible symbols. This sounds like a bug in the ebuild here, but meanwhile you can work around it by using Python-3.7 as the default Python interpreter while installing this ebuild.
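
To illustrate the kind of version check discussed here, the following is a minimal sketch (not part of the ebuild or of torchvision's build scripts) that prints which interpreter is actually running and which extension suffix and header directory it would use. The function name `interpreter_summary` is made up for this example. Running the same script with each installed interpreter (for instance python3.7, python3.8 and python3.9) might help spot a mismatch between the interpreter that configures the build and the one the extension is compiled for.

```python
import sys
import sysconfig


def interpreter_summary():
    """Collect details that identify the interpreter running this script."""
    return {
        # Full path of the interpreter binary executing this script.
        "executable": sys.executable,
        # Major.minor version, comparable with the PYTHON_TARGETS entries.
        "version": "{}.{}".format(*sys.version_info[:2]),
        # File name suffix that compiled extension modules get for this
        # interpreter; it encodes the Python version on Linux.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # Directory that holds Python.h for this interpreter.
        "include_dir": sysconfig.get_paths()["include"],
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

If the reported version differs from the one named in the build directory (such as build/lib.linux-x86_64-3.8), that would be one hint that several interpreters are involved in a single build.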

Jimmy2027 commented 3 years ago

Thank you for your answer! I am using pytorch 1.7.0 from this overlay:

[I] sci-libs/pytorch
     Available versions:  (~)1.3.0(0/1.3.0)*l[1] (~)1.3.1(0/1.3.1)*l[1] (~)1.4.0(0/1.4.0)*l[1] (~)1.4.0_p0-r1[2] (~)1.5.0(0/1.5.0)*l[1] (~)1.5.1(0/1.5.1)*l[1] (~)1.6.0(0/1.6.0)*l[1] (~)1.6.0-r1[2] (~)1.6.0-r2[3] (~)1.7.0(0/1.7.0)*l[1] {asan atlas cuda doc eigen +fbgemm ffmpeg gflags glog +gloo leveldb lmdb mkl (+)mkldnn mpi namedtensor +nnpack numa +numpy +observers (+)openblas opencl opencv +openmp +python +qnnpack redis rocm static tbb test tools zeromq PYTHON_TARGETS="python3_6 python3_7 python3_8"}
     Installed versions:  1.7.0(0/1.7.0)*l[1](11:48:47 PM 12/13/2020)(fbgemm gloo mkldnn nnpack numpy observers openmp python qnnpack -asan -atlas -cuda -doc -eigen -ffmpeg -gflags -glog -leveldb -lmdb -mkl -mpi -namedtensor -numa -openblas -opencl -opencv -redis -rocm -static -tbb -test -tools -zeromq PYTHON_TARGETS="python3_7 python3_8 -python3_6")
[1] "aclex-pytorch" /var/db/repos/aclex-pytorch

And I already have Python-3.7 as default Python interpreter:

eselect python show
python3.7

I have also tried to set sci-libs/torchvision python_targets_python3_7 in package.use but still got the same error.

Jimmy2027 commented 3 years ago

If that helps, here is my emerge --info:

Portage 3.0.12 (python 3.7.9-final-0, default/linux/amd64/17.1/desktop, gcc-10.2.0, glibc-2.32-r5, 5.9.9-gentoo x86_64)
=================================================================
System uname: Linux-5.9.9-gentoo-x86_64-Intel-R-_Core-TM-_i7-4765T_CPU_@_2.00GHz-with-gentoo-2.7
KiB Mem:    16274256 total,   7495316 free
KiB Swap:    2047996 total,   1797644 free
Timestamp of repository gentoo: Sun, 13 Dec 2020 21:17:05 +0000
Head commit of repository gentoo: 91d1b695e29cfd720f307995b36c9cb60e73ce50
Head commit of repository aclex-pytorch: 77081a53b0049faad984e2b36cd04b0b47f91eac

Head commit of repository science: f1b4f4489e2e9181bca1dde19fffb24a9155bef7

sh bash 5.0_p18
ld GNU ld (Gentoo 2.35.1 p2) 2.35.1
distcc 3.3.3 x86_64-pc-linux-gnu [disabled]
app-shells/bash:          5.0_p18::gentoo
dev-java/java-config:     2.3.1::gentoo
dev-lang/perl:            5.30.3-r1::gentoo
dev-lang/python:          3.6.12::gentoo, 3.7.9::gentoo, 3.8.6::gentoo, 3.9.0::gentoo
dev-util/cmake:           3.19.1::gentoo
sys-apps/baselayout:      2.7-r1::gentoo
sys-apps/openrc:          0.42.1::gentoo
sys-apps/sandbox:         2.20::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r5::gentoo
sys-devel/automake:       1.16.3-r1::gentoo
sys-devel/binutils:       2.35.1-r1::gentoo
sys-devel/gcc:            10.2.0-r3::gentoo
sys-devel/gcc-config:     2.3.2-r1::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.3::gentoo
sys-kernel/linux-headers: 5.9::gentoo (virtual/os-headers)
sys-libs/glibc:           2.32-r5::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-extra-opts: 
    sync-rsync-verify-max-age: 24
    sync-rsync-verify-metamanifest: yes

aclex-pytorch
    location: /var/db/repos/aclex-pytorch
    sync-type: git
    sync-uri: https://github.com/aclex/pytorch-ebuild
    masters: gentoo

science
    location: /var/db/repos/science
    sync-type: git
    sync-uri: git://git.gentoo.org/proj/sci.git
    masters: gentoo

Jimmy
    location: /var/db/repos/Jimmy
    masters: gentoo
    priority: 8888

chymeric
    location: /home/hendrik/src/overlay
    masters: gentoo
    priority: 8889

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O3 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.2/ext-active/ /etc/php/apache2-php7.3/ext-active/ /etc/php/apache2-php7.4/ext-active/ /etc/php/apache2-php8.0/ext-active/ /etc/php/cgi-php7.2/ext-active/ /etc/php/cgi-php7.3/ext-active/ /etc/php/cgi-php7.4/ext-active/ /etc/php/cgi-php8.0/ext-active/ /etc/php/cli-php7.2/ext-active/ /etc/php/cli-php7.3/ext-active/ /etc/php/cli-php7.4/ext-active/ /etc/php/cli-php8.0/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O3 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-march=native -O3 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O3 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j9"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE=">=dev-python/numpy-1.17.4-r2 X a52 aac acl acpi alsa amd64 berkdb branding bzip2 cairo cdda cdr cli crypt cups dbus dri dts dvd dvdr elogind emboss encode exif ffmpeg flac fortran fuse gdbm gif gpm gtk gtk3 gui iconv icu ipv6 jpeg lapack lcms libglvnd libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp opentime pam pango pcre pdf png policykit ppds qt5 readline sdl seccomp spell split-usr ssl startup-notification svg tcpd tiff truetype udev udisks unicode upower usb vorbis wxwidgets x264 xattr xcb xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gnat_2019" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2 php7-3 php7-4" 
POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_7" PYTHON_TARGETS="python3_7 python3_8" RUBY_TARGETS="ruby25 ruby26" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

aclex commented 3 years ago

Many thanks for the detailed information, @Jimmy2027! Don't have any promising strategy on this yet, but will try to reproduce it on my machine.

aclex commented 3 years ago

I tried to reproduce your setup and the crash in a clean Docker image tonight, but it compiles successfully. Of course, I've checked only the major things, namely the toolchain (without Glibc, which can't be easily compiled inside a container on my setup) and the build flags. Another difference is that I used Python-3.8 as the default interpreter: I noticed too late that it's the default in PYTHON_TARGETS, so I decided to check with it. I will try to check it with Python-3.7 as well.

Here's emerge --info output:

Portage 3.0.9 (python 3.8.6-final-0, default/linux/amd64/17.1, gcc-10.2.0, glibc-2.32-r2, 5.4.80-gentoo x86_64)
=================================================================
System uname: Linux-5.4.80-gentoo-x86_64-Intel-R-_Core-TM-_i7-6700K_CPU_@_4.00GHz-with-glibc2.2.5
KiB Mem:    32822616 total,  18868508 free
KiB Swap:   10249432 total,  10249432 free
Head commit of repository gentoo: 5a09f6382d65f06f0bb3b0491f61f789465d65e0
Head commit of repository aclex-pytorch: 77081a53b0049faad984e2b36cd04b0b47f91eac

sh bash 5.0_p18
ld GNU ld (Gentoo 2.34 p6) 2.34.0
app-shells/bash:          5.0_p18::gentoo
dev-lang/perl:            5.30.3::gentoo
dev-lang/python:          3.7.9::gentoo, 3.8.6::gentoo, 3.9.0::gentoo
dev-util/cmake:           3.17.4-r1::gentoo
sys-apps/baselayout:      2.7::gentoo
sys-apps/openrc:          0.42.1::gentoo
sys-apps/sandbox:         2.20::gentoo
sys-devel/autoconf:       2.69-r5::gentoo
sys-devel/automake:       1.16.2-r1::gentoo
sys-devel/binutils:       2.34-r2::gentoo, 2.35.1-r1::gentoo
sys-devel/gcc:            9.3.0-r2::gentoo, 10.2.0-r3::gentoo
sys-devel/gcc-config:     2.3.2-r1::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.4-r1::gentoo (virtual/os-headers)
sys-libs/glibc:           2.32-r2::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-max-age: 24
    sync-rsync-verify-metamanifest: yes
    sync-rsync-extra-opts: 
    sync-rsync-verify-jobs: 1

aclex-pytorch
    location: /var/db/repos/aclex-pytorch
    sync-type: git
    sync-uri: https://github.com/aclex/pytorch-ebuild
    masters: gentoo

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="C.UTF8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl amd64 berkdb bzip2 cli crypt dri fortran gdbm iconv ipv6 libglvnd libtirpc multilib ncurses nls nptl openmp pam pcre readline seccomp split-usr ssl tcpd unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2 php7-3 php7-4" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_8" PYTHON_TARGETS="python2_7 python3_8" RUBY_TARGETS="ruby25 ruby26" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

The default Python is the following:

eselect python show
python3.8

More aggressive optimization flags should obviously be checked as well.
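
Comparing two long `emerge --info` dumps by eye is error-prone, so here is a small sketch of how the differing settings could be listed automatically. It assumes the settings appear as KEY="value" pairs, as they do in the two outputs quoted in this thread; the helper names `parse_settings` and `differing_settings` are hypothetical, and the sample strings are short excerpts copied from those outputs rather than the full dumps.

```python
import re

# Matches KEY="value" pairs in the form that `emerge --info` prints them.
ASSIGNMENT = re.compile(r'\b([A-Z][A-Z0-9_]*)="([^"]*)"')

# Excerpts copied from the two outputs quoted in this thread.
REPORTER_INFO = '''
ACCEPT_KEYWORDS="amd64 ~amd64"
CFLAGS="-march=native -O3 -pipe"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PYTHON_SINGLE_TARGET="python3_7" PYTHON_TARGETS="python3_7 python3_8"
'''
MAINTAINER_INFO = '''
ACCEPT_KEYWORDS="amd64"
CFLAGS="-O2 -pipe"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PYTHON_SINGLE_TARGET="python3_8" PYTHON_TARGETS="python2_7 python3_8"
'''


def parse_settings(info_text):
    """Return a dict of every KEY="value" pair found in the text."""
    return dict(ASSIGNMENT.findall(info_text))


def differing_settings(first, second):
    """Map each key whose value differs to a (first, second) tuple."""
    a, b = parse_settings(first), parse_settings(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    for key, (mine, theirs) in differing_settings(
        REPORTER_INFO, MAINTAINER_INFO
    ).items():
        print(f"{key}:\n  reporter:   {mine}\n  maintainer: {theirs}")
```

A key that is present in only one output shows up with None on the other side, which is how an unset variable such as MAKEOPTS would appear. Package versions in the header part of the output are not KEY="value" pairs and are ignored by this sketch.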

aclex commented 3 years ago

I've just recompiled @world, including both PyTorch and Torchvision, with Python-3.7 in PYTHON_TARGETS and as the default interpreter, with the same compilation flags as in your output applied, but it still builds successfully for me. I see no way to reproduce it, unfortunately. Given that you have some testing packages in the toolchain, I can only imagine there's some subtle breakage there, and I suggest trying to build it with stable versions of sys-devel/gcc, sys-libs/glibc and sys-devel/binutils.

Jimmy2027 commented 3 years ago

Thanks a lot for your efforts, @aclex! I have recompiled @world with optimization -O2 instead of -O3 and also added -ggdb to my CFLAGS. Torchvision still wouldn't build.

I also could not find anything using the GNU debugger:

gdb --args python3.8 setup.py build -j 1
GNU gdb (Gentoo 10.1 vanilla) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.                                                                                   
This GDB was configured as "x86_64-pc-linux-gnu".                                                                                                                                                                                                                             
Type "show configuration" for configuration details.                                                                                   
For bug reporting instructions, please see:                       
<https://bugs.gentoo.org/>.                                                                                                            
Find the GDB manual and other documentation resources online at:                                                                                                                                                                                                              
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".                                                                                                                 
Type "apropos word" to search for commands related to "word"...   
Reading symbols from python3.8...                    
(gdb) run                                           
Starting program: /usr/bin/python3.8 setup.py build -j 1
[Thread debugging using libthread_db enabled]       
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 11404]
[Detaching after fork from child process 11409]                 
[Detaching after fork from child process 11410]      
[Detaching after fork from child process 11411]
fatal: not a git repository (or any of the parent directories): .git
Building wheel torchvision-0.8.0a0                             
PNG found: True                  
[Detaching after fork from child process 11412]
libpng version: 1.6.37                                                                                                                 
Building torchvision with PNG image support  
[Detaching after fork from child process 11413]            
[Detaching after fork from child process 11414]
libpng include path: /usr/include/libpng16     
Running build on conda-build: False            
Running build on conda: False                  
JPEG found: True                                                                                                                       
Building torchvision with JPEG image support
FFmpeg found: False               
running build                                                                                                                          
running build_py                      
copying torchvision/version.py -> build/lib.linux-x86_64-3.8/torchvision
running build_ext                               
[Detaching after fork from child process 11415]                                                                                                                                                                                                                               
[New Thread 0x7fffddb57640 (LWP 11416)]                                                                                                                                                                                                                                       
building 'torchvision._C' extension                                                                                                                                                                                                                                           
[Detaching after fork from child process 11417]                                                                                                                                                                                                                               
Emitting ninja build file /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/build.ninja...                                                                                                                                       
Compiling objects...                                                                                                                                                                                                                                                          
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)                      
[Detaching after fork from child process 11418]
ninja: no work to do.                         
x86_64-pc-linux-gnu-g++ -pthread -shared /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/vision.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/ROIAlign_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/nms_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/PSROIAlign_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/DeformConv_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/PSROIPool_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/ROIPool_cpu.o -L/usr/lib/python3.8/site-packages/torch/lib -L/usr/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/torchvision/_C.so                                                                 
[Detaching after fork from child process 11419]                                                                                                                                                                                                                               
building 'torchvision.image' extension                                                                                                                                                                                                                                        
[Detaching after fork from child process 11422]                                                                                                                                                                                                                               
Emitting ninja build file /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/build.ninja...                                                                                                                                       
Compiling objects...                                                                                                                                                                                                                                                          
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)                                                                                                                                                             
[Detaching after fork from child process 11423]
ninja: no work to do.                    
x86_64-pc-linux-gnu-g++ -pthread -shared /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/image.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/readjpeg_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/read_write_file_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/jpegcommon.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/writejpeg_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/readpng_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/writepng_cpu.o /var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/build/temp.linux-x86_64-3.8/var/tmp/portage/sci-libs/torchvision-0.8.1/work/torchvision-0.8.1/torchvision/csrc/cpu/image/read_image_cpu.o -L/usr/lib64 -L/usr/lib/python3.8/site-packages/torch/lib -L/usr/lib64 -lpng -ljpeg -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/torchvision/image.so                                                  
[Detaching after fork from child process 11424]  
[Thread 0x7fffddb57640 (LWP 11416) exited]                                                                                             
[Inferior 1 (process 11400) exited normally]

I was told that pytorch and torchvision do not work well with recent kernel versions. Do you think this could be an issue?

aclex commented 3 years ago

Indeed, it succeeds inside GDB. Very strange bug. You might consider enabling core files, running the compilation normally and then doing post-mortem debugging on python3.8. I think it's unlikely to be affected by the kernel version, given that it's not PyTorch or Torchvision actually running, but a rather subtle segfault during compilation. I would be very surprised to learn that it's incompatible with recent kernels in that way :)

Jimmy2027 commented 3 years ago

Hi @aclex, sorry for the long break. Could you tell me how I can "enable the core files, run the compilation normally and then do a post-mortem debugging on python3.8" ?

aclex commented 3 years ago

No problem) Try something like this, then run the crashing installation and finally take a look at the /tmp directory. You can then do the crash debugging with gdb python3.8 <path-to-core-file> to see, e.g., the backtrace.
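
The instructions linked above are not reproduced in this thread, so the following is only a sketch of one possible approach, not necessarily what the link described. It raises the core-file size limit for the current process and then starts the build as a child, which inherits the limit. The function names `allow_core_dumps` and `run_build` are made up for this example, and the build command is the one from the gdb session earlier in the thread. Where the core file ends up (and whether it is written at all) depends on the kernel's core_pattern setting and on the hard limit, neither of which this sketch changes.

```python
import resource
import subprocess
import sys


def allow_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Child processes started afterwards inherit the new limit.
    Returns the (soft, hard) pair that is in effect afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_build(command):
    """Run the command with core dumps allowed and return its exit status."""
    allow_core_dumps()
    result = subprocess.run(command)
    if result.returncode < 0:
        # subprocess reports death by signal N as a return code of -N.
        print(
            f"killed by signal {-result.returncode}; look for a core file",
            file=sys.stderr,
        )
    return result.returncode


if __name__ == "__main__":
    # Command taken from the gdb session earlier in this thread; run it
    # from the unpacked torchvision source directory.
    status = run_build(["python3.8", "setup.py", "build", "-j", "1"])
    sys.exit(0 if status == 0 else 1)
```

If the hard limit is already 0, the soft limit cannot be raised this way and no core file will be produced. Once a core file exists, it can be opened with gdb python3.8 <path-to-core-file> as described above.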

Jimmy2027 commented 3 years ago

Hi @aclex sorry again for the long delay. I was able to install torchvision in the end on this machine by setting USE="-python_targets_python3_7" so I will close this issue. Thanks for your help :)

I did, however, run into an issue on another machine when using the cuda USE flag; I opened a pull request with a simple fix.

aclex commented 3 years ago

@Jimmy2027 thanks for the feedback on the result of that and surely for the pull request! Glad it now works for you. Your fix concerning CUDA is also merged now.