atanuchaudhury / linux-Problem


Having problem running the program on GPU #1

Open atanuchaudhury opened 6 months ago

atanuchaudhury commented 6 months ago

Tried to run the config file, but it gives an error on my system. The same process runs seamlessly on another system. I have installed Clang as well. The problem is given below:

fetchbook@DESKTOP-AA6KOUT:/mnt/c/Users/YD/Documents/olb-1.6r0$ make clean
make CXX=nvcc CC=nvcc -C external clean
make[1]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external'
make -C zlib clean
make[2]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/zlib'
make[2]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/zlib'
make -C tinyxml clean
make[2]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/tinyxml'
make[2]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/tinyxml'
rm -f lib/libz.a lib/libtinyxml.a
make[1]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external'
rm -f src/communication/mpiManager.o src/communication/ompManager.o src/core/olbInit.o src/io/ostreamManager.o
rm -f build/lib/libolbcore.a
fetchbook@DESKTOP-AA6KOUT:/mnt/c/Users/YD/Documents/olb-1.6r0$ make
make CXX=nvcc CC=nvcc -C external
make[1]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external'
make -C zlib
make[2]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/zlib'
nvcc -c -o build/adler32.o ./adler32.c
nvcc -c -o build/crc32.o ./crc32.c
nvcc -c -o build/deflate.o ./deflate.c
nvcc -c -o build/infback.o ./infback.c
nvcc -c -o build/inffast.o ./inffast.c
nvcc -c -o build/inflate.o ./inflate.c
nvcc -c -o build/inftrees.o ./inftrees.c
nvcc -c -o build/trees.o ./trees.c
nvcc -c -o build/zutil.o ./zutil.c
nvcc -c -o build/compress.o ./compress.c
nvcc -c -o build/uncompr.o ./uncompr.c
nvcc -c -o build/gzclose.o ./gzclose.c
nvcc -c -o build/gzlib.o ./gzlib.c
nvcc -c -o build/gzread.o ./gzread.c
nvcc -c -o build/gzwrite.o ./gzwrite.c
ar rc build//libz.a ./build/adler32.o ./build/crc32.o ./build/deflate.o ./build/infback.o ./build/inffast.o ./build/inflate.o ./build/inftrees.o ./build/trees.o ./build/zutil.o ./build/compress.o ./build/uncompr.o ./build/gzclose.o ./build/gzlib.o ./build/gzread.o ./build/gzwrite.o
make[2]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/zlib'
cp zlib/build/libz.a lib/
make -C tinyxml
make[2]: Entering directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/tinyxml'
nvcc -c tinystr.cpp -o build/tinystr.o
nvcc -c tinyxml.cpp -o build/tinyxml.o
nvcc -c tinyxmlerror.cpp -o build/tinyxmlerror.o
nvcc -c tinyxmlparser.cpp -o build/tinyxmlparser.o
ar rc build/libtinyxml.a ./build/tinystr.o ./build/tinyxml.o ./build/tinyxmlerror.o ./build/tinyxmlparser.o
make[2]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external/tinyxml'
cp tinyxml/build/libtinyxml.a lib/
make[1]: Leaving directory '/mnt/c/Users/YD/Documents/olb-1.6r0/external'
nvcc -O3 -std=c++17 --forward-unknown-to-host-compiler -pthread --forward-unknown-to-host-compiler -x cu -O3 -std=c++17 --generate-code=arch=compute_60,code=[compute_60,sm_60] --extended-lambda --expt-relaxed-constexpr -rdc=true -Xcudafe "--diag_suppress=implicit_return_from_non_void_function --display_error_number --diag_suppress=20014 --diag_suppress=20011" -DPLATFORM_CPU_SISD -DPLATFORM_GPU_CUDA -DDEFAULT_FLOATING_POINT_TYPE=float -fPIC -Isrc/ -c src/communication/mpiManager.cpp -o src/communication/mpiManager.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 | function(_Functor&& __f)
      | ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 | operator=(_Functor&& __f)
      | ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make: *** [Makefile:46: src/communication/mpiManager.o] Error 1
fetchbook@DESKTOP-AA6KOUT:/mnt/c/Users/YD/Documents/olb-1.6r0$

AstroSuro commented 6 months ago

Which code have you tried? Put the code in the 'Code' section of GitHub. Also, compare the code requirements on your system and the other system. This will help you figure out the issue.

atanuchaudhury commented 6 months ago

The code is provided in the Git Code section. The code and system requirements are the same on my system and the other system. Mainly, I am confused about ‘_ArgTypes’ and the error "parameter packs not expanded with ‘...’".

AstroSuro commented 6 months ago

You probably need to check the config file and modify it accordingly to overcome the issue.

atanuchaudhury commented 6 months ago

I have followed the same process on the other system.

AstroSuro commented 6 months ago

Kindly check the compiler versions on your system and the other system to see if there are any discrepancies.
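One way to make that comparison concrete is to print the host compiler and nvcc versions on both machines and compare the output. The sketch below is only an illustration: `major_version` is a hypothetical helper name, and it assumes each tool prints an X.Y version string somewhere in its `--version` output. The error paths in the build log (/usr/include/c++/11/...) suggest that the GCC 11 standard library headers are the ones in use on the failing system.

```shell
# Print the major version found in `<tool> --version` text read from stdin.
# Assumption: the first X.Y or X.Y.Z token in the text is the tool version.
major_version() {
  grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1 | cut -d. -f1
}

# Report the host compiler and nvcc, if they are on PATH.
for tool in g++ nvcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool major version: $("$tool" --version | major_version)"
  else
    echo "$tool: not found on PATH"
  fi
done
```

Running this on both systems and comparing the two lines of output shows quickly whether the host compilers really match.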

atanuchaudhury commented 6 months ago

Compared them now. The versions are the same, but the other system does not have OpenMPI installed.

AstroSuro commented 6 months ago

OpenMPI is not causing the issue.

atanuchaudhury commented 6 months ago

ok

atanuchaudhury commented 6 months ago

Can you suggest any other solution?

atanuchaudhury commented 6 months ago

I tried some solutions but am still getting the same issue, such as:

That does not help, at least not in Debian. However, what does help is making the following two changes in /usr/include/c++/*/bits/std_function.h :

Line 433+ (approximate):

    template<typename _Functor, typename _Constraints = _Requires<_Callable<_Functor>>>
    function(_Functor&& __f)
    //noexcept(_Handler<_Functor>::template _S_nothrow_init<_Functor>()) // CUDA BOTCHES THIS
    : _Function_base()

Line 529+ (approximate):

    template<typename _Functor>
    _Requires<_Callable<_Functor>, function&>
    operator=(_Functor&& __f)
    //noexcept(_Handler<_Functor>::template _S_nothrow_init<_Functor>()) // CUDA BOTCHES THIS
    {
      function(std::forward<_Functor>(__f)).swap(*this);
      return *this;
    }

Commenting out the indicated part solves the compilation. For some reason, NVCC botches the compilation when that part is present. It preprocesses the C++ code and erroneously changes the template signature in a way that does not and can not compile.

I also tried:

Installing gcc version 10 helps me:

    sudo apt install gcc-10
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10
    sudo update-alternatives --config gcc

Choose version 10 while compiling; I also set g++ to 10.

I forgot to write that after changing the version I removed all files in the build directory and started again with cmake, then make.
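For reference, `update-alternatives --install` expects four arguments (link, name, path, priority), so the quoted command needs a priority value appended before it will run. The sketch below only prints the commands for a given GCC version instead of executing them; the function name `print_alternatives_cmds` and the priority 100 are illustrative assumptions, not something taken from the thread.

```shell
# Print (do not run) the commands that would register gcc-N and g++-N
# with update-alternatives. The default priority of 100 is an arbitrary
# illustrative value.
print_alternatives_cmds() {
  ver="$1"
  prio="${2:-100}"
  printf 'sudo apt install gcc-%s g++-%s\n' "$ver" "$ver"
  printf 'sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-%s %s\n' "$ver" "$prio"
  printf 'sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-%s %s\n' "$ver" "$prio"
  printf 'sudo update-alternatives --config gcc\n'
  printf 'sudo update-alternatives --config g++\n'
}

print_alternatives_cmds 10
```

A less invasive option may be nvcc's `-ccbin` (`--compiler-bindir`) flag, for example `-ccbin g++-10`, which selects the host compiler for nvcc only and leaves the system default untouched. Where to add it in the OpenLB build configuration is not shown in this thread, so treat that as something to verify.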

atanuchaudhury commented 6 months ago

std_function.zip

The error points to lines 435 and 530, as shown below:

/mpiManager.cpp -o src/communication/mpiManager.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 | function(_Functor&& __f)
      | ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 | operator=(_Functor&& __f)
      | ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’

atanuchaudhury commented 6 months ago

It gives an error while saving the file:

Failed to save 'std_function.h': Unable to write file 'vscode-remote://wsl+ubuntu-22.04/usr/include/c++/11/bits/std_function.h' (NoPermissions (FileSystemError): Error: EACCES: permission denied, open '/usr/include/c++/11/bits/std_function.h')
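The EACCES error is what one would expect here: files under /usr/include are normally owned by root, so an editor running as a regular user cannot overwrite them. A small sketch for checking this before attempting an edit follows; `describe_write_access` is a hypothetical helper name introduced for illustration.

```shell
# Report whether the current user can write to a given file.
describe_write_access() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -w "$1" ]; then
    echo "writable"
  else
    echo "not writable by $(id -un)"
  fi
}

# The header named in the error message above.
describe_write_access /usr/include/c++/11/bits/std_function.h
```

`ls -l` on the same path shows the owner and permission bits if more detail is needed.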

atanuchaudhury commented 6 months ago

I got the solution from here

https://github.com/NVIDIA/nccl/issues/650

AstroSuro commented 6 months ago

This looks like a read-only file, hence you have no permission to edit it. Also, in the previous comments it seems you have directly copy-pasted the errors from the GitHub page. I need to know what errors are thrown when you execute the code.

AstroSuro commented 6 months ago

I need SSH access to your system. Send the SSH address in the form username@domain. I need to access your system and run the files to look into the possible problem.

AstroSuro commented 6 months ago

That does not help, at least not in Debian.

As far as I know, your system is not Debian.

AstroSuro commented 6 months ago

NVCC botches the compilation when that part is present. It preprocesses the C++ code and erroneously changes the template signature in a way that does not and can not compile.

What is the meaning of these lines?

atanuchaudhury commented 6 months ago

OK, I will try to create the SSH access.

atanuchaudhury commented 6 months ago

I sent an email with the SSH access in the form username@domain.

AstroSuro commented 6 months ago

There is no such username@domain mentioned in the email. Kindly understand the requirements. I don't need your SSH key fingerprint. Moreover, you are not supposed to share your SSH key fingerprint with anyone, except when you need to upload it to a remote. I need the SSH address to log in to your machine.

atanuchaudhury commented 6 months ago

Are you able to login now?

atanuchaudhury commented 6 months ago

Is there any solution to this error? In std_function.h, the error is reported at lines 435 and 530 as "parameter packs not expanded with ‘...’".

AstroSuro commented 6 months ago

Perform the following actions:

  1. sudo apt install openssh-server
  2. Edit the SSH configuration file (usually /etc/ssh/sshd_config) to set up the server.
  3. Set the following in sshd_config: PermitRootLogin no and PasswordAuthentication yes
  4. sudo service ssh start
  5. Find the WSL IP with ip addr and send me the IP address.

AstroSuro commented 6 months ago

Provide the path of your code that is generating the problem.

atanuchaudhury commented 6 months ago

Path of the code:

/usr/include/c++/11/bits/std_function.h

AstroSuro commented 6 months ago

std_function.h is a header file in the C++ standard library. I am asking for the code which, when executed, gives this error.

atanuchaudhury commented 6 months ago

Executing any of the code gives the same error. I saw a solution on GitHub which says to edit the std_function.h header file: comment out two lines and save it. I am not able to make changes to the header file.

atanuchaudhury commented 6 months ago

The lines

PermitRootLogin no
PasswordAuthentication yes

are commented out in the file, so I need to uncomment them, right?

AstroSuro commented 6 months ago

are commented out in the file, so I need to uncomment them, right?

Yes.
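To confirm that a directive is really active after editing, one can look for its first uncommented occurrence, since sshd uses the first value it obtains for each keyword. The sketch below works on sample text rather than the real file; `sshd_setting` is a hypothetical helper name and the sample contents are made up for illustration.

```shell
# Print the first uncommented value of a directive in sshd_config-style
# text read from stdin. Keywords are matched case-insensitively.
sshd_setting() {
  awk -v key="$1" 'tolower($1) == tolower(key) { print $2; exit }'
}

# Sample text standing in for /etc/ssh/sshd_config (hypothetical contents).
sample='#PermitRootLogin prohibit-password
PermitRootLogin no
#PasswordAuthentication yes'

printf '%s\n' "$sample" | sshd_setting PermitRootLogin          # prints: no
printf '%s\n' "$sample" | sshd_setting PasswordAuthentication   # prints nothing: still commented out
```

On the actual machine the same helper can be fed the real file, for example `sshd_setting PasswordAuthentication < /etc/ssh/sshd_config`, and the SSH service has to be restarted before a change takes effect.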

AstroSuro commented 6 months ago

Executing any of the code gives the same error. I saw a solution on GitHub which says to edit the std_function.h header file: comment out two lines and save it. I am not able to make changes to the header file.

You are generally not supposed to make any changes in the header files.

atanuchaudhury commented 6 months ago

This is where I found it: https://github.com/NVIDIA/nccl/issues/650

and some people are commenting that it worked for them

AstroSuro commented 6 months ago

and some people are commenting that it worked for them

That doesn't mean it will work for you. Those are for advanced users.

atanuchaudhury commented 6 months ago

I tried to change them to PermitRootLogin no and PasswordAuthentication yes, but I am not able to do it with Notepad. I can't save them. What should I do?

AstroSuro commented 6 months ago

but I am not able to do it with Notepad. I can't save them. What should I do?

Any text editor should be fine.

atanuchaudhury commented 6 months ago

Not working. I cannot save the file. The editors just create another file but do not save the original one.

AstroSuro commented 6 months ago

Not working. I cannot save the file. The editors just create another file but do not save the original one.

Probably learning how to save such a file can be helpful.

atanuchaudhury commented 6 months ago

trying

AstroSuro commented 6 months ago

What is the password for fetchbook@140.114.121.51?

atanuchaudhury commented 6 months ago

sent to mail

atanuchaudhury commented 6 months ago

I have done the following:

  1. sudo apt install openssh-server
  2. Edit the SSH configuration file (usually /etc/ssh/sshd_config) to set up the server.
  3. Set the following in sshd_config: PermitRootLogin no and PasswordAuthentication yes
  4. sudo service ssh start
  5. WSL IP: ip addr and provide me the IP address

I got this:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:e2:2f:86 brd ff:ff:ff:ff:ff:ff
    inet 172.25.57.219/20 brd 172.25.63.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fee2:2f86/64 scope link
       valid_lft forever preferred_lft forever
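The only part of that output needed for an SSH login is the IPv4 address on eth0. A minimal sketch for pulling it out of `ip addr` text is shown below; `ipv4_of` is a hypothetical helper name and the sample is a shortened copy of the output above. Note that 172.25.57.219 falls in the private 172.16.0.0/12 range, so it is generally reachable only from the Windows host itself unless port forwarding is set up, which may be relevant to the remote login attempts.

```shell
# Print the IPv4 address(es) of one interface from `ip addr` output on stdin.
ipv4_of() {
  awk -v ifc="$1" '
    /^[0-9]+: / { cur = $2; sub(/:$/, "", cur) }        # remember the current interface name
    $1 == "inet" && cur == ifc { split($2, parts, "/"); print parts[1] }
  '
}

# Shortened sample of the output posted above.
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 172.25.57.219/20 brd 172.25.63.255 scope global eth0'

printf '%s\n' "$sample" | ipv4_of eth0   # prints: 172.25.57.219
```

On a live system the same function can be used as `ip addr | ipv4_of eth0`.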

AstroSuro commented 6 months ago

Since you have a Windows system with WSL set up, WSL being your primary working environment, I am getting "permission denied" when I try to enter the WSL.

atanuchaudhury commented 6 months ago

It's better to use AnyDesk. You will get access to everything.

AstroSuro commented 6 months ago

Maybe

atanuchaudhury commented 6 months ago

Yes, definitely. I saw a video which states that Linux-to-Linux is the most viable, but with other systems we face problems.

atanuchaudhury commented 6 months ago

You do not have to download AnyDesk; you can also use it online from a browser.

atanuchaudhury commented 5 months ago

nvcc

Giving an error. But the file exists in the directory.

atanuchaudhury commented 5 months ago

Any solution to this?

AstroSuro commented 5 months ago

Please remove the letter "I" before C and rerun.

atanuchaudhury commented 5 months ago

It gives an error: nvcc fatal : Unknown option '-C:/Users/YD/Documents/olb-1.6r0/src'

AstroSuro commented 5 months ago

Remove "-" before the path.
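Since the build runs inside WSL (the prompt earlier in the thread shows /mnt/c/Users/YD/Documents/olb-1.6r0), one possibility worth checking is that the include flag should keep its `-I` prefix and carry the WSL form of the path rather than the Windows form. The sketch below converts one form to the other; `win_to_wsl_path` is a hypothetical helper written for illustration, and it assumes the default /mnt/<drive> mount layout. Inside WSL, the built-in `wslpath -u` command performs the same conversion.

```shell
# Convert a Windows-style path (C:/... or C:\...) to its default WSL
# mount point (/mnt/c/...). Assumes the input starts with a drive letter.
win_to_wsl_path() {
  p=$(printf '%s' "$1" | tr '\\' '/')                 # backslashes to forward slashes
  drive=$(printf '%s' "${p%%:*}" | tr 'A-Z' 'a-z')    # drive letter, lower-cased
  rest=${p#*:}                                        # everything after "C:"
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

src_dir=$(win_to_wsl_path 'C:/Users/YD/Documents/olb-1.6r0/src')
printf '%s\n' "-I$src_dir"   # prints: -I/mnt/c/Users/YD/Documents/olb-1.6r0/src
```

Dropping the `-I` entirely turns the path into a positional argument, so nvcc no longer treats it as an include directory.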