Closed qliphy closed 4 years ago
@qliphy as I mentioned by email, you should be able to use login-el7.uscms.org, which is CMS Connect with CentOS 7/SLC7. There are some submission issues with this machine at present, though; I will have them fixed this coming week.
@qliphy @efeyazgan @agrohsje @kdlong: Submission from login-el7.uscms.org should be working now. Could you try submitting some gridpacks?
Thanks a lot @khurtado
@afanfani Could you please run a test for Z+01234jetstoMuMu? Thanks!
Thanks @khurtado. However, I don't have permission to access login-el7.uscms.org:
ssh afanfani@login-el7.uscms.org
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,hostbased).
I checked that my ssh key is correctly registered on my GlobusID, afanfani@globusid.org.
@afanfani : If your ssh keys are working with login.uscms.org, could you send me the output of
ssh -vv afanfani@login.uscms.org
and
ssh -vv afanfani@login-el7.uscms.org
so that I can compare?
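A possible way to make the comparison easier is to save both debug logs and keep only the authentication-related lines. This is a minimal sketch: the function name `auth_lines` and the log file names are hypothetical, and the patterns are just the messages that OpenSSH clients commonly print during user authentication.

```shell
#!/bin/sh
# Keep only the lines of a saved `ssh -vv` log that describe user
# authentication, so that two logs can be compared with diff.
# Usage: auth_lines LOGFILE   (hypothetical helper, not part of OpenSSH)
auth_lines() {
    grep -E 'Authentications that can continue|Next authentication method|Offering .*public key|Server accepts key|Authentication succeeded|Permission denied' "$1"
}

# Hypothetical usage (ssh writes its debug output to stderr):
#   ssh -vv afanfani@login.uscms.org     exit 2> login.log
#   ssh -vv afanfani@login-el7.uscms.org exit 2> login-el7.log
#   auth_lines login.log     > login.auth
#   auth_lines login-el7.log > login-el7.auth
#   diff login.auth login-el7.auth
```

A difference in the "Authentications that can continue" line, or a missing "Server accepts key" line, would point to where the two hosts diverge.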
I don't know what happened, but I can now access login-el7, and I've submitted a gridpack production. Let's see how it goes. Btw, the output of ssh -vv for login-el7 is in [1] and for login is in [2].
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for
debug2: resolving "login-el7.uscms.org" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to login-el7.uscms.org [192.170.231.13] port 22.
debug1: Connection established.
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1
debug1: match: OpenSSH_6.6.1 pat OpenSSH_6.6.1 compat 0x04000000
debug2: fd 4 setting O_NONBLOCK
debug1: Authenticating to login-el7.uscms.org:22 as 'afanfani'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-dss
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: host key algorithms: ssh-rsa,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: ciphers stoc: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: MACs ctos: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: MACs stoc: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: curve25519-sha256@libssh.org
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC:
Home directory usage for afanfani:
Bytes: [ # ] 0% (0/102400 MB)
Files: [ # ] 0% (28/500000 files)
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for
debug2: resolving "login.uscms.org" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to login.uscms.org [192.170.227.118] port 22.
debug1: Connection established.
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH_5 compat 0x0c000000
debug2: fd 4 setting O_NONBLOCK
debug1: Authenticating to login.uscms.org:22 as 'afanfani'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1,ext-info-c
debug2: host key algorithms: ssh-rsa-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,ssh-dss
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: host key algorithms: ssh-rsa,ssh-dss
debug2: ciphers ctos: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: ciphers stoc: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: MACs ctos: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: MACs stoc: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: diffie-hellman-group-exchange-sha256
debug1: kex: host key algorithm: ssh-rsa
debug1: kex: server->client cipher: aes128-ctr MAC: umac-64@openssh.com compression: none
debug1: kex: client->server cipher: aes128-ctr MAC: umac-64@openssh.com compression: none
debug1: kex: diffie-hellman-group-exchange-sha256 need=16 dh_need=16
debug1: kex: diffie-hellman-group-exchange-sha256 need=16 dh_need=16
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<3072<8192) sent
debug1: got SSH2_MSG_KEX_DH_GEX_GROUP
debug2: bits set: 1549/3072
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: got SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Server host key: ssh-rsa SHA256:79b2xCzNlqjK6Mfb5j0NYEj1c4KTDw7lvPB3U9gLJaM
debug1: Host 'login.uscms.org' is known and matches the RSA host key.
debug1: Found key in /afs/cern.ch/user/a/afanfani/.ssh/known_hosts:93
debug2: bits set: 1559/3072
debug2: set_newkeys: mode 1
debug1: rekey after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey after 4294967296 blocks
debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_rsa (0x55d8ff222a60)
debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_dsa ((nil))
debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa ((nil))
debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 ((nil))
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password,hostbased
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /afs/cern.ch/user/a/afanfani/.ssh/id_rsa
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug2: input_userauth_pk_ok: fp SHA256:GJoDW9XRAXgfwvAFXV0Q2XuM9OUxb+0UTuQbnU5eoNg
debug1: Authentication succeeded (publickey).
Authenticated to login.uscms.org ([192.170.227.118]:22).
debug2: fd 6 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug2: callback start
debug2: fd 4 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1
debug1: Sending environment.
debug1: Sending env XMODIFIERS = @im=none
debug2: channel 0: request env confirm 0
debug1: Sending env LANG = en_US.UTF-8
debug2: channel 0: request env confirm 0
debug2: channel 0: request shell confirm 1
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel_input_status_confirm: type 99 id 0
debug2: PTY allocation request accepted on channel 0
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: shell request accepted on channel 0
The gridpack is failing, apparently because the patches to MadGraph are not applied in the CODEGEN step (see logCODEGEN.txt). Although login-el7 is CentOS 7, the condor jobs still land on SL6 nodes (see the log), though this might not be the cause of the failure to apply the patches.
@afanfani Could you comment out the following line and try again:
or change it to:
+REQUIRED_OS = "rhel7"
By default, if this line is not specified, "rhel6" is used on login.uscms.org and "rhel7" is used on login-el7.uscms.org.
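For anyone who prefers to script the change, here is a minimal sketch that rewrites an existing `+REQUIRED_OS` line in a submit file. The function name `set_required_os` and the file path passed to it are hypothetical; the thread does not say which file holds the line, so that has to be located first.

```shell
#!/bin/sh
# Rewrite an existing +REQUIRED_OS line in an HTCondor submit file.
# Usage: set_required_os SUBMIT_FILE OS   (hypothetical helper)
# If the file has no such line, nothing is changed and the host default applies.
set_required_os() {
    file=$1
    os=$2
    if grep -q '^[[:space:]]*[+]REQUIRED_OS' "$file"; then
        # -i.bak is accepted by both GNU and BSD sed; drop the backup afterwards.
        sed -i.bak "s|^[[:space:]]*[+]REQUIRED_OS.*|+REQUIRED_OS = \"$os\"|" "$file"
        rm -f "$file.bak"
    else
        echo "no +REQUIRED_OS line in $file; the login host default is used" >&2
        return 1
    fi
}

# Hypothetical usage:
#   set_required_os path/to/job.submit rhel7
```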
Thanks! Running on CentOS 7, the patches are correctly applied in the CODEGEN step and the needed model is found; however, the INTEGRATE step still fails when importing the model. It complains about scram_arch being slc7_amd64_gcc493 instead of slc7_amd64_gcc700. Now I'm retrying, explicitly providing scram_arch and cmssw to submit_cmsconnect_gridpack_generation.sh.
The next issue is with the compilation of the Source directory. The full log is attached; the relevant lines seem to be:
from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/lhapdf/6.2.1-pafccj3/include/LHAPDF/PDF.h:10,
from pdf_lhapdf6.cc:6:
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/include/c++/7.4.1/x86_64-unknown-linux-gnu/bits/os_defines.h:39:10: fatal error: features.h: No such file or directory
So it seems features.h is missing. I don't have much of a clue about how to address it.
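Since features.h is a glibc header that normally lives under /usr/include on the host rather than in cvmfs, a quick check on the machine where the compilation runs can confirm whether it is there. A minimal sketch, with a hypothetical helper name `header_present`:

```shell
#!/bin/sh
# Report whether a system header exists in an include directory.
# Usage: header_present NAME [INCLUDE_DIR]   (hypothetical helper)
# INCLUDE_DIR defaults to /usr/include.
header_present() {
    [ -f "${2:-/usr/include}/$1" ]
}

# Hypothetical usage on the login host or inside a test job:
#   header_present features.h || echo "features.h is missing from /usr/include"
#   rpm -qf /usr/include/features.h   # on RPM-based systems, prints the owning package
```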
@afanfani Hmm, these look like dependencies that should have been handled by cvmfs but weren't. The slc6 version seems similar, so maybe glibc-devel/headers to match what is available in login.uscms.org. Could you give it another try?
Thanks. The error related to
[1] …..
INFO: ClusterId 1334 was held with code 13, subcode 2. Releasing it.
INFO: ClusterId 1524 was held with code 13, subcode 2. Releasing it.
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 819 [ 24m 43s ]
Error when reading /local-scratch/afanfani/TestMcM/GenBranch_mg261_UL/genproductions/bin/MadGraph5_aMCatNLO/DYJetsToMuMu_LO_MLM_mll50/DYJetsToMuMu_LO_MLM_mll50_gridpack/work/processtmp/SubProcesses/P4_gg_llggqq/G1/results.dat
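To see how often each hold code shows up in a long gridpack log, the "was held with code" messages can be tallied. A minimal sketch, assuming the log was saved to a file; the function name `count_hold_codes` and the log file name are hypothetical, and `grep -o` is a common extension rather than strict POSIX.

```shell
#!/bin/sh
# Count how many times each (code, subcode) pair appears in
# "ClusterId N was held with code X, subcode Y" log messages.
# Usage: count_hold_codes LOGFILE...   (reads stdin if no file is given)
count_hold_codes() {
    grep -oE 'held with code [0-9]+, subcode [0-9]+' "$@" | sort | uniq -c
}

# Hypothetical usage:
#   count_hold_codes gridpack_generation.log
```

Because `grep -o` prints every match, this also works on logs where several messages ended up on one line.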
@afanfani The error messages in condor look like this:
-- Schedd: login-el7.uscms.org : <192.170.231.13:9618?... @ 06/03/19 15:59:53
ID OWNER HELD_SINCE HOLD_REASON
1983.0 afanfani 6/2 10:09 Error from slot1_5@glidein_3300_25883925@r4b1b.grid.hephy.at: STARTER at 10.200.200.50 failed to send file(s) to <192.170.231.13:9618>: error reading from /home/cmspil14/home_cream_169198914/CREAM169198914/glide_V2EYQ8/execute/dir_6666/G107: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <193.170.243.209:46224>
Looking into the local working directory:
[khurtado@login-el7 P4_gq_llgqqq]$ condor_q 1983.0 -af Iwd
/local-scratch/afanfani/TestMcM/GenBranch_mg261_UL_hessian/genproductions/bin/MadGraph5_aMCatNLO/DYJetsToMuMu_LO_MLM_mll50/DYJetsToMuMu_LO_MLM_mll50_gridpack/work/processtmp/SubProcesses/P4_gq_llgqqq
G107, and the G* entries in general, are files that seem to be empty. I remember these being directories in the past, so something seems odd with the job creation/splitting in MadGraph.
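A quick way to check this across a whole subprocess directory is to classify each G* entry as a directory, a non-empty file, or an empty file. A minimal sketch; the function name `report_g_entries` is hypothetical and the directory is whatever `condor_q ... -af Iwd` reports for the held job.

```shell
#!/bin/sh
# Classify every G* entry in a MadGraph SubProcesses/P* directory.
# Usage: report_g_entries SUBPROCESS_DIR   (hypothetical helper)
report_g_entries() {
    for entry in "$1"/G*; do
        # If nothing matches, the glob stays unexpanded; skip it.
        [ -e "$entry" ] || continue
        if [ -d "$entry" ]; then
            echo "dir        $entry"
        elif [ -s "$entry" ]; then
            echo "file       $entry"
        else
            echo "empty-file $entry"
        fi
    done
}

# Hypothetical usage:
#   report_g_entries "$(condor_q 1983.0 -af Iwd)"
```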
@khurtado @efeyazgan @agrohsje @kdlong
This is related to https://hypernews.cern.ch/HyperNews/CMS/get/generators/4340.html
"login.uscms.org" always seems to be slc6. This will cause trouble for UL, as some necessary PDFs are currently only available with 10_6_0, which is built with slc7.