cms-sw / genproductions

Generator fragments for MC production
https://twiki.cern.ch/twiki/bin/view/CMS/GitRepositoryForGenProduction

cmsconnect for slc7 #2238

Closed qliphy closed 4 years ago

qliphy commented 5 years ago

@khurtado @efeyazgan @agrohsje @kdlong

This is related to https://hypernews.cern.ch/HyperNews/CMS/get/generators/4340.html

"login.uscms.org" seems to always be slc6. This will cause trouble for UL, as some necessary PDFs are currently only available with 10_6_0, which is built with slc7.

khurtado commented 5 years ago

@qliphy as I mentioned by email, you should be able to use login-el7.uscms.org, which is CMS Connect with CentOS 7/SLC7. There are some submission issues with this machine at present, though; I will have that fixed this coming week.

khurtado commented 5 years ago

@qliphy @efeyazgan @agrohsje @kdlong : Submission from login-el7.uscms.org should be working now. Could you try submitting some gridpacks?

qliphy commented 5 years ago

Thanks a lot @khurtado

@afanfani Could you please run a test for Z+01234jetstoMuMu? Thanks!

afanfani commented 5 years ago

Thanks @khurtado. However, I don't have permission to access login-el7.uscms.org:

ssh afanfani@login-el7.uscms.org

Permission denied (publickey,gssapi-keyex,gssapi-with-mic,hostbased).

I checked that my ssh key is correctly registered on my GlobusID, afanfani@globusid.org.

khurtado commented 5 years ago

@afanfani : If your ssh keys are working with login.uscms.org, could you send me the output of

ssh -vv afanfani@login.uscms.org

and

ssh -vv afanfani@login-el7.uscms.org

so that I can compare?
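Comparing two long `ssh -vv` outputs by eye is tedious, so a small script can help. The sketch below is only an illustration: `summarize_ssh_debug` and `diff_summaries` are hypothetical helpers, and they assume each debug output was saved as text and contains the usual OpenSSH client lines "remote software version ..." and "Authentications that can continue: ...".

```python
import re

def summarize_ssh_debug(text):
    """Pull out the fields most useful for comparing two `ssh -vv` logs."""
    server = re.search(r"remote software version (\S+)", text)
    methods = re.search(r"Authentications that can continue: (\S+)", text)
    return {
        "server_version": server.group(1) if server else None,
        "auth_methods": methods.group(1).split(",") if methods else [],
        "authenticated": "Authentication succeeded" in text,
        "denied": "Permission denied" in text,
    }

def diff_summaries(a, b):
    """Return only the keys whose values differ between two summaries."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Calling `diff_summaries` on the summaries of the two hosts shows at a glance whether the servers differ in version or in the authentication methods they offer.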

afanfani commented 5 years ago

I don't know what happened, but I can now access login-el7 and I've submitted a gridpack production. Let's see how it goes. By the way, the output of ssh -vv for login-el7 is in [1] and for login is in [2].

[1]

--------------> login-el7.uscms.org

OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017 debug1: Reading configuration data /etc/ssh/ssh_config debug1: /etc/ssh/ssh_config line 58: Applying options for debug2: resolving "login-el7.uscms.org" port 22 debug2: ssh_connect_direct: needpriv 0 debug1: Connecting to login-el7.uscms.org [192.170.231.13] port 22. debug1: Connection established. debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa type 1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519-cert type -1 debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_7.4 debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1 debug1: match: OpenSSH_6.6.1 pat OpenSSH_6.6.1 compat 0x04000000 debug2: fd 4 setting O_NONBLOCK debug1: Authenticating to login-el7.uscms.org:22 as 'afanfani' debug1: SSH2_MSG_KEXINIT sent debug1: SSH2_MSG_KEXINIT received debug2: local client KEXINIT proposal debug2: KEX algorithms: 
curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1,ext-info-c debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-dss debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1 debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1 debug2: compression ctos: none,zlib@openssh.com,zlib debug2: compression stoc: none,zlib@openssh.com,zlib debug2: languages ctos: debug2: languages stoc: debug2: first_kex_follows 0 debug2: reserved 0 debug2: peer server KEXINIT proposal debug2: KEX algorithms: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: host key algorithms: 
ssh-rsa,ecdsa-sha2-nistp256,ssh-ed25519 debug2: ciphers ctos: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se debug2: ciphers stoc: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se debug2: MACs ctos: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 debug2: MACs stoc: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 debug2: compression ctos: none,zlib@openssh.com debug2: compression stoc: none,zlib@openssh.com debug2: languages ctos: debug2: languages stoc: debug2: first_kex_follows 0 debug2: reserved 0 debug1: kex: algorithm: curve25519-sha256@libssh.org debug1: kex: host key algorithm: ecdsa-sha2-nistp256 debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: compression: none debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: compression: none debug1: kex: curve25519-sha256@libssh.org need=64 dh_need=64 debug1: kex: curve25519-sha256@libssh.org need=64 dh_need=64 
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY debug1: Server host key: ecdsa-sha2-nistp256 SHA256:0UIjhW07PR7qBd1PqtzNZcvTK/Ca0FEYP7vxAp8zd6k Warning: Permanently added 'login-el7.uscms.org,192.170.231.13' (ECDSA) to the list of known hosts. debug2: set_newkeys: mode 1 debug1: rekey after 134217728 blocks debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug1: SSH2_MSG_NEWKEYS received debug2: set_newkeys: mode 0 debug1: rekey after 134217728 blocks debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_rsa (0x55de82016430) debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_dsa ((nil)) debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa ((nil)) debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 ((nil)) debug2: service_accept: ssh-userauth debug1: SSH2_MSG_SERVICE_ACCEPT received debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,hostbased debug1: Next authentication method: gssapi-keyex debug1: No valid Key exchange context debug2: we did not send a packet, disable method debug1: Next authentication method: gssapi-with-mic debug1: Unspecified GSS failure. Minor code may provide more information Server not found in Kerberos database debug2: we sent a gssapi-with-mic packet, wait for reply debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,hostbased debug2: we did not send a packet, disable method debug1: Next authentication method: publickey debug1: Offering RSA public key: /afs/cern.ch/user/a/afanfani/.ssh/id_rsa debug2: we sent a publickey packet, wait for reply debug1: Server accepts key: pkalg ssh-rsa blen 277 debug2: input_userauth_pk_ok: fp SHA256:GJoDW9XRAXgfwvAFXV0Q2XuM9OUxb+0UTuQbnU5eoNg debug1: Authentication succeeded (publickey). Authenticated to login-el7.uscms.org ([192.170.231.13]:22). debug2: fd 6 setting O_NONBLOCK debug1: channel 0: new [client-session] debug2: channel 0: send open debug1: Requesting no-more-sessions@openssh.com debug1: Entering interactive session. 
debug1: pledge: network debug2: callback start debug2: fd 4 setting TCP_NODELAY debug2: client_session2_setup: id 0 debug2: channel 0: request pty-req confirm 1 debug1: Sending environment. debug1: Sending env XMODIFIERS = @im=none debug2: channel 0: request env confirm 0 debug1: Sending env LANG = en_US.UTF-8 debug2: channel 0: request env confirm 0 debug2: channel 0: request shell confirm 1 debug2: callback done debug2: channel 0: open confirm rwindow 0 rmax 32768 debug2: channel_input_status_confirm: type 99 id 0 debug2: PTY allocation request accepted on channel 0 debug2: channel 0: rcvd adjust 2097152 debug2: channel_input_status_confirm: type 99 id 0 debug2: shell request accepted on channel 0 Last login: Thu May 30 13:02:13 2019

Home directory usage for afanfani:
Bytes: [ # ] 0% (0/102400 MB)
Files: [ # ] 0% (28/500000 files)

[2]

--------------> login.uscms.org

OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017 debug1: Reading configuration data /etc/ssh/ssh_config debug1: /etc/ssh/ssh_config line 58: Applying options for debug2: resolving "login.uscms.org" port 22 debug2: ssh_connect_direct: needpriv 0 debug1: Connecting to login.uscms.org [192.170.227.118] port 22. debug1: Connection established. debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa type 1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_rsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_dsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 type -1 debug1: key_load_public: No such file or directory debug1: identity file /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519-cert type -1 debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_7.4 debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3 debug1: match: OpenSSH_5.3 pat OpenSSH_5 compat 0x0c000000 debug2: fd 4 setting O_NONBLOCK debug1: Authenticating to login.uscms.org:22 as 'afanfani' debug1: SSH2_MSG_KEXINIT sent debug1: SSH2_MSG_KEXINIT received debug2: local client KEXINIT proposal debug2: KEX algorithms: 
curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1,ext-info-c debug2: host key algorithms: ssh-rsa-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,ssh-dss debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1 debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1 debug2: compression ctos: none,zlib@openssh.com,zlib debug2: compression stoc: none,zlib@openssh.com,zlib debug2: languages ctos: debug2: languages stoc: debug2: first_kex_follows 0 debug2: reserved 0 debug2: peer server KEXINIT proposal debug2: KEX algorithms: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: host key algorithms: ssh-rsa,ssh-dss debug2: ciphers ctos: 
aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se debug2: ciphers stoc: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se debug2: MACs ctos: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 debug2: MACs stoc: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 debug2: compression ctos: none,zlib@openssh.com debug2: compression stoc: none,zlib@openssh.com debug2: languages ctos: debug2: languages stoc: debug2: first_kex_follows 0 debug2: reserved 0 debug1: kex: algorithm: diffie-hellman-group-exchange-sha256 debug1: kex: host key algorithm: ssh-rsa debug1: kex: server->client cipher: aes128-ctr MAC: umac-64@openssh.com compression: none debug1: kex: client->server cipher: aes128-ctr MAC: umac-64@openssh.com compression: none debug1: kex: diffie-hellman-group-exchange-sha256 need=16 dh_need=16 debug1: kex: diffie-hellman-group-exchange-sha256 need=16 dh_need=16 debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<3072<8192) sent debug1: got SSH2_MSG_KEX_DH_GEX_GROUP debug2: bits set: 1549/3072 debug1: SSH2_MSG_KEX_DH_GEX_INIT sent debug1: got SSH2_MSG_KEX_DH_GEX_REPLY debug1: Server host key: ssh-rsa SHA256:79b2xCzNlqjK6Mfb5j0NYEj1c4KTDw7lvPB3U9gLJaM debug1: Host 'login.uscms.org' is known and matches the RSA host key. 
debug1: Found key in /afs/cern.ch/user/a/afanfani/.ssh/known_hosts:93 debug2: bits set: 1559/3072 debug2: set_newkeys: mode 1 debug1: rekey after 4294967296 blocks debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug1: SSH2_MSG_NEWKEYS received debug2: set_newkeys: mode 0 debug1: rekey after 4294967296 blocks debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_rsa (0x55d8ff222a60) debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_dsa ((nil)) debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_ecdsa ((nil)) debug2: key: /afs/cern.ch/user/a/afanfani/.ssh/id_ed25519 ((nil)) debug2: service_accept: ssh-userauth debug1: SSH2_MSG_SERVICE_ACCEPT received debug1: Authentications that can continue: publickey,password,hostbased debug1: Next authentication method: publickey debug1: Offering RSA public key: /afs/cern.ch/user/a/afanfani/.ssh/id_rsa debug2: we sent a publickey packet, wait for reply debug1: Server accepts key: pkalg ssh-rsa blen 277 debug2: input_userauth_pk_ok: fp SHA256:GJoDW9XRAXgfwvAFXV0Q2XuM9OUxb+0UTuQbnU5eoNg debug1: Authentication succeeded (publickey). Authenticated to login.uscms.org ([192.170.227.118]:22). debug2: fd 6 setting O_NONBLOCK debug1: channel 0: new [client-session] debug2: channel 0: send open debug1: Requesting no-more-sessions@openssh.com debug1: Entering interactive session. debug1: pledge: network debug2: callback start debug2: fd 4 setting TCP_NODELAY debug2: client_session2_setup: id 0 debug2: channel 0: request pty-req confirm 1 debug1: Sending environment. 
debug1: Sending env XMODIFIERS = @im=none debug2: channel 0: request env confirm 0 debug1: Sending env LANG = en_US.UTF-8 debug2: channel 0: request env confirm 0 debug2: channel 0: request shell confirm 1 debug2: callback done debug2: channel 0: open confirm rwindow 0 rmax 32768 debug2: channel_input_status_confirm: type 99 id 0 debug2: PTY allocation request accepted on channel 0 debug2: channel 0: rcvd adjust 2097152 debug2: channel_input_status_confirm: type 99 id 0 debug2: shell request accepted on channel 0

afanfani commented 5 years ago

The gridpack is failing, apparently because the patches to MadGraph are not applied in the CODEGEN step (see logCODEGEN.txt). Although login-el7 is CentOS 7, the condor jobs still land on SL6 nodes (see the log), though this might not be the cause of the failure to apply the patches.

khurtado commented 5 years ago

@afanfani Could you comment out the following line and try again?

https://github.com/cms-sw/genproductions/blob/73fd42ba130bc851caef0dd4402c950d51e3ee3d/bin/MadGraph5_aMCatNLO/submit_cmsconnect_gridpack_generation.sh#L23

or change it to:

+REQUIRED_OS = "rhel7"

By default, if this line is not specified, "rhel6" is used on login.uscms.org and "rhel7" is used on login-el7.uscms.org.
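The default described in this comment can be written down as a small lookup. This is only a sketch of the behaviour as stated here, not code taken from submit_cmsconnect_gridpack_generation.sh; `DEFAULT_OS_BY_HOST` and `required_os_line` are hypothetical names.

```python
# Defaults as described in the comment above (assumed, not read from any config).
DEFAULT_OS_BY_HOST = {
    "login.uscms.org": "rhel6",
    "login-el7.uscms.org": "rhel7",
}

def required_os_line(host, override=None):
    """Build the +REQUIRED_OS line for a condor submit file.

    If `override` is given it wins; otherwise the per-host default is used.
    """
    os_name = override or DEFAULT_OS_BY_HOST[host]
    return '+REQUIRED_OS = "{}"'.format(os_name)
```

Under these assumptions, a script that hard-codes "rhel6" behaves like the override case, which would explain jobs landing on SL6 nodes even when they are submitted from login-el7.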

afanfani commented 5 years ago

Thanks! Running on CentOS 7, the patches are correctly applied in the CODEGEN step and the needed model is found; however, the INTEGRATE step still fails when importing the model. It complains about scram_arch being slc7_amd64_gcc493 instead of slc7_amd64_gcc700. Now I'm retrying, explicitly providing scram_arch and cmssw to submit_cmsconnect_gridpack_generation.sh.
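A mismatch like this one can be caught before submitting. The following is a minimal sketch, assuming the architecture string always has the three-part form os_cpu_compiler seen in this thread; `split_scram_arch` and `check_scram_arch` are hypothetical helpers and not part of genproductions.

```python
import os

def split_scram_arch(scram_arch):
    """Split e.g. 'slc7_amd64_gcc700' into (os, cpu, compiler)."""
    parts = scram_arch.split("_")
    if len(parts) != 3:
        raise ValueError("unexpected SCRAM_ARCH format: {!r}".format(scram_arch))
    return tuple(parts)

def check_scram_arch(expected, environ=None):
    """Return a list of problems found when comparing $SCRAM_ARCH to `expected`."""
    environ = os.environ if environ is None else environ
    actual = environ.get("SCRAM_ARCH")
    if actual is None:
        return ["SCRAM_ARCH is not set in the environment"]
    problems = []
    names = ("os", "cpu", "compiler")
    pairs = zip(names, split_scram_arch(actual), split_scram_arch(expected))
    for name, got, want in pairs:
        if got != want:
            problems.append("{} mismatch: {} instead of {}".format(name, got, want))
    return problems
```

An empty list means the environment matches the expected architecture; anything else is worth fixing before the jobs are sent.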

afanfani commented 5 years ago

The next issue is with the compilation of the Source directory. The full log is attached; the relevant lines seem to be:

from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/lhapdf/6.2.1-pafccj3/include/LHAPDF/PDF.h:10,
from pdf_lhapdf6.cc:6:
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/gcc/7.0.0-pafccj/include/c++/7.4.1/x86_64-unknown-linux-gnu/bits/os_defines.h:39:10: fatal error: features.h: No such file or directory

So it seems features.h is missing. I don't have many clues on how to address it.

compilationError.txt

khurtado commented 5 years ago

@afanfani Hmm, it looks like dependencies that should have been handled by cvmfs but weren't. The slc6 version seems similar, so maybe it is getting loaded from the system on login.uscms.org. I just installed glibc-devel/headers to match what is available on login.uscms.org. Could you give it another try?

afanfani commented 5 years ago

Thanks. The error related to features.h is now gone! It took longer than expected to check because I forgot to set SCRAM_ARCH in the environment, and providing SCRAM_ARCH and the CMSSW version to submit_cmsconnect_gridpack_generation.sh is not enough. However, the gridpack generation still fails [1] with a lot of held jobs (#13: condor_starter or shadow failed to send job). The behavior seems similar to lxplus… so it might be related to MadGraph and my cards. Just let me know if there are further condor debugging options I can enable to track the issue. In case it helps, the cards and log are in /afs/cern.ch/user/a/afanfani/public/MadGraph5_aMCatNLO/UL/CMSconnect/fifth_try .

[1]

…..
INFO: ClusterId 1334 was held with code 13, subcode 2. Releasing it.
INFO: ClusterId 1524 was held with code 13, subcode 2. Releasing it.
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 819 [ 24m 43s ]
Error when reading /local-scratch/afanfani/TestMcM/GenBranch_mg261_UL/genproductions/bin/MadGraph5_aMCatNLO/DYJetsToMuMu_LO_MLM_mll50/DYJetsToMuMu_LO_MLM_mll50_gridpack/work/processtmp/SubProcesses/P4_gg_llggqq/G1/results.dat
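To see how widespread the holds are, the "was held with code" lines of the submission log can be tallied. This is a rough sketch that assumes the log lines have exactly the wording quoted in [1]; `count_holds` and `held_clusters` are hypothetical helpers.

```python
import re
from collections import Counter

# Matches lines such as:
#   INFO: ClusterId 1334 was held with code 13, subcode 2. Releasing it.
HELD_RE = re.compile(r"ClusterId (\d+) was held with code (\d+), subcode (\d+)")

def count_holds(log_text):
    """Count hold events per (code, subcode) pair."""
    return Counter(
        (int(code), int(sub)) for _, code, sub in HELD_RE.findall(log_text)
    )

def held_clusters(log_text):
    """Return the sorted, de-duplicated ClusterIds that were held at least once."""
    return sorted({int(cid) for cid, _, _ in HELD_RE.findall(log_text)})
```

If nearly all holds share one (code, subcode) pair, as in the excerpt, that points to a single systematic cause rather than scattered site problems.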

khurtado commented 5 years ago

@afanfani The error messages in condor look like this:

-- Schedd: login-el7.uscms.org : <192.170.231.13:9618?... @ 06/03/19 15:59:53
 ID      OWNER          HELD_SINCE  HOLD_REASON
1983.0   afanfani        6/2  10:09 Error from slot1_5@glidein_3300_25883925@r4b1b.grid.hephy.at: STARTER at 10.200.200.50 failed to send file(s) to <192.170.231.13:9618>: error reading from /home/cmspil14/home_cream_169198914/CREAM169198914/glide_V2EYQ8/execute/dir_6666/G107: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <193.170.243.209:46224>

Looking into the local working directory:

[khurtado@login-el7 P4_gq_llgqqq]$ condor_q 1983.0 -af Iwd
/local-scratch/afanfani/TestMcM/GenBranch_mg261_UL_hessian/genproductions/bin/MadGraph5_aMCatNLO/DYJetsToMuMu_LO_MLM_mll50/DYJetsToMuMu_LO_MLM_mll50_gridpack/work/processtmp/SubProcesses/P4_gq_llgqqq

G107 and G* in general are files that seem empty. I remember these were directories in the past, so something seems odd with the job creation/splitting in MadGraph.
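The observation that G107 and the other G* entries are empty files rather than directories can be checked across a whole subprocess directory. The sketch below assumes only what is stated here, namely that the G* entries sit directly inside a SubProcesses/P* directory; `classify_g_entries` is a hypothetical helper.

```python
from pathlib import Path

def classify_g_entries(subprocess_dir):
    """Sort the G* entries of one SubProcesses/P* directory by what they are on disk."""
    result = {"directories": [], "empty_files": [], "other_files": []}
    for entry in sorted(Path(subprocess_dir).glob("G*")):
        if entry.is_dir():
            result["directories"].append(entry.name)
        elif entry.stat().st_size == 0:
            result["empty_files"].append(entry.name)
        else:
            result["other_files"].append(entry.name)
    return result
```

Running it on the Iwd path printed by condor_q would show whether every G* entry is an empty file or only some of them, which helps narrow down where the job splitting goes wrong.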