cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

gfal library issue in CMSSW_10_2_X #26462

Open aperloff opened 5 years ago

aperloff commented 5 years ago

I am unable to use the gfal commands after I've done a cmsenv for CMSSW_10_2_X (I'm using CMSSW_10_2_11_patch1). See for example:

cd CMSSW_10_2_11_patch1/src
cmsenv
gfal-ls gsiftp://cmseos-gridftp.fnal.gov//eos/uscms/store/user/lpcsusyhad/

which yields:

Traceback (most recent call last):
  File "/usr/bin/gfal-ls", line 24, in <module>
    from gfal2_util.shell import Gfal2Shell
  File "/usr/lib/python2.7/site-packages/gfal2_util/shell.py", line 23, in <module>
    from base import CommandBase
  File "/usr/lib/python2.7/site-packages/gfal2_util/base.py", line 23, in <module>
    import argparse
  File "/usr/lib/python2.7/argparse.py", line 85, in <module>
    import collections as _collections
  File "/usr/lib/python2.7/collections.py", line 8, in <module>
    from _collections import deque, defaultdict
ImportError: /usr/lib/python2.7/lib-dynload/_collectionsmodule.so: wrong ELF class: ELFCLASS32

This seems to only be a problem after I've done the cmsenv. Prior to doing that I am able to use all of the gfal commands. Is something in CMSSW clobbering the python settings needed to use the gfal commands? I realize that gfal is not an external tool in CMSSW, but it would be nice not to have to keep a non-CMSSW terminal open just to do file operations.

cmsbuild commented 5 years ago

A new Issue was created by @aperloff Alexx Perloff.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

davidlange6 commented 5 years ago

are you by chance not consistently using slc6 on slc6 or slc7 on slc7 ?

aperloff commented 5 years ago

Hi David, I'm using an slc7 CMSSW release on slc7:

>>>uname -a
Linux cmslpc41.fnal.gov 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018 x86_64 x86_64 x86_64 GNU/Linux

>>>echo $SCRAM_ARCH
slc7_amd64_gcc700

I was also asked to do an ldd of the collectionsmodule.so, so that is below as well.


ldd /usr/lib/python2.7/lib-dynload/_collectionsmodule.so
    linux-gate.so.1 =>  (0xf774c000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xf7700000)
    libc.so.6 => /lib/libc.so.6 (0xf7535000)
    /lib/ld-linux.so.2 (0xf774d000)
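
For reference, the "wrong ELF class" message can be cross-checked directly against the file's ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. A minimal sketch, assuming a POSIX shell with od and tr available; elf_class is a made-up helper name, not part of gfal or CMSSW:

```shell
# Hypothetical helper: report the ELF class of a file by reading its header.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte at offset 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
        1) echo "ELFCLASS32" ;;
        2) echo "ELFCLASS64" ;;
        *) echo "unknown" ;;
    esac
}
```

Running elf_class /usr/lib/python2.7/lib-dynload/_collectionsmodule.so on the affected node should agree with the class named in the ImportError, and comparing it with the same module under /usr/lib64 (if present) would show whether only the /usr/lib copy is 32-bit.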

davidlange6 commented 5 years ago

any suggestions for reproducing at cern? (i'm presumably just missing some setup...)

[dlange@lxplus721 ~]$ cd CMSSW_10_2_11_patch1
[dlange@lxplus721 ~/CMSSW_10_2_11_patch1]$ cmsenv
[dlange@lxplus721 ~/CMSSW_10_2_11_patch1]$ gfal-ls gsiftp://cmseos-gridftp.fnal.gov//eos/uscms/store/user/lpcsusyhad/
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site

aperloff commented 5 years ago

Hi David, I also get the same thing as you when I try this on lxplus. I don't know why the messages are different. I generally don't work on lxplus, so I know less about that site configuration. I can tell you that I get the same message in tcsh and in bash, which is not much of a surprise.

>>> ssh -Y aperloff@lxplus7.cern.ch
>>> gfal-ls gsiftp://cmseos-gridftp.fnal.gov//eos/uscms/store/user/lpcsusyhad/
<works fine>
cd CMSSW_10_1_7/src/
cmsenv
gfal-ls gsiftp://cmseos-gridftp.fnal.gov//eos/uscms/store/user/lpcsusyhad/
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site

davidlange6 commented 5 years ago

Hi Alexx

cc @belforte, who likely knows more.

I think the difference between our CERN and FNAL results is (possibly) the way the system python is configured.

gfal-ls (at least at CERN) uses the system python (hardwired). If your system has a 32-bit python (what does print sys.maxsize, sys.maxsize > 2**32 return?), then you would get the error you had at FNAL.

At CERN, I can make things work by replacing /usr/bin/python with /usr/bin/env python in gfal-ls and then making sure its libraries are in my path (PYTHONPATH set to /usr/lib/python2.7/site-packages/:/usr/lib64/python2.7/site-packages).

So it is indeed a python environment interference, presumably present since the start of python in CMSSW.

Now, as for a fix without changing gfal-ls, I don't have a good idea.

belforte commented 5 years ago

Ensuring long-term compatibility among those things may be a bitch. The CRAB client works around this and similar issues by forking a process which does: scram unsetenv; gfal-whatever

Is that an option here? (Sorry, haven't read the full thread.)

aperloff commented 5 years ago

Hi David,

I believe FNAL has a 64-bit python.

>>> which python
/usr/bin/python
>>> python
>>> import sys
>>> print sys.maxsize, sys.maxsize > 2**32
9223372036854775807 True
>>> import struct
>>> print( 8 * struct.calcsize("P"))
64

This matches what I get when I set up CMSSW:

cd CMSSW_10_2_11_patch1/src/
cmsenv
>>> which python
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw-patch/CMSSW_10_2_11_patch1/external/slc7_amd64_gcc700/bin/python
>>> python
>>> import sys
>>> print sys.maxsize, sys.maxsize > 2**32
9223372036854775807 True
>>> import struct
>>> print( 8 * struct.calcsize("P"))
64

aperloff commented 5 years ago

Hi David and Stefano,

I checked into gfal2's source code. It looks like they are compiling against whichever version of python is set up during installation (https://gitlab.cern.ch/dmc/gfal2/blob/develop/cmake/modules/FindPythonEasy.cmake). This then links to some .so (/usr/lib/python2.7/lib-dynload/_collectionsmodule.so)... somehow.

Is there any reason why we don't just add gfal2 to the list of CMSSW externals? That would mitigate this problem by having the CMSSW python version set up when gfal2 is installed, right? @belforte, wouldn't that also solve the CRAB client problem?

belforte commented 5 years ago

Well... I gave up many years ago on trying to have CMS and Grid software coexist, and my life got better! It is one of those things where at times things just work, at times they can be made to work with little effort, and at times the two organizations need to make a non-compatible change of course on different time scales. We had various horrible hacks, until scram unsetenv came to the rescue, and it has been marvelous since.

That said, grid software has surely been changing very little in recent years, but we also have less and less manpower. I would be very careful with promising our users that they can do cmsenv and get a working grid UI on every machine. If you run gfal at a computer in Finland and get an obscure SSL error, where do you turn for help? The problem may simply be in /etc/grid... at the local node. There's value in being able to tell users: make sure it works before you define the CMS env.

Seems to me that your question is rather: can I have one machine (or all of the LPC interactive nodes) configured with a special gfal-* build which works with CMSSW?

aperloff commented 5 years ago

Hi Stefano, I'd really rather not have a site- or node-specific solution. If that's what it's going to take, then I'm just going to have to tough it out and keep a terminal open just to use grid software and/or rely more heavily on LCG software. This is really frustrating and disappointing.

davidlange6 commented 5 years ago

Presumably a distro using a different python (e.g., LCG) is going to have the same issue as CMSSW conflicting with gfal unless they distribute it (at least I don't see why not). Do you see something different?

aperloff commented 5 years ago

LCG includes gfal:
Versions with gfal: http://lcginfo.cern.ch/pkg/gfal/
Versions with gfal2: http://lcginfo.cern.ch/pkg/gfal2/

aperloff commented 5 years ago

Update: It looks like LCG is also going to drop the gfal tools (https://sft.its.cern.ch/jira/browse/SPI-1295).

mschnepf commented 4 years ago

Hi, I played a bit with the python variables and I think I have a solution.

  1. Source the current SLC7 grid environment to get the current gfal python packages: source /cvmfs/grid.cern.ch/umd-c7ui-latest/etc/profile.d/setup-c7-ui-example.sh
  2. Source the CMSSW environment.
  3. Set the PYTHONHOME variable to the python path from CMSSW, e.g. export PYTHONHOME=/cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/python/2.7.15/

After that gfal-... should work again. I tested it on machines at our institute and on LXPLUS with CMSSW_10_6_2 and CMSSW_10_3_1.

davidlange6 commented 4 years ago

but the last command will mean that cmssw aspects won't work...

kpedro88 commented 4 years ago

to avoid exporting PYTHONHOME, you can alias the commands like:

alias gfal-ls='PYTHONHOME=/cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/python/2.7.15/ gfal-ls'
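
The same per-command override could also be written once as a wrapper, rather than as one alias per gfal tool. A minimal sketch, assuming a POSIX shell; with_cms_pythonhome and GFAL_PYTHONHOME are made-up names, and the default path is simply the example quoted in this thread, so it would need adjusting to the release in use:

```shell
# Hypothetical wrapper: set PYTHONHOME only for the wrapped command,
# leaving the calling shell's environment untouched.
# The default below is the example path quoted in this thread.
: "${GFAL_PYTHONHOME:=/cvmfs/cms.cern.ch/slc7_amd64_gcc820/external/python/2.7.15/}"

with_cms_pythonhome() {
    env PYTHONHOME="$GFAL_PYTHONHOME" "$@"
}
```

Usage would look like: with_cms_pythonhome gfal-ls gsiftp://cmseos-gridftp.fnal.gov//eos/uscms/store/user/lpcsusyhad/. Like the alias, this presumably still depends on the grid and CMSSW environments having been sourced first.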

belforte commented 4 years ago

I'd rather stick to the safer: (eval `scram unsetenv -sh`; gfal-ls)
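
The one-liner above generalizes to any command by wrapping it in a function. A minimal sketch, assuming a POSIX shell with scram available and a cmsenv already done in the calling shell; without_cmsenv is a made-up name:

```shell
# Hypothetical wrapper: run one command with the CMSSW runtime environment
# removed, inside a subshell so the calling shell keeps its cmsenv.
# Assumes `scram unsetenv -sh` prints sh-style commands, as used in the
# one-liner above.
without_cmsenv() {
    (
        eval "$(scram unsetenv -sh)"
        "$@"
    )
}
```

Usage would look like: without_cmsenv gfal-ls gsiftp://cmseos-gridftp.fnal.gov//eos/uscms/store/user/lpcsusyhad/. Because the eval happens inside the parentheses, nothing leaks back into the interactive shell, which is the property that makes this approach safer than exporting PYTHONHOME.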