dmwm / CRABServer


make it possible to use python3 for DAGMAN's in scheds #6803

Closed: belforte closed this 3 years ago

belforte commented 3 years ago

initial steps:

belforte commented 3 years ago

after the above:

+ python3 AdjustSites.py
Traceback (most recent call last):
  File "AdjustSites.py", line 25, in <module>
    from RESTInteractions import CRABRest
  File "/data/srv/glidecondor/condor_local/spool/5950/0/cluster8925950.proc0.subproc0/RESTInteractions.py", line 19, in <module>
    import pycurl
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (nss)

but

[cms1627@vocms059 cluster8925950.proc0.subproc0]$ python3
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycurl
>>> 

pff....
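A plausible explanation for the two sessions disagreeing is that the DAG bootstrap environment and the interactive shell load different libcurl.so files. Where `import pycurl` does succeed, the libcurl version and SSL backend it is really linked against can be read from `pycurl.version`. This is a minimal sketch: the helper names are hypothetical, and the list of backend names is an assumption, not an exhaustive list.

```python
# Backend names as they appear in pycurl.version / curl -V (assumed list).
SSL_BACKENDS = ("OpenSSL", "NSS", "GnuTLS", "LibreSSL", "BoringSSL")


def parse_pycurl_version(version):
    """Split a pycurl.version-style string into {component: version}.

    Example input: "PycURL/7.43.0.6 libcurl/7.29.0 NSS/3.53.1 zlib/1.2.7"
    """
    parts = {}
    for token in version.split():
        name, _, ver = token.partition("/")
        parts[name] = ver
    return parts


def ssl_backend(version):
    """Return the first known SSL backend found in the string, or None."""
    for name in parse_pycurl_version(version):
        if name in SSL_BACKENDS:
            return name
    return None
```

In the interactive session above, `import pycurl; print(ssl_backend(pycurl.version))` would show which backend the loaded libcurl uses.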

belforte commented 3 years ago

Trying to use py3-pycurl from yum install brings back the old problem that libcurl does not support SSL authentication, and bringing in curl from CMSSW I end up with the other old problem of

ImportError: pycurl: libcurl link-time version (7.59.0) is older than compile-time version (7.70.0)

if I use py3-pycurl (and hence python3) from CMSSW, and/or the other old foe

ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (nss)

if I use python3 and py3-pycurl from the OS.
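Both failures happen at `import pycurl` time, when pycurl checks the libcurl it was compiled against against the one the dynamic loader actually picked up. For a wrapper that needs to report why it cannot reach the REST, a small helper could classify the two messages quoted above. `diagnose_pycurl_error` and `check_pycurl` are hypothetical names, not CRABServer functions; the regular expressions only match the message text seen in this thread.

```python
import re

# Patterns taken from the two ImportError messages quoted above.
_VERSION_RE = re.compile(
    r"link-time version \(([\d.]+)\) is older than compile-time version \(([\d.]+)\)")
_BACKEND_RE = re.compile(
    r"link-time ssl backend \((\w+)\) is different from compile-time ssl backend \((\w+)\)")


def diagnose_pycurl_error(message):
    """Classify a pycurl ImportError message as (kind, link_time, compile_time)."""
    match = _VERSION_RE.search(message)
    if match:
        return ("libcurl older than build", match.group(1), match.group(2))
    match = _BACKEND_RE.search(message)
    if match:
        return ("ssl backend mismatch", match.group(1), match.group(2))
    return ("unknown", None, None)


def check_pycurl():
    """Try the import and report what went wrong, without crashing the caller."""
    try:
        import pycurl  # noqa: F401
    except ImportError as exc:
        return diagnose_pycurl_error(str(exc))
    return ("ok", None, None)
```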

As much as I liked running the wrapper in the OS environment, I am starting to think it would be better to use the COMP env. from CVMFS to have consistent builds of condor/curl/python3 and various externals (e.g. future).

Sort of "looks like we have been very lucky that until now this was enough": https://github.com/dmwm/CRABServer/blob/d950ef1919c0ed5372481dcd7bcbff7e3b7f60cd/scripts/dag_bootstrap_startup.sh#L17-L20

amaltaro commented 3 years ago

As much as I liked running the wrapper in the OS environment, I am starting to think it would be better to use the COMP env. from CVMFS to have consistent builds of condor/curl/python3 and various externals (e.g. future).

+1 for using a setup that is well tested and well under control (even though WMCore is venturing into a different environment for non-x86 architectures...).

belforte commented 3 years ago

Thanks @amaltaro for giving me courage here! Hmm... WMCore's COMP env has only these py3 externals:

belforte@lxplus776/TC3> ls -d /cvmfs/cms.cern.ch//COMP/slc7_amd64_gcc630/external/py3-*
/cvmfs/cms.cern.ch//COMP/slc7_amd64_gcc630/external/py3-future/
/cvmfs/cms.cern.ch//COMP/slc7_amd64_gcc630/external/py3-setuptools/
belforte@lxplus776/TC3> 

It does not have py3-pycurl, maybe because it is not needed for the job wrapper?

belforte commented 3 years ago

I can likely find a working environment by setting up e.g. CMSSW_12_x, but of course there is no htcondor there... Well, the COMP env. does not have htcondor either. The beauty of using the OS environment was that yum install htcondor brings all we need in that respect, with a well understood and well supported HTCondor distribution from the HTCondor developers.

belforte commented 3 years ago

Unlike CRABClient, the python scripts which we run in the schedd make a lot of calls to the CRABServer REST, so I think it is better to keep using pycurl_manager, and hence pycurl, instead of going for the easier "fork curl" path.
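For comparison, the "fork curl" path would look roughly like the sketch below. Every call spawns a new process and performs a fresh TLS handshake; that is cheap for a handful of calls, but it adds up for scripts that hit the REST many times, whereas a library client such as pycurl can keep a handle (and its connection) alive in-process. `curl_argv` and `rest_get` are hypothetical names, and the X509 proxy path and CA directory are assumptions about a typical schedd setup.

```python
import json
import subprocess


def curl_argv(url, proxy, capath="/etc/grid-security/certificates"):
    """Command line for one REST GET authenticated with an X509 proxy.

    The proxy file holds both certificate and key, so it is passed twice.
    capath is an assumption: the usual grid CA directory.
    """
    return ["curl", "-sS", "--fail",
            "--cert", proxy, "--key", proxy,
            "--capath", capath, url]


def rest_get(url, proxy):
    """One REST call = one new curl process, with its own TLS handshake."""
    result = subprocess.run(curl_argv(url, proxy),
                            stdout=subprocess.PIPE, check=True)
    return json.loads(result.stdout.decode("utf-8"))
```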

amaltaro commented 3 years ago

It does not have py3-pycurl, maybe because it is not needed for the job wrapper?

That's correct! I think we could install py3-pycurl in CVMFS as well, though. However, I just noticed we do not have this package available in our slc6_amd64_gcc493 COMP architecture.

belforte commented 3 years ago

I need htcondor as well... sort of the full WMA env., and of course I do not care which architecture.

amaltaro commented 3 years ago

If this software stack doesn't change too often, perhaps we could have a chat with Shahzad to see what we could do. If it's somehow risky, we could even consider creating a new area in CVMFS (CRAB instead of COMP, or COMP_CRAB, or whatever) such that we can isolate these changes from the production jobs.

belforte commented 3 years ago

All of this because apparently py3 has become pickier than python2 and does not like that we swap the backend exactly to bring in openssl authentication in addition to the (for us) useless nss:

ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (nss)

I was hoping not to need to bug you and Shahzad about this; we need such simple pythons here :-(

If we need a sophisticated env... why not what WMA uses? All in all it runs on the "same schedds" as CRAB stuff does, talks to the same Rucio, and likely to CMSWEB as well. It is a pity that it takes so much effort for you to integrate a new htcondor version, but if you have to do it anyhow... we may very well be in the same boat.

belforte commented 3 years ago

I could not find a python3 version of the FTS client, so I have asked support: https://cern.service-now.com/service-portal?id=ticket&table=incident&n=INC2945940

This keeps looking like more trouble than gain. Maybe the best solution for the pycurl problem is to use tokens to authenticate with cmsweb. All in all we are moving fast to using tokens for TW/schedd communication anyhow.

belforte commented 3 years ago

Thanks all for your help on this topic. Time to wrap up the debate and go to work.

Here's a quick summary:

Using tokens for real is quite a change to current operational practices, and I (at least) clearly have a lot to learn there; that is a topic for another time, but it looks good to get started on it.

I.e. I will investigate the "token" things more and hopefully work on "use tokens" first. At that point the move to python3 will hopefully be simple, since all the things I depend on will come via yum install.

belforte commented 3 years ago

Example, for my own convenience, on my laptop as root:

curl repo.data.kit.edu/key.pgp | apt-key add -
vi /etc/apt/sources.list   # and add these two lines:
# for OIDC (https://indigo-dc.gitbook.io/oidc-agent/installation/install)
deb https://repo.data.kit.edu/ubuntu/focal ./

# then
apt-get update
apt-get install oidc-agent

on my laptop as belforte:

eval $(oidc-agent)
oidc-gen -w device wlcg
# at the prompt type this as issuer:  https://cms-auth.web.cern.ch/
# and enter max (no quotes around it) as scopes
# log in to cms-auth.web.cern.ch, authorize, etc.
# make sure to set a non-null password, or some things will not work
export TOK=`oidc-token wlcg`
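With an IAM issuer such as cms-auth, the string returned by `oidc-token` is normally a JWT, so its expiry and scopes can be inspected locally before porting it elsewhere. A minimal sketch, assuming a JWT-formatted access token; it does not verify the signature, so it is only good for inspection, never for authorization decisions. The function names are hypothetical.

```python
import base64
import json
import time


def jwt_claims(token):
    """Decode the (unverified) payload, i.e. the middle part, of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload).decode("utf-8"))


def seconds_left(token, now=None):
    """Seconds until the token's 'exp' claim; negative if already expired."""
    now = time.time() if now is None else now
    return int(jwt_claims(token)["exp"] - now)
```

Usage: `seconds_left(os.environ["TOK"])` after `import os`, and `jwt_claims(os.environ["TOK"]).get("scope")` to see which scopes were granted.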

Port that $TOK env. var. to lxplus (e.g. copy/paste!) and there:

belforte@lxplus8s07/~> /usr/bin/curl  -H "Authorization: Bearer $TOK" https://cmsweb-auth.cern.ch/crabserver/preprod/info
{"desc": {"columns": ["null"]}, "result": [
 {"crabserver": "Welcome", "version": "v3.210701"}
]}
belforte@lxplus8s07/~> 
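The same call from Python only needs the `Authorization` header, with no X509 client certificate involved, which in principle sidesteps the libcurl SSL backend question for this kind of request. A minimal stdlib sketch; `bearer_request` is a hypothetical helper and reading the token from `$TOK` mirrors the shell example.

```python
import os
import urllib.request


def bearer_request(url, token=None):
    """Build a GET request carrying the token, like curl -H "Authorization: Bearer $TOK"."""
    if token is None:
        token = os.environ["TOK"]  # same env. var. as in the shell example
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
```

Sending it would be `urllib.request.urlopen(bearer_request("https://cmsweb-auth.cern.ch/crabserver/preprod/info"))`; if the server certificate is not in the system trust store, pass `context=ssl.create_default_context(capath=...)` pointing at the grid CA directory (an assumption about the node setup).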
belforte commented 3 years ago

Status: waiting to hear from some expert how we are supposed to manage tokens in a production service: how to name/get/renew/register them, how to transition them from one person to another... all those things which we now do with service certificates.

On schedd machines the superuser (root) can use 'condor_token_create' and 'condor_token_fetch' and apparently use those for submission, but I have no idea how to make those tokens work with CMSWEB.

belforte commented 3 years ago

@yuyiguo suggested that the solution used for the DBS Client on pip can also work for me. Documentation is in https://github.com/dmwm/DBSClient/wiki. I will try.

belforte commented 3 years ago

I made that work on my SL7 VM, where I already had pip. And of course I want to do this as root and make it available to every user, in spite of pip telling me every time:

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour
with the system package manager. It is recommended to use a virtual environment instead:
https://pip.pypa.io/warnings/venv

I had to change a few things though, since the instructions in the wiki lead to building a pycurl for NSS, not OpenSSL, and that gives me

pycurl.error: (35, 'Peer does not recognize and trust the CA that issued your certificate.')

so I built pycurl with openssl instead by using

export PYCURL_SSL_LIBRARY=openssl
pip3.6 install --compile --install-option="--with-openssl" --no-cache-dir pycurl==7.43.0.6

(somehow I had to use pip3, not pip, which renamed itself pip3.6 when updated, oh well) and I also did yum install libcurl-devel python3-devel gcc, and pointed libcurl.so to "Shahzad's" version, which has openssl support, via https://github.com/dmwm/CRABServer/blob/d950ef1919c0ed5372481dcd7bcbff7e3b7f60cd/scripts/dag_bootstrap_startup.sh#L17-L20

With that I can run the DBS Client. I will look into the best way to do this on all schedds... I am not sure I can 'puppetize' it.

As a start... reset the VM and do it again, cleanly and with documentation.
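One common way to make a process use a different libcurl.so is to prepend its directory to LD_LIBRARY_PATH; whether the linked bootstrap lines do exactly that is not shown here. The dynamic loader reads that variable only when a process starts, so changing `os.environ` inside an already running interpreter does not affect a later `import pycurl` there. The sketch therefore probes in a child interpreter; the helper names and any library directory passed in are hypothetical.

```python
import os
import subprocess
import sys


def env_with_libdir(libdir, base=None):
    """Copy of an environment with libdir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    old = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = libdir + os.pathsep + old if old else libdir
    return env


def probe_pycurl(libdir):
    """Import pycurl in a fresh interpreter that sees the extra libdir."""
    cmd = [sys.executable, "-c", "import pycurl; print(pycurl.version)"]
    proc = subprocess.run(cmd, env=env_with_libdir(libdir),
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    # stdout has the version string on success, stderr the ImportError otherwise
    return proc.returncode, (proc.stdout or proc.stderr).strip()
```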

belforte commented 3 years ago

Things were smoother on CC8 (I tried that to have a clean playground, since I do not want to destroy my SL7 VM):

yum install -y openssl-devel
yum install -y gcc
yum install -y curl-devel
yum install -y python3-devel
pip3 install --compile --install-option="--with-openssl" --no-cache-dir pycurl
pip install certifi
pip install dbs3-client
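On a freshly prepared node, a quick check that the modules installed above can be located may save a round trip. Note that `importlib.util.find_spec` only locates a module without executing it, so it will not reveal the libcurl link-time errors seen earlier in this thread; those still need a real `import pycurl`. The module list is taken from this thread and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Module names as used in this thread; adjust to what the node really needs.
REQUIRED = ("pycurl", "certifi", "dbs.apis.dbsClient")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located on the current sys.path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # for dotted names find_spec imports the parent package first,
            # and a missing parent raises instead of returning None
            found = False
        if not found:
            missing.append(name)
    return missing
```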

After that (and fixing the /etc/vomses/ directory), on a "clean" CC8 VM:

belforte@stefanocc8/~> gp
Your identity: /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Stefano Belforte belforte@infn.it
Creating temporary proxy ......................... Done
Contacting  voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms" Done
Creating proxy ................................. Done

Your proxy is valid until Fri Oct 29 12:09:02 2021
belforte@stefanocc8/~> python3
Python 3.6.8 (default, Sep 21 2021, 20:17:36) 
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dbs.apis.dbsClient import *
>>> 
>>> url="https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader/"
>>> # API Object
... dbs3api = DbsApi(url=url)
>>> print(dbs3api.listDataTiers(data_tier_name='AOD'))
[{'create_by': '/DC=org/DC=doegrids/OU=People/CN=Yuyi Guo 899208', 'creation_date': 1176750789, 'data_tier_id': 9, 'data_tier_name': 'AOD'}]
>>> 
belforte@stefanocc8/~> 

Interestingly, here I did not need to point libcurl.so to CMSSW's one!

yuyiguo commented 3 years ago

@belforte

I was using my VM, which is CentOS-7. I talked with Shahzad and he pointed out "As system curl is build with nss so you have to build pycurl with nss backend too.". So I used the system curl instead of "Shahzad's" version. It puzzles me: how did openssl work in your case?

We saw the same error you had: pycurl.error: (35, 'Peer does not recognize and trust the CA that issued your certificate.')

I think the fix for that error is to "pip install certifi".

belforte commented 3 years ago

Thanks Yuyi, I clearly need to test more on SL7 and clean things up. It is a bit comforting that on CC8 things "may" be simpler, since it "may" be that we can (or have to) move schedds to CC8 (or CentOS Stream 8...) before 2026. I can surely try building pycurl with multiple backends (nss + openssl); I am simply not confident about how to clean things up "at every try" so that I end up with a reproducible recipe.

But a missing "pip install certifi" gives a clear import error; the "Peer does not recognize..." error usually comes from the wrong libcurl.so.

belforte commented 3 years ago

Only now did I understand that I do not need to build pycurl on every node: just do it once and bring it around. So in the end things are as simple as this.

build procedure, as root on a brand new SL7 VM:

 yum install python3 gcc curl-devel python3-devel openssl-devel
 pip3 install certifi
 export PYCURL_SSL_LIBRARY=openssl
 pip3 install --compile --install-option="--with-openssl" --no-cache-dir pycurl
belforte commented 3 years ago

I have put this new pycurl in https://cms-docs.web.cern.ch/CRAB/pycurl3/7.44.1/pycurl.cpython-36m-x86_64-linux-gnu.so. It can be downloaded anywhere with wget, which hopefully will make it easy to use puppet to put it on all schedds. The exact location is irrelevant, since it is only a matter of adding its directory to $PYTHONPATH.
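Because the file name carries CPython's ABI tag, the prebuilt module is only picked up by a matching interpreter (CPython 3.6 on x86_64 Linux, like the python3 shown earlier in this thread). A different python3 would not see it and would either fail with "No module named 'pycurl'" or fall back to another pycurl on its path. A minimal check, plus the runtime equivalent of adding the download directory to $PYTHONPATH; the helper names and any directory used with them are hypothetical.

```python
import importlib.machinery
import sys


def findable_as(filename, module="pycurl"):
    """True if this interpreter's import system would load filename as module.

    CPython looks for module + suffix for each suffix in EXTENSION_SUFFIXES,
    e.g. ".cpython-36m-x86_64-linux-gnu.so" on CPython 3.6 / x86_64 Linux.
    """
    return any(filename == module + suffix
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)


def prepend_to_path(directory):
    """Runtime equivalent of putting directory first in $PYTHONPATH."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
```

Usage: `findable_as("pycurl.cpython-36m-x86_64-linux-gnu.so")` returns True only under a matching CPython 3.6 build.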

belforte commented 3 years ago

Probably time to rename this as "make it possible to use python3 on schedd", close it, and open different issues for the actual code changes which I need: things like import statements, removing calls to the old MonALISA dashboard, urlencode, HTTPException... all things which I have already found in CRABClient.
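For the urlencode and HTTPException items, the usual pattern while a script still has to run under both python2 and python3 is a guarded import, since both names moved in Python 3 (`urllib.urlencode` became `urllib.parse.urlencode`, `httplib` became `http.client`). A generic sketch, not the actual CRABServer change; the dictionary keys in the example are placeholders.

```python
# Python 2/3 compatible imports for two of the names mentioned above.
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

try:
    from http.client import HTTPException  # Python 3
except ImportError:
    from httplib import HTTPException  # Python 2

# placeholder keys; spaces are encoded as "+"
query = urlencode({"a": 1, "b": "x y"})
```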

belforte commented 3 years ago

Let's close. Actual code work is tracked in #6813.