dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Need authentication when accessing the UserFileCache on cmsweb #3479

Closed DMWMBot closed 11 years ago

DMWMBot commented 12 years ago

Currently the {{{UnpackUserTarball.py}}} script seems to do a bare {{{urllib.urlretrieve}}}, which does not authenticate when the https protocol is used.

See setHttpProxy() in UnpackUserTarball. It uses frontier to detect caching proxies that may be in place. The correct environment setting for secure URLs is https_proxy, but I'm not sure that will work if you are authenticating with a certificate.
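As a rough illustration of the https_proxy point, the sketch below shows how Python's standard library reads proxy settings from the environment. It is written for Python 3, where these helpers live in {{{urllib.request}}} (in the Python 2 of this thread they were in {{{urllib}}} and {{{urllib2}}}). The proxy address is a made-up placeholder, and this is not a description of what setHttpProxy() actually does.

```python
import os
import urllib.request

# Hypothetical squid address, for illustration only.
os.environ["https_proxy"] = "http://squid.example.org:3128"

# urllib maps each <scheme>_proxy variable to a {scheme: proxy_url} entry.
proxies = urllib.request.getproxies_environment()
print(proxies.get("https"))

# An opener built from that mapping sends https:// requests via the proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

Whether a caching proxy can be combined with certificate authentication is the open question raised above; the sketch does not settle it.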

DMWMBot commented 12 years ago

mmascher: I plan to work on this, taking the proxy from the {{{$X509_USER_PROXY}}} environment variable.
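A minimal sketch of that idea, assuming Python 3 and only the standard library. The helper names {{{find_user_proxy}}} and {{{open_with_proxy_cert}}} are hypothetical, the {{{/tmp/x509up_u<uid>}}} fallback is the conventional default proxy location rather than anything stated in this ticket, and server CA trust is left at the system defaults, which may not be enough for grid CAs.

```python
import os
import ssl
import urllib.request


def find_user_proxy(environ=os.environ):
    """Return the path of the grid proxy file, preferring $X509_USER_PROXY."""
    path = environ.get("X509_USER_PROXY")
    if path:
        return path
    # Assumed fallback: the conventional default location of a grid proxy.
    return "/tmp/x509up_u%d" % os.getuid()


def open_with_proxy_cert(url, proxy_path):
    """Open url over https, presenting the proxy file as client cert and key."""
    context = ssl.create_default_context()
    # A proxy file holds both the certificate chain and the private key,
    # so the same path is passed for certfile and keyfile.
    context.load_cert_chain(certfile=proxy_path, keyfile=proxy_path)
    return urllib.request.urlopen(url, context=context)
```

The object returned by {{{open_with_proxy_cert}}} is file-like, so it can be read in chunks rather than all at once.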

ericvaandering commented 12 years ago

ewv: Yeah, the initial idea was just to allow this stuff over http, not https.

Do we put pycurl on the WN? That would be the obvious solution.

As usual, you want to make sure that whatever solution you choose doesn't hold the whole file in memory while downloading it.

Finally, don't forget about caching proxies either, but it might not be possible to have both.
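The memory concern above can be sketched as a chunked copy from any file-like response to disk. This assumes Python 3; the function name {{{stream_to_file}}} and the 64 KiB chunk size are illustrative choices, and an in-memory buffer stands in for a real HTTP response.

```python
import io
import os
import shutil
import tempfile


def stream_to_file(response, dest_path, chunk_size=64 * 1024):
    """Copy a file-like response to disk one chunk at a time."""
    with open(dest_path, "wb") as out:
        # copyfileobj reads at most chunk_size bytes per iteration,
        # so memory use stays bounded regardless of the file size.
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Stand-in for an HTTP response: any object with a read() method works.
fake_response = io.BytesIO(b"x" * 200000)
dest = os.path.join(tempfile.mkdtemp(), "sandbox.tar.gz")
stream_to_file(fake_response, dest)
print(os.path.getsize(dest))
```

The same function would accept the object returned by {{{urllib.request.urlopen}}}, since that also exposes {{{read()}}}.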

DMWMBot commented 12 years ago

mmascher: Replying to [comment:3 ewv]:

> Yeah, the initial idea was just to allow this stuff over http, not https.
>
> Do we put pycurl on the WN? That would be the obvious solution.
>
> As usual, you want to make sure that whatever solution you choose doesn't hold the whole file in memory while downloading it.

I don't know whether we have pycurl on the WN. If we do, do you suggest using the {{{downloadFile}}} method in {{{Requests.py}}}, or even the {{{Services/UserFileCache}}} class, to do this?

I was planning to use {{{urllib.urlretrieve}}} directly, or something similar (as we are doing now). Enabling the proxy usage should be simple.

> Finally, don't forget about caching proxies either, but it might not be possible to have both.

What do you mean by this?

ericvaandering commented 12 years ago

ewv: Replying to [comment:4 mmascher]:

> I was planning to use {{{urllib.urlretrieve}}} directly, or something similar (as we are doing now). Enabling the proxy usage should be simple.

If that works, great. The problem with all the python libs is that every one has some limitation w.r.t. security, proxies, and/or streaming the files. pycurl is the only thing that seems to escape all these limitations.

> > Finally, don't forget about caching proxies either, but it might not be possible to have both.
>
> What do you mean by this?

DMWMBot commented 12 years ago

mmascher: Replying to [comment:5 ewv]:

> Replying to [comment:4 mmascher]:
>
> > I was planning to use {{{urllib.urlretrieve}}} directly, or something similar (as we are doing now). Enabling the proxy usage should be simple.
>
> If that works, great. The problem with all the python libs is that every one has some limitation w.r.t. security, proxies, and/or streaming the files. pycurl is the only thing that seems to escape all these limitations.

It looks like we do not have pycurl on the WNs. I ran a job and {{{import pycurl}}} failed.

Since this is a showstopper for me, I'll proceed with httplib if that is OK with you. I am not sure whether it streams the file directly to disk, but I'll use the very same approach we are using now (hopefully things will not get worse).

I'll let you know about the caching proxies stuff.
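The availability check described above could be sketched as follows, assuming Python 3. The function name and the two backend labels are hypothetical; the ticket does not say how a fallback would be wired into {{{UnpackUserTarball.py}}}.

```python
import importlib.util


def pick_download_backend():
    """Return 'pycurl' when the module is importable, else 'urllib'."""
    # find_spec returns None for a missing top-level module instead of raising.
    if importlib.util.find_spec("pycurl") is not None:
        return "pycurl"
    return "urllib"


print(pick_download_backend())
```

A check like this would let the same script use pycurl where it is deployed and fall back to the standard library elsewhere.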

sfoulkes commented 12 years ago

sfoulkes: It's possible that we could add pycurl to the python RPM we push down to all the sites.

DMWMBot commented 12 years ago

mmascher: Here is the patch that uses {{{urllib}}}. As soon as we have pycurl on the WNs we can migrate. For the moment, though, I suggest we start pushing this; otherwise the new UFC will not work.

spigad commented 12 years ago

spiga: (In bf0d123038cd24fd04426da741d7b72ab5ca7fb1) Added key and certificate to UnpackUserTarball. Fixes #3479

From: Marco Mascheroni <marco.mascheroni@cern.ch>