Closed DMWMBot closed 11 years ago
mmascher: I plan to work on this taking the proxy from the {{{$X509_USER_PROXY}}} variable.
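Resolving the proxy location from {{{$X509_USER_PROXY}}} could look roughly like the sketch below. The {{{/tmp/x509up_u<uid>}}} fallback is the conventional grid default used by voms-proxy-init, an assumption of mine rather than anything stated in this ticket:

```python
import os

def locate_proxy():
    """Return a path to the user's grid proxy, or None if nothing is found."""
    # Preferred: the location exported by the batch system / glidein.
    proxy = os.environ.get('X509_USER_PROXY')
    if proxy and os.path.exists(proxy):
        return proxy
    # Conventional voms-proxy-init default (assumption, not from the ticket).
    fallback = '/tmp/x509up_u%d' % os.getuid()
    return fallback if os.path.exists(fallback) else None
```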
ewv: Yeah, the initial idea was just to allow this stuff over http, not https.
Do we put pycurl on the WN? That would be the obvious solution.
As usual, you want to make sure that whatever solution you choose doesn't hold the whole file in memory while downloading it.
Finally, don't forget about caching proxies either, but it might not be possible to have both.
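Whichever library is chosen, the "don't hold the whole file in memory" concern boils down to a chunked copy of the response body. A minimal standard-library sketch (using the Python 3 spelling {{{urllib.request}}}; the WN code of that era used Python 2's {{{urllib}}}):

```python
import shutil
import urllib.request

def stream_download(url, dest, chunk_size=64 * 1024):
    """Copy the response body to dest in fixed-size chunks, so the
    whole file is never held in memory at once."""
    with urllib.request.urlopen(url) as resp, open(dest, 'wb') as out:
        shutil.copyfileobj(resp, out, chunk_size)
```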
mmascher: Replying to [comment:3 ewv]:
Yeah, the initial idea was just to allow this stuff over http, not https.
Do we put pycurl on the WN? That would be the obvious solution.
As usual, you want to make sure that whatever solution you choose doesn't hold the whole file in memory while downloading it.
I don't know if we have pycurl on the WN. If we do, do you suggest using the {{{downloadFile}}} method in {{{Requests.py}}}, or even the {{{Services/UserFileCache}}} class, to do this?
I was planning to use {{{urllib.urlretrieve}}} directly, or something similar (as we are doing now). Enabling proxy usage should be simple.
Finally, don't forget about caching proxies either, but it might not be possible to have both.
What do you mean by this?
ewv: Replying to [comment:4 mmascher]:
I was planning to use {{{urllib.urlretrieve}}} directly, or something similar (as we are doing now). Enabling proxy usage should be simple.
If that works, great. The problem with all the python libs is that every one has some limitation w.r.t. security, proxies, and/or streaming the files. pycurl is the only thing that seems to escape all these limitations.
Finally, don't forget about caching proxies either, but it might not be possible to have both.
What do you mean by this?
mmascher: Replying to [comment:5 ewv]:
Replying to [comment:4 mmascher]:
I was planning to use {{{urllib.urlretrieve}}} directly, or something similar (as we are doing now). Enabling proxy usage should be simple.
If that works, great. The problem with all the python libs is that every one has some limitation w.r.t. security, proxies, and/or streaming the files. pycurl is the only thing that seems to escape all these limitations.
It looks like we do not have pycurl on the WNs. I ran a job and the {{{import pycurl}}} failed.
Since this is a showstopper for me, I'll proceed with httplib if that is OK with you. I am not sure whether it streams the file directly to disk; however, I'll use the very same approach we are using now (hopefully things will not get worse).
I'll let you know about the caching proxies stuff.
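For reference, the chunked-read pattern with httplib ({{{http.client}}} in Python 3) would look roughly like this. Passing the grid proxy file as both certificate and key is an assumption based on how proxies bundle the two; host and path are placeholders, not values from this ticket:

```python
import http.client
import ssl

def copy_in_chunks(resp, out, chunk_size=16384):
    """Stream a response-like object to an open file without buffering it all."""
    while True:
        block = resp.read(chunk_size)
        if not block:
            break
        out.write(block)

def fetch_https(host, path, dest, proxy_file):
    """GET path from host over HTTPS, authenticating with a grid proxy file.
    A grid proxy bundles certificate and key, so it is passed as both."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=proxy_file, keyfile=proxy_file)
    conn = http.client.HTTPSConnection(host, context=ctx)
    try:
        conn.request('GET', path)
        with open(dest, 'wb') as out:
            copy_in_chunks(conn.getresponse(), out)
    finally:
        conn.close()
```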
sfoulkes: It's possible that we could add pycurl to the python RPM we push down to all the sites.
mmascher: That's the patch for using {{{urllib}}}. As soon as we have pycurl on the WNs we can migrate. But for the moment I suggest we start pushing this; otherwise the new UFC will not work.
spiga: (In bf0d123038cd24fd04426da741d7b72ab5ca7fb1) Added key and certificate to UnpackUserTarball. Fixes #3479
From: Marco Mascheroni marco.mascheroni@cern.ch
Currently the {{{UnpackUserTarball.py}}} script just seems to do a bare {{{urllib.urlretrieve}}}, which does not authenticate when the https protocol is used.
See setHttpProxy() in UnpackUserTarball. It uses Frontier to detect caching proxies that may be in place. The correct environment setting for secure URLs is https_proxy, but I'm not sure that will work if you are authenticating with a certificate.
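The env-var route can be checked directly: urllib picks up a lowercase {{{https_proxy}}} from the environment via {{{getproxies()}}}, so later urlopen calls for https URLs would be routed through the cache. The squid host below is hypothetical; real code would take it from the Frontier discovery that setHttpProxy does, which is not reproduced here:

```python
import os
import urllib.request

# Hypothetical caching proxy; setHttpProxy would discover this via Frontier.
os.environ['https_proxy'] = 'http://squid.example.com:3128'

# urllib consults the environment, so subsequent urlopen() calls for https
# URLs go through the proxy. Whether client-certificate authentication
# survives the tunnel is exactly the doubt raised above.
print(urllib.request.getproxies().get('https'))
```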