Open kosack opened 1 year ago
downside of course would be one more dependency, which maybe is not really needed (file access may only be via local files for the most part).
Regarding this, a possible use case is magic-cta-pipe, where data access happens via SSH tunneling to some container at La Palma.
As far as I know, the Python 3 standard library doesn't provide an API to deal with SSH. Maybe one can use subprocess, but it could present some safety issues.
An alternative seems to be Paramiko - anyway, I think supporting something other than HTTP will add a new dependency (but we could define it as an extra...).
By the way, I think I can safely assume that the current API in utils doesn't support SSH tunneling, am I right?
@Elisa-Visentin
SSH is for shells. It does not deal with files. There are some tools which build upon ssh to transfer files (e.g. rsync, scp, sftp) or to mount files on a remote server (sshfs).
Tunneling is yet a different concept (exposing ports / jumping multiple hosts).
I don't see how that directly relates to input to ctapipe. Could you offer some clarification?
Yes, probably this issue is about something yet different. Seeing "SSH" as one of the available protocols I thought it might have to do also with the actual connection - am I wrong?
As a "corollary", I wanted to understand if ctapipe.utils plans to support SSH tunneling to e.g. get test data from a server - I have the feeling that it works only via HTTPS.
@HealthyPear you can support multiple ssh jumps transparently just by editing your .ssh/config and adding an appropriate entry. For example, see here https://www.redhat.com/sysadmin/ssh-proxy-bastion-proxyjump. So after setting that up, you don't need to support that in software. With such an entry (e.g. ProxyJump), you can then scp from that machine as if it wasn't going through another intermediate machine.
Or if you use a real tunnel, i.e. remapping a remote port to a local one, then we would have to support the port part of the URL (which we currently do not I think), but with something like fsspec that is supported.
(which we currently do not I think)
We just pass the URL to requests, so it will support non-standard ports.
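To illustrate the point about ports, here is a minimal sketch using only the standard library. It assumes a hypothetical tunnel that maps a remote web server to local port 8080; the host, port, and path are made up for illustration. The port is an ordinary part of the URL, so a client that is handed the full URL (as requests is) connects to it without any extra handling.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into (scheme, host, port, path); port is None if not given."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path


# Hypothetical tunnel, e.g. set up with: ssh -L 8080:dataserver:80 gateway
tunnelled = "http://localhost:8080/testdata/example.fits"

# The non-standard port survives as part of the URL.
print(describe_url(tunnelled))
# -> ('http', 'localhost', 8080, '/testdata/example.fits')
```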
I think we probably should do this, as this will also enable us to avoid copying files to worker nodes on the grid if we can directly read from protocols like root:// or others that are supported by the storage elements / DIRAC.
Dirac has the option to either download files for the job and put them into the current directory or just to provide a url.
Please describe the use case that requires this feature. Since astropy now optionally uses the fsspec library to open local and remote FITS files, and this library is also used by pandas and many others, it might be useful to replace parts of ctapipe's URL functionality (ctapipe.utils.download and ctapipe.utils.download_cached) with it. It supports opening files from many filesystem sources (see list below) and many compression methods, and it supports caching similar to what we use now.
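As a rough sketch of what this could look like (not a proposal for the actual ctapipe API), the snippet below uses fsspec's URL chaining with the `simplecache` protocol to download a remote file once and reuse the local copy afterwards. The helper names `cached_url` and `read_bytes` and the cache directory argument are hypothetical, and fsspec is imported lazily since it would be an extra dependency.

```python
def cached_url(url):
    """Prefix a URL with fsspec's 'simplecache' protocol (URL chaining)."""
    prefix = "simplecache::"
    return url if url.startswith(prefix) else prefix + url


def read_bytes(url, cache_dir, n=16, compression=None):
    """Read the first n bytes of a (possibly remote) file through a local cache.

    Hypothetical helper: cache_dir is where fsspec stores downloaded copies.
    Pass compression="infer" to let fsspec guess from the file extension.
    """
    import fsspec  # third-party; imported lazily so it stays optional

    with fsspec.open(
        cached_url(url),
        mode="rb",
        compression=compression,
        simplecache={"cache_storage": str(cache_dir)},
    ) as f:
        return f.read(n)


print(cached_url("https://example.org/testdata/example.fits"))
# -> simplecache::https://example.org/testdata/example.fits
```

The same `read_bytes` call would work for other protocols that fsspec knows about, provided the corresponding backend package is installed.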
Additional context
e.g.:
Available support: