fr4nk5ch31n3r / gtransfer

GridFTP transfers made easy!
GNU General Public License v3.0

Gtransfer - GridFTP transfers made easy!

Description

Gtransfer is a wrapper script for tgftp (which itself wraps globus-url-copy) that also uses functionality from uberftp. It provides an advanced command line interface for performing GridFTP data transfers. The primary aim of gtransfer is to make GridFTP data transfers on the command line as easy as possible. A user therefore only has to provide the source and the destination to perform a data transfer:

$ gt -s <SOURCE> -d <DESTINATION>

Features

Multi-step data transfers

Gtransfer can transfer files along predefined paths by using transit sites and can therefore bridge different network domains.

Example:

$ gt -s host1:/files/* -d host3:/files/

NOTICE: This example uses two host aliases - host1: and host3: - which can point to ordinary host addresses like gsiftp://host1.domain.tld:2811.

[Figure: multi-step transfer]

The host host1 is located in a private network, host3 is located in the Internet, and host2 has connections to both networks. To transfer files from host1 to host3, gtransfer first copies the files to the transit host host2 (first step) and then copies them from host2 to host3 (second step). After the transfer has finished, temporary files are removed from host2. See dpath(5) for details.
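The sequence of steps can be illustrated with a small dry-run sketch. It only prints the copy steps and the cleanup for a given list of hops. The function name plan_multi_step is hypothetical and the code is not taken from gtransfer, which reads its paths from dpath files.

```bash
# Hypothetical dry-run sketch: prints the transfer steps for a list of hops.
# It does not transfer anything and is not gtransfer's implementation.
plan_multi_step() {
    local prev="$1" step=1 transit="" hop
    shift
    while [ "$#" -gt 0 ]; do
        hop="$1"
        shift
        echo "step ${step}: copy ${prev} -> ${hop}"
        # Any hop that still has a successor is a transit site.
        if [ "$#" -gt 0 ]; then
            transit="${transit} ${hop}"
        fi
        prev="$hop"
        step=$(( step + 1 ))
    done
    # Temporary files only exist on transit sites, so clean up those.
    for hop in $transit; do
        echo "cleanup: remove temporary files on ${hop}"
    done
}

plan_multi_step host1 host2 host3
```

For the three hosts from the example this prints two copy steps followed by one cleanup line for host2.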

Data transfer using multipathing

Gtransfer can distribute a data transfer over multiple paths. This way users can benefit from the combined bandwidth of multiple paths.

Example:

$ gt -s host1:/file/* -d host3:/files/ -m all

[Figure: transfer using multipathing]

The host host1 has connections to both the Internet and a private network. The bandwidth of its Internet connection is limited to 1 Gb/s, but its connection to the private network has a bandwidth of 10 Gb/s. The host host2 has a bandwidth of 10 Gb/s on its connections to both the Internet and the private network. In effect, there are two paths available from host1 to host3: one direct path and one indirect path using host2 as transit site. With multipathing, both paths can be used at the same time to combine the available bandwidth. To distribute a data transfer over those two paths, gtransfer splits the list of files to be transferred into two lists according to the bandwidth proportions, taking the file sizes into account. That is, the connection with the greater bandwidth transfers a greater share of the total data volume than the other connection.
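One way to picture such a split is a greedy heuristic that hands each file, largest first, to the path that would finish it earliest. The following is a minimal sketch under that assumption. The function name split_by_bandwidth, the path labels and the file names are hypothetical, and gtransfer's actual splitting logic may differ.

```bash
# Hypothetical sketch, not gtransfer's actual algorithm.
# Reads "SIZE NAME" lines on stdin (names without whitespace) and prints
# "PATH NAME" lines. Both weights must be positive numbers.
split_by_bandwidth() {
    local direct_weight="$1" transit_weight="$2"
    # Largest files first, then assign each file to the path whose
    # estimated finish time (bytes / weight) would be lower afterwards.
    sort -k1,1nr | awk -v wd="$direct_weight" -v wt="$transit_weight" '
        {
            if ((direct_load + $1) / wd < (transit_load + $1) / wt) {
                direct_load += $1
                print "direct", $2
            } else {
                transit_load += $1
                print "transit", $2
            }
        }'
}

# Weights 1 and 10 mirror the 1 Gb/s and 10 Gb/s connections described above.
printf '%s\n' "500 a.dat" "300 b.dat" "100 c.dat" "50 d.dat" "40 e.dat" |
    split_by_bandwidth 1 10
```

With these sample sizes the three largest files land on the faster path and the two smallest on the direct path.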

NOTICE: Because the second path uses a transit site and needs two transfer steps to complete, the effective bandwidth is lower than the bandwidth of the used connections.
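A rough estimate of this effect, assuming the two steps run strictly one after the other and ignoring any protocol overhead: moving S bytes takes S/b1 + S/b2, so the effective bandwidth is (b1 * b2) / (b1 + b2). The helper below is a hypothetical illustration of that arithmetic and not part of gtransfer.

```bash
# Hypothetical helper: effective bandwidth of two sequential transfer steps
# with bandwidths b1 and b2 (same unit), assuming no overlap and no overhead.
effective_bandwidth() {
    awk -v b1="$1" -v b2="$2" 'BEGIN { printf "%.2f\n", (b1 * b2) / (b1 + b2) }'
}

# Two steps at 10 Gb/s each, as on the indirect path via host2.
effective_bandwidth 10 10   # prints 5.00
```

Under these assumptions the indirect path behaves like a single 5 Gb/s connection, even though each individual step runs at 10 Gb/s.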

Optimized data transfer performance

Another aim of gtransfer is to allow well-performing data transfers without detailed knowledge about the underlying facilities. Therefore gtransfer supports usage of pre-optimized data transfer parameters for specific connections. See dparam(5) for details. In addition gtransfer can also automatically optimize a data transfer depending on the size of the files.

Data transfer interruption and continuation

Gtransfer supports interruption and continuation of transfers. You can interrupt a transfer by pressing CTRL+C. To continue an interrupted transfer, simply issue the very same command again; gtransfer will then continue the transfer where it was interrupted. The same procedure also works for a failed transfer.

Data transfer reliability

Gtransfer supports automatic retries of failed transfer steps. The number of retries is configurable. See gtransfer(1) for details.
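Because re-issuing the same command continues a failed transfer, an outer loop in a script can re-run a command until it succeeds. The sketch below is a generic shell helper, not a gtransfer feature. The name rerun_until_done is hypothetical, and it assumes the wrapped command exits with a non-zero status on failure.

```bash
# Hypothetical helper: re-run a command until it succeeds, at most
# MAX_ATTEMPTS times. Assumes the command signals failure via its exit status.
rerun_until_done() {
    local max_attempts="$1" attempt=1
    shift
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the very same command again on each attempt.
        "$@" && return 0
        echo "attempt ${attempt} of ${max_attempts} failed" >&2
        attempt=$(( attempt + 1 ))
    done
    return 1
}
```

A call could look like `rerun_until_done 3 gt -s 'host1:/files/*' -d host3:/files/`. For retries within a single run, the built-in retry option described in gtransfer(1) is the more direct choice.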

Bash completion

Gtransfer makes use of bash completion to ease usage. This supports completion of options and URLs. URL completion also expands (remote) paths directly on the command line. Just hit the TAB key to see what's possible.

Host aliases

Gtransfer can use host aliases as alternatives to host addresses. For example, a user can use myGridFTP: and gsiftp://host1.domain.tld:2811 interchangeably. See host aliases for more details.
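Conceptually, an alias is a prefix substitution on the URL. The sketch below illustrates the idea with a hard-coded lookup table. The function names lookup_alias and expand_alias are hypothetical, and gtransfer's own alias handling and storage format are not shown here.

```bash
# Hypothetical alias table; gtransfer stores its aliases differently.
lookup_alias() {
    case "$1" in
        host1|myGridFTP) echo "gsiftp://host1.domain.tld:2811" ;;
        *) return 1 ;;
    esac
}

# Replace a leading "ALIAS:" with the host address it stands for.
# URLs whose prefix is not a known alias are printed unchanged.
expand_alias() {
    local url="$1" name address
    case "$url" in
        *:*)
            name="${url%%:*}"
            if address="$(lookup_alias "$name")"; then
                printf '%s%s\n' "$address" "${url#*:}"
                return 0
            fi
            ;;
    esac
    printf '%s\n' "$url"
}

expand_alias 'myGridFTP:/files/data.tar'
```

The call prints gsiftp://host1.domain.tld:2811/files/data.tar, while a full gsiftp:// URL passes through untouched.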

Persistent identifiers (PIDs)

Gtransfer can use persistent identifiers (PIDs) as used by EUDAT and provided by EPIC as source of a data transfer. See persistent identifiers for more details.

Examples

As stated above, the primary aim of gtransfer is to make GridFTP data transfers on the command line as easy as possible for the user. The simple example in the description should therefore already be sufficient for most users.

You can find more detailed examples in the gtransfer wiki on GitHub. Additional examples will be made available occasionally.

Who is using it?

This is a list of HPC centers in Europe that use gtransfer in production:

Höchstleistungsrechenzentrum Stuttgart (HLRS - Germany)

CSC - IT Center for Science (CSC - Finland)

Leibniz-Rechenzentrum (LRZ) der Bayerischen Akademie der Wissenschaften (LRZ - Germany)

Irish Centre for High-End Computing (ICHEC - Ireland)

Centro di supercalcolo, Consorzio di università (CINECA - Italy)

SURFsara (SURFsara - The Netherlands)

Centre Informatique National de l’Enseignement Supérieur (CINES - France)

IT4Innovations national supercomputing center (IT4Innovations - Czech Republic)

Karlsruhe Institute of Technology (KIT - Germany)

License

(GPLv3)

Copyright (C) 2010, 2011, 2013-2017 Frank Scheiner, HLRS, Universitaet Stuttgart
Copyright (C) 2011, 2012, 2013 Frank Scheiner

The software is distributed under the terms of the GNU General Public License.

This software is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.