GenericMappingTools / gmtserver-admin

Cache data and script for managing the GMT data server
GNU Lesser General Public License v3.0

rsync the server data #23

Closed PaulWessel closed 4 years ago

PaulWessel commented 5 years ago

Per SOEST IT staff, this is now configured and users need to run

rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' <destination_directory>

where <destination_directory> is the full path to where they want to mirror these files on their local computer. The quotes are needed due to the * wildcard. I just tested this on my Mac and it ran fine. We have replaced the symlinks with actual files and directories. Let me know how this is working for @joa-quim and @seisman now. I notice the files are created with rw for owner only, but that is probably a umask setting for me rather than a general one. Thus, you may need to do a

chmod -R og+r <destination_directory>

to make sure files are readable.
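
A minimal sketch of the permission fix, run against a throwaway directory rather than a real mirror. The `umask 077` line only simulates a restrictive owner-only setup, and `og+rX` (capital X) is a variation on the `og+r` suggested above, not the original command: it additionally makes directories searchable, which a web server would need in order to descend into them.

```shell
#!/bin/sh
# Sketch: simulate a restrictive umask, then open up read access.
# The temporary directory is a stand-in for <destination_directory>.
dest=$(mktemp -d)

(
  umask 077                      # new files come out rw for owner only
  mkdir "$dest/cache"
  echo "dummy" > "$dest/cache/file.grd"
)

before=$(ls -l "$dest/cache/file.grd" | cut -c1-10)

# og+r : group and others may read
# X    : add search (execute) permission on directories only
chmod -R og+rX "$dest"

after_file=$(ls -l "$dest/cache/file.grd" | cut -c1-10)
after_dir=$(ls -ld "$dest/cache" | cut -c1-10)

echo "file before: $before"
echo "file after:  $after_file"
echo "dir after:   $after_dir"
```

If the files are already world-readable after the transfer, the chmod step is harmless and changes nothing.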

joa-quim commented 5 years ago

Nope, same error

rsync: failed to connect to gmtserver.soest.hawaii.edu (128.171.156.218): Connection timed out (110)

joa-quim commented 5 years ago

It is working with

rsync -rP 'gmtserver.soest.hawaii.edu:/export/gmtserver/gmt/data/*' /home/jluis/gmtdata

Is /export/gmtserver/gmt/data equivalent to ::gmtdata? Apparently this is what the rsync man page refers to as a module, and it seems to be the part that is not working.

BTW, I got an external IP so the mirror can be set up. All I need now is to get an https server running.

I also don't much like having the data here under /home/jluis/gmtdata. Is there a better name, or is this not important at all?

seisman commented 5 years ago

@joa-quim When you ran the command, did you type your account and password?

joa-quim commented 5 years ago

Yes, I had to type the gmtserver passwd.

seisman commented 5 years ago

Then that's not what we want. We can't create accounts on gmt-server for other mirror maintainers.

PaulWessel commented 5 years ago

@seisman, did you also have to provide account and password? I did not when testing from my laptop, and the account on the gmtserver is a local one and not the one on our network. We certainly do not want that, and it is meant to be a read-only rsync, so I am not sure where a password enters.

seisman commented 5 years ago

rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' gmtdata now works for me on both mac and Linux, without providing account and password.

PaulWessel commented 5 years ago

OK, that is the way it should work. @joa-quim, please check again in case Ross made some very recent changes but there should be no reason for you to give passwords.

joa-quim commented 5 years ago

No. Still times out if I use ::gmtdata

PaulWessel commented 5 years ago

Surely that must point to an issue with a firewall or similar on your end? I.e., if it works for @dongdong, who is also outside the UH network? Are there more verbose or debug options to give rsync to see exactly what is happening?

seisman commented 5 years ago

I tested it on our university's HPCC. To use the HPCC, I need to login to the gateway node first, then login to the dev node. The command above works on the gateway node, but times out on the dev node. Seems like a firewall issue on the dev node for security reasons.

joa-quim commented 5 years ago

It can't be a firewall issue for me because I'm downloading from the exact same machine. The difference is that I'm using the explicit source address instead of ::gmtdata. I don't know what this ::gmtdata is, but it clearly looks at least indirectly responsible for the failure to connect.

PaulWessel commented 5 years ago

I don't think you can specify specific directories via the rsync daemon; it needs its special phrase. Otherwise you are just a user logging in to the system. You must use ::gmtdata. My understanding is that this works fine for @dongdong, so there cannot be anything wrong with that part. It works for me too. To debug, from the rsync man page:

A single -v will give you information about what files are being transferred and a brief summary at the end. Two -v flags will give you information on what files are being skipped and slightly more information at the end. More than two -v flags should only be used if you are debugging rsync.

maybe you can try

rsync -rP -v -v -v 'gmtserver.soest.hawaii.edu::gmtdata/*' /home/jluis/gmtdata

to see if we can learn something.

joa-quim commented 5 years ago

I might have the port 873 closed. Have to ask.

[jluis@localhost ~]$ rsync -rP -v -v -v 'gmtserver.soest.hawaii.edu::gmtdata/*' /home/jluis/gmtdata
opening tcp connection to gmtserver.soest.hawaii.edu port 873
rsync: failed to connect to gmtserver.soest.hawaii.edu (128.171.156.218): Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(125) [Receiver=3.1.2]
[Receiver] _exit_cleanup(code=10, file=clientserver.c, line=125): about to call exit(10)

And apparently I don't have a big enough disk, because I got

rsync: mkstemp "/home/jluis/gmtdata/.earth_relief_01d.grd.KuFnFp" failed: No space left on device (28)

Also noticed that /export/gmtserver/gmt/data doesn't have the earth_relief_03s.grd and earth_relief_01s.grd files.
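
For future diagnosis, here is a rough sketch that turns an rsync exit code into a hint. The codes and their meanings come from the EXIT VALUES section of the rsync man page; the helper name `explain_rsync_exit` and the suggested causes are assumptions for illustration, not something rsync provides.

```shell
#!/bin/sh
# Sketch: map a few rsync exit codes (per the rsync man page) to a hint.
# explain_rsync_exit is a hypothetical helper name.
explain_rsync_exit() {
  case "$1" in
    0)  echo "success" ;;
    10) echo "error in socket I/O: the daemon could not be reached; check that outbound TCP port 873 is open" ;;
    11) echo "error in file I/O: check permissions and free space on the destination" ;;
    23) echo "partial transfer due to error: some files were not copied (a full disk is one possible cause)" ;;
    30) echo "timeout in data send/receive" ;;
    35) echo "timeout waiting for daemon connection" ;;
    *)  echo "see EXIT VALUES in the rsync man page (code $1)" ;;
  esac
}

# Typical use right after a transfer attempt:
#   rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' /home/jluis/gmtdata
#   explain_rsync_exit $?
explain_rsync_exit 10
```

The log above ends with "code 10", which is the socket I/O case.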

PaulWessel commented 5 years ago

OK, port 873, different from the default 22...

There are no earth_relief_03s.grd or earth_relief_01s.grd grids. Those are magic files that lead to SRTM tiles. As for space, the total is ~50 GB.
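
A minimal sketch of a free-space check that a mirror maintainer could run before starting the transfer. The helper name `has_space` and the 55 GiB threshold (the ~50 GB quoted above plus some headroom) are assumptions; the real total will change as data are added.

```shell
#!/bin/sh
# Sketch: refuse to start a mirror if the destination has too little room.
# has_space is a hypothetical helper; REQUIRED_KB is an assumed figure
# (~50 GB from this thread plus headroom), expressed in 1024-byte blocks.
REQUIRED_KB=$((55 * 1024 * 1024))

# has_space DIR KB -> exit status 0 if DIR's filesystem has at least KB free
has_space() {
  # df -P gives one POSIX-format line per filesystem; field 4 is "Available"
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$2" ]
}

dest=${1:-.}
if has_space "$dest" "$REQUIRED_KB"; then
  echo "enough space in $dest"
else
  echo "not enough space in $dest (need $REQUIRED_KB KB)"
fi
```

Running the check against the intended destination directory would catch a too-small disk before rsync fails partway through with "No space left on device".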

joa-quim commented 5 years ago

Yes, died on the beach (6 G too short)

-bash-4.2$ pwd
/export/gmtserver/gmt/data
-bash-4.2$ du -BG -c
41G     ./srtm1
7G      ./srtm3
1G      ./cache
52G     .
52G     total
[jluis@localhost gmtdata]$ pwd
/home/jluis/gmtdata
[jluis@localhost gmtdata]$ du -BG -c
1G      ./cache
41G     ./srtm1
1G      ./srtm3
45G     .
45G     total

remkos commented 5 years ago

@joa-quim: This is run through the rsyncd protocol port (triggered by the double ::). Some IT people block this port. Maybe it does work for you from home (it does for me).

@PaulWessel: Better is to drop the * in the path, and use instead:

rsync -avzP gmtserver.soest.hawaii.edu::gmtdata/ <destination_directory>

(yes, I am an avid rsync user :-)

PaulWessel commented 5 years ago

Thanks @remkos, I will try this from home. I think rsync uses port 873, which presumably @joa-quim's IT folks need to open. But I think it is good to specify the destination directory since people will run this command via crontab once a night, so they either have to add a cd to the destination directory first or specify it directly on the command, no?

joa-quim commented 5 years ago

It wouldn't make a difference to run it from home because the machine (a VM) is located inside my University network so the blocked ports would be the same. I reported the issue and asked about port 873 but didn't get any response yet.

remkos commented 5 years ago

> Thanks @remkos, I will try this from home. I think rsync uses port 873, which presumably @joa-quim's IT folks need to open. But I think it is good to specify the destination directory since people will run this command via crontab once a night, so they either have to add a cd to the destination directory first or specify it directly on the command, no?

The GitHub website killed the <destination_directory> in the command because I did not escape the <> symbols. So yes, the destination directory needs to be there. I've edited it above.

PaulWessel commented 4 years ago

Trying the command that @seisman says works, from home:

rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' /Volumes/MacNutRAID/DATA/gmtserver-test
rsync: failed to connect to gmtserver.soest.hawaii.edu: Operation timed out (60)
rsync error: error in socket IO (code 10) at /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync-54/rsync/clientserver.c(106) [receiver=2.6.9]

I am trying to relearn this because I just asked UNAVCO if they could be a host for us.

seisman commented 4 years ago

It doesn't work for me any more.

PaulWessel commented 4 years ago

Update: SOEST IT has fixed the issue:

We needed to adjust the firewall rules so port 873 for rsync is going to the correct interface

We may need to remember this so we can quickly diagnose the problem in the future. Right now I am running this successfully:

rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' /Users/pwessel/test/gmtservertest

seisman commented 4 years ago

Great! The command now works for me on both macOS and Linux.

seisman commented 4 years ago

@PaulWessel The command above doesn't work as we expect. It skips all non-regular files (i.e., symlinks):

rsync -rP 'gmtserver.soest.hawaii.edu::gmtdata/*' gmttest
receiving file list ...
30463 files to consider
created directory gmttest
skipping non-regular file "earth_relief_01d.grd"
skipping non-regular file "earth_relief_01m.grd"
skipping non-regular file "earth_relief_02m.grd"
skipping non-regular file "earth_relief_03m.grd"
skipping non-regular file "earth_relief_04m.grd"
skipping non-regular file "earth_relief_05m.grd"
skipping non-regular file "earth_relief_06m.grd"
skipping non-regular file "earth_relief_10m.grd"
skipping non-regular file "earth_relief_15m.grd"
skipping non-regular file "earth_relief_15s.grd"
skipping non-regular file "earth_relief_20m.grd"
skipping non-regular file "earth_relief_30m.grd"
skipping non-regular file "earth_relief_30s.grd"
skipping non-regular file "earth_relief_60m.grd"
skipping non-regular file "srtm1"
skipping non-regular file "srtm3"
earth_relief_01m_g.grd

joa-quim commented 4 years ago

A little side note. I have a ~100 GB CentOS 8 machine ready to serve as a European mirror, but probably with severe restrictions on open ports that, so far, will prevent me from having a cron job do the updates. But before that, how would I run an rsync that copies the symbolic links as links instead of making real copies?

seisman commented 4 years ago

I believe we should use this command instead:

rsync -aP --delete 'gmtserver.soest.hawaii.edu::gmtdata/*' gmtdata

       -a, --archive
              This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost
              everything (with -H being a notable omission). The only exception to the above equivalence is when
              --files-from is specified, in which case -r is not implied.

              Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must
              separately specify -H.

       -P     The -P option is equivalent to --partial --progress. Its purpose is to make it much easier to specify these
              two options for a long transfer that may be interrupted.

       --delete
              This tells rsync to delete extraneous files from the receiving side (ones that aren't on the sending side),
              but only for the directories that are being synchronized. You must have asked rsync to send the whole
              directory (e.g. "dir" or "dir/") without using a wildcard for the directory's contents (e.g. "dir/*") since the
              wildcard is expanded by the shell and rsync thus gets a request to transfer individual files, not the files'
              parent directory. Files that are excluded from transfer are also excluded from being deleted unless you use
              the --delete-excluded option or mark the rules as only matching on the sending side (see the include/exclude
              modifiers in the FILTER RULES section).

seisman commented 4 years ago

rsync -aP --delete gmtserver.soest.hawaii.edu::gmtdata gmtdata

This command also works for me on macOS and Linux.

PaulWessel commented 4 years ago

Yes, agree that seems the best solution.
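
To see why -a rather than -r matters here, a small local experiment can help. This is only a sketch: temporary directories stand in for the server module and the mirror, the file names are made up to resemble the ones in the listing above, and the trailing slash on the local source plays the role that the bare module name plays in the real command.

```shell
#!/bin/sh
# Sketch: compare -r and -a on a local tree containing a symlink.
# Local temporary directories stand in for the server module and the mirror.
work=$(mktemp -d)
mkdir -p "$work/src/srtm_tiles" "$work/with_r" "$work/with_a"
echo "grid" > "$work/src/earth_relief_01d_g.grd"
ln -s earth_relief_01d_g.grd "$work/src/earth_relief_01d.grd"
echo "stale" > "$work/with_a/old_file.grd"     # not present in src

if command -v rsync >/dev/null 2>&1; then
  # -r alone: recursion only; the symlink is reported as a
  # non-regular file and skipped
  rsync -r "$work/src/" "$work/with_r/"

  # -a implies -l, so the symlink is recreated as a symlink;
  # --delete removes old_file.grd because a whole directory is synchronized
  rsync -a --delete "$work/src/" "$work/with_a/"

  ls -l "$work/with_r" "$work/with_a"
else
  echo "rsync is not installed; nothing to compare"
fi
```

After the run, with_r should lack earth_relief_01d.grd entirely, while with_a should contain it as a link and no longer contain old_file.grd.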

PaulWessel commented 4 years ago

I am about to tell UNAVCO to set this up. I think we should advertise

rsync -aP --delete gmtserver.soest.hawaii.edu::gmtdata /your/server/gmt/data

or something, right? We have forwards like this:

oceania.generic-mapping-tools.org --> http://www.soest.hawaii.edu/gmt/data/

so presumably we will add something like:

unavco.generic-mapping-tools.org --> https://www.unavco.org/wherever/gmt/data

Just noticed our hover forward is now http instead of https. Pretty sure it was https in the beginning, no?

PaulWessel commented 4 years ago

SOEST IT says it should work with https. I think we should do an experiment now: I change the forward to https now, and then we try to get a file from the server. OK, @joa-quim and @seisman ?

joa-quim commented 4 years ago

ok, but soon dinner time (and prepare it first)

seisman commented 4 years ago

OK i can try

PaulWessel commented 4 years ago

I see @joa-quim's obsession with food has gone unaffected by the virus.

PaulWessel commented 4 years ago

OK, just updated the forward. It usually is pretty fast at activating but I will wait 10 minutes anyway.

joa-quim commented 4 years ago

From my aquarium: [image]

PaulWessel commented 4 years ago

Seems to work for me

grdinfo [DEBUG]: Get remote file https://oceania.generic-mapping-tools.org/server/earth/earth_relief/earth_relief_01d_p.grd and write to /Users/pwessel/.gmt/server/earth/earth_relief/earth_relief_01d_p.grd

Of course, this was stupid anyway since we have

ConfigDefault.cmake: set (GMT_DATA_SERVER "https://oceania.generic-mapping-tools.org")

Apparently the forward works for both http and https (if the server allows)

PaulWessel commented 4 years ago

We should advertise the following:

  1. Do this once: mkdir -p /your/server/gmt/data
  2. Place this command in crontab: rsync -aP --delete gmtserver.soest.hawaii.edu::gmtdata /your/server/gmt/data

e.g., add this line to the maintainer's crontab file [runs at 00:01 every night]

1 0 * * * rsync -aP --delete gmtserver.soest.hawaii.edu::gmtdata /your/server/gmt/data > $HOME/cron_0am.log 2>&1

or perhaps we should add 'q' to the flags to avoid the endless logging.
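
One possible shape for a wrapper that cron calls instead of a bare rsync line is sketched below. The function name `mirror_gmtdata`, the `RSYNC` override variable, the lock directory, and the log path are all hypothetical; only the rsync source and flags come from this thread. The function is defined but not invoked here, so nothing is transferred by loading the file.

```shell
#!/bin/sh
# Sketch of a wrapper that cron could call instead of a bare rsync line.
# mirror_gmtdata, RSYNC, the lock directory and the log are hypothetical names.
RSYNC=${RSYNC:-rsync}
SOURCE='gmtserver.soest.hawaii.edu::gmtdata'

# mirror_gmtdata DEST LOG -> run one sync, appending a dated summary to LOG
mirror_gmtdata() {
  dest=$1
  log=$2
  lock="${dest%/}.lock"          # kept outside DEST so --delete never sees it

  mkdir -p "$dest" || return 1
  # mkdir either creates the directory or fails, so it doubles as a
  # simple lock against overlapping runs
  if ! mkdir "$lock" 2>/dev/null; then
    echo "$(date) previous run still active, skipping" >> "$log"
    return 0
  fi

  echo "$(date) starting sync" >> "$log"
  "$RSYNC" -a --delete "$SOURCE" "$dest" >> "$log" 2>&1
  status=$?
  echo "$(date) finished with exit code $status" >> "$log"

  rmdir "$lock"
  return "$status"
}

# Example crontab line (00:01 every night), assuming this file is saved as an
# executable script whose last line is: mirror_gmtdata "$@"
#   1 0 * * * /path/to/mirror_gmtdata.sh /your/server/gmt/data $HOME/gmt_mirror.log
```

Without -v or -P, rsync prints only errors, so the log stays limited to the start and finish lines plus any errors.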

seisman commented 4 years ago

> perhaps we should add 'q' to the flags to avoid the endless logging.

Try removing P? rsync -a --delete

PaulWessel commented 4 years ago

If I understand correctly, this removes the partial-progress output during the download but probably means we still get a final message. I guess that is fine. So -a --delete.

joa-quim commented 4 years ago

Sorry, so where are we on this? What should I try?

PaulWessel commented 4 years ago

Seems to work, but you could try to

  1. Delete your ~/.gmt/server/earth/earth_relief/earth_relief_01d_?.grd
  2. Run gmt grdinfo @earth_relief_01d -Vd

joa-quim commented 4 years ago

Not that. I mean, to copy the entire gmtdata without duplicating from the symlinks.

PaulWessel commented 4 years ago

  1. Do this once: mkdir -p /your/server/gmt/data
  2. Run rsync -P --delete gmtserver.soest.hawaii.edu::gmtdata /your/server/gmt/data

seisman commented 4 years ago

> 2. Run rsync -P --delete gmtserver.soest.hawaii.edu::gmtdata /your/server/gmt/data

No, should be

rsync -a --delete gmtserver.soest.hawaii.edu::gmtdata /your/server/gmt/data

PaulWessel commented 4 years ago

Sorry, I copied the wrong edit.

joa-quim commented 4 years ago

As I feared. I'm still fckd by the ultra protectionist ports closures.

rsync: failed to connect to gmtserver.soest.hawaii.edu (128.171.156.218): Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(127) [Receiver=3.1.3]

PaulWessel commented 4 years ago

Are they concerned their server will get coronavirus? A server with no ports is very safe and very useless. I suspect they know this.

joa-quim commented 4 years ago

What is gmtdata?

rsync -a --delete  gmtserver.soest.hawaii.edu:gmtdata .
jluis@gmtserver.soest.hawaii.edu's password:
rsync: link_stat "/export/gmtserver/jluis/gmtdata" failed: No such file or directory (2)

is it /export/gmtserver/gmt?

PaulWessel commented 4 years ago

You are missing a colon. ::gmtdata is a special entity in rsync called a module. It is not a file or directory but presumably some metadata.
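
The difference between the one-colon and two-colon forms can be sketched as a small classifier. The helper name `rsync_transport` is hypothetical and the pattern matching is a simplification of what rsync itself does. As far as the rsync documentation goes, `host::module` (or an `rsync://` URL) talks to the rsync daemon, by default on TCP port 873, where the module is a name defined in the server's rsyncd.conf. `host:path` goes through a remote shell such as ssh, which asks for an account and resolves a relative path against that user's home directory. That would explain the password prompt and the /export/gmtserver/jluis/gmtdata path in the error above.

```shell
#!/bin/sh
# Sketch: report which transport rsync would use for a given source spec.
# rsync_transport is a hypothetical helper, not part of rsync.
rsync_transport() {
  case "$1" in
    rsync://*) echo "daemon" ;;        # URL form, TCP port 873 by default
    *::*)      echo "daemon" ;;        # host::module[/path], port 873 by default
    /*|./*)    echo "local" ;;         # plain local path
    *:*)       echo "remote-shell" ;;  # host:path, normally ssh on port 22
    *)         echo "local" ;;
  esac
}

rsync_transport 'gmtserver.soest.hawaii.edu::gmtdata'                    # daemon
rsync_transport 'gmtserver.soest.hawaii.edu:gmtdata'                     # remote-shell
rsync_transport 'gmtserver.soest.hawaii.edu:/export/gmtserver/gmt/data'  # remote-shell
rsync_transport '/home/jluis/gmtdata'                                    # local
```

Only the first form needs port 873 to be reachable, which is consistent with the explicit-path command working from a network where ::gmtdata times out.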