google-code-export / nmrrestrntsgrid

Automatically exported from code.google.com/p/nmrrestrntsgrid

BMRB network or tang's settings trying to prevent a DoS attack are preventing repeated downloads #143

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Repeat a couple of times:
wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
from a work or home machine in the Netherlands. It works fine from tang to itself.

I've tried replacing the name tang with its IP number, but to no avail.

Dmitri, could this be caused by tang's Apache settings?

I didn't see anything suspicious in /var/log/httpd or /etc/httpd/conf/httpd.conf. I see a lot of access from search engines indexing the servlet_data directory, but that should be OK.

Can you look into this? Note that this is a very serious issue because it prevents all BMRB users from using data at restraintsgrid.wisc.edu. Fortunately, the main BMRB site appears to be fine:

wget "http://www.bmrb.wisc.edu/cgi-bin/explore.cgi?format=raw&bmrbId=4096"

works for me 10 times in a row.

The workaround for now is to host this data on nmr.cmbi.ru.nl.

Original issue reported on code.google.com by jurge...@gmail.com on 3 Dec 2008 at 9:21
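The reproduction above boils down to downloading the same file several times and checking whether any attempt stalls. Here is a minimal Python sketch of that loop. It assumes that a read returning nothing for `timeout` seconds counts as a stall. The helpers `fetch_once` and `repeat_fetch` are hypothetical and not part of any BMRB tooling.

```python
import socket
import urllib.request


def fetch_once(url, timeout=30.0, chunk=64 * 1024):
    """Download url once; return (bytes_received, expected_length, stalled).

    A read that yields no data for `timeout` seconds raises a timeout, which
    this sketch treats as a stall. Connection errors (URLError) propagate.
    """
    received = 0
    expected = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
            expected = int(length) if length is not None else None
            while True:
                block = resp.read(chunk)
                if not block:
                    break
                received += len(block)
    except (socket.timeout, TimeoutError):
        return received, expected, True
    return received, expected, False


def repeat_fetch(url, attempts=10, timeout=30.0):
    """Fetch url `attempts` times; return a list of True (complete) / False."""
    results = []
    for i in range(attempts):
        received, expected, stalled = fetch_once(url, timeout)
        complete = not stalled and (expected is None or received == expected)
        results.append(complete)
        status = "complete" if complete else "STALLED or short"
        print(f"attempt {i + 1}: {received} of {expected} bytes, {status}")
    return results
```

For example, `repeat_fetch("http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt", attempts=10)` prints one line per attempt. Unlike an interactive wget, it gives up on a stalled read after `timeout` seconds, so nobody has to press Ctrl-C.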

GoogleCodeExporter commented 9 years ago
Note that the same DoS-like stall happens from my sourceforge.net account: it works a couple of times, then it stalls and I have to press Ctrl-C.

Dmitri, you can log into my account there too; I've sent you the password just now.

I hope you can troubleshoot this.

jd:Stella/~/ ssh -t jurgenfd,wattos@shell.sourceforge.net create
jurgenfd,wattos@shell.sourceforge.net's password: 

Requesting a new shell for "jurgenfd" and waiting for it to start.
queued... creating... starting...

This is an interactive shell created for user jurgenfd,wattos.
Use the "timeleft" command to see how much time remains before shutdown.
Use the "shutdown" command to destroy the shell before the time limit.
For path information and login help, type "sf-help".

/~/ 
/~/ 
/~/ 
/~/ hostname
shell-21004
/~/ wget
wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
/~/ wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
--09:19:52--  http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
Connecting to 144.92.167.182:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2935133 (2.8M) [text/plain]
Saving to: `mrfile.txt'

86% [==================================>     ] 2,530,248   --.-K/s  eta 2s
86% [==================================>     ] 2,530,248   --.-K/s  eta 3s
86% [==================================>     ] 2,530,248   --.-K/s  eta 3s
86% [==================================>     ] 2,530,248   --.-K/s  eta 4s
/~/ 
/~/ 
/~/ wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
--09:20:22--  http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
Connecting to 144.92.167.182:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2935133 (2.8M) [text/plain]
Saving to: `mrfile.txt.1'

100%[=======================================>] 2,935,133   3.48M/s   in 0.8s

09:20:23 (3.48 MB/s) - `mrfile.txt.1' saved [2935133/2935133]

/~/ wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
--09:20:26--  http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
Connecting to 144.92.167.182:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2935133 (2.8M) [text/plain]
Saving to: `mrfile.txt.2'

100%[=======================================>] 2,935,133   2.27M/s   in 1.2s

09:20:28 (2.27 MB/s) - `mrfile.txt.2' saved [2935133/2935133]

/~/ wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
--09:20:30--  http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
Connecting to 144.92.167.182:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2935133 (2.8M) [text/plain]
Saving to: `mrfile.txt.3'

77% [==============================>         ] 2,281,760   --.-K/s  eta 3s
/~/ wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
--09:20:44--  http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
Connecting to 144.92.167.182:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2935133 (2.8M) [text/plain]
Saving to: `mrfile.txt.4'

100%[=======================================>] 2,935,133   2.27M/s   in 1.2s

09:20:45 (2.27 MB/s) - `mrfile.txt.4' saved [2935133/2935133]

/~/ wget http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
--09:20:47--  http://144.92.167.182/servlet_data/viavia/mr_mysql_backup/mrfile.txt
Connecting to 144.92.167.182:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2935133 (2.8M) [text/plain]
Saving to: `mrfile.txt.5'

89% [===================================>    ] 2,619,024   --.-K/s  eta 1s
89% [===================================>    ] 2,619,024   --.-K/s  eta 1s
/~/ 

Original comment by jurge...@gmail.com on 3 Dec 2008 at 9:24

GoogleCodeExporter commented 9 years ago
Dmitri, has any progress been made on this issue? After next week I need to download huge amounts of data from BMRB, and this issue will prevent me from doing so unless I use non-traditional techniques, such as setting up a torrent-sharing protocol that can cope with these kinds of outages.

I'm bumping up the priority for this issue.

Original comment by jurge...@gmail.com on 15 Jan 2009 at 10:55

GoogleCodeExporter commented 9 years ago
There are no DoS settings on tang that I can find.

Ever since we switched to the campus firewall we've been having intermittent problems with large transfers. DoIT is aware of it and has an open ticket. There's nothing more I can do there.

What I can do is rsync your stuff to a server in CS that is not behind the campus firewall (that usually works after 4 to 6 tries); then you can try downloading from there. Send me a list of files and I'll try that.

Original comment by dmitri.m...@gmail.com on 15 Jan 2009 at 6:30

GoogleCodeExporter commented 9 years ago
It's bad again. I get a network stall after 200 MB. I'll now try a protocol other than the one scp uses.

jd:nmr/Wattos/ scp jurgen@tang.bmrb.wisc.edu:/share/wattos/mr_anno_backup_2009-01-16_unb_.tgz .
mr_anno_backup_2009-01-16_unb_.tgz     21%  204MB   0.0KB/s - stalled -^CKilled by signal 2.

Original comment by jurge...@gmail.com on 16 Jan 2009 at 11:43

GoogleCodeExporter commented 9 years ago
Dmitri, can you enable the ports required for the following command to work from tang?

bittorrent-curses /share/wattos/mr_anno_backup_2009-01-16_unb_.torrent

It's public data.

Original comment by jurge...@gmail.com on 16 Jan 2009 at 12:47

GoogleCodeExporter commented 9 years ago
I also tried cutting it into pieces, but now the download fails after 1.6 MB:

jd:nmr/~/ scp jurgen@tang.bmrb.wisc.edu:/share/wattos/mr_anno_backup_2009-01-16_unb_\* .
mr_anno_backup_2009-01-16_unb_aa     16% 1616KB   9.6KB/s - stalled -
^CKilled by signal 2.

That sounds like a bug somewhere.

Dmitri, help!

Original comment by jurge...@gmail.com on 16 Jan 2009 at 2:06

GoogleCodeExporter commented 9 years ago
Timestamps on your e-mails correlate very nicely with this:

***** Nagios 2.9 *****

Notification Type: PROBLEM

Service: load-average
Host: tang.bmrb.wisc.edu
Address: 144.92.167.182
State: WARNING

Date/Time: Fri Jan 16 05:07:58 CST 2009

Additional Info:

Load average: 7.02, 5.64, 3.90
*******************************

Unfortunately, I don't see anything in the logs on tang to trace it further, but it does suggest that moving this stuff to a faster machine might help.

Anyway, try

ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz

Original comment by dmitri.m...@gmail.com on 16 Jan 2009 at 5:28
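The Nagios warning gives three load averages, for the last 1, 5 and 15 minutes. These numbers are easiest to judge relative to the number of CPUs. On Linux, the load average counts processes waiting for a CPU plus processes stuck in uninterruptible I/O. Here is a minimal sketch of that normalisation. The thread does not say how many cores tang has, so the example assumes two CPUs purely for illustration. The `overloaded` cut-off of 1.0 per CPU is also an assumption, not Nagios's threshold.

```python
import os


def load_per_cpu(loads, ncpu):
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    return tuple(load / ncpu for load in loads)


def overloaded(loads, ncpu, threshold=1.0):
    """True if the 1-minute load per CPU exceeds `threshold` (assumed cut-off)."""
    return load_per_cpu(loads, ncpu)[0] > threshold


if __name__ == "__main__":
    # On a Unix host, read the live values instead of the Nagios sample.
    print(load_per_cpu(os.getloadavg(), os.cpu_count() or 1))
```

Take the sample from the alert (7.02, 5.64, 3.90) on an assumed two CPUs. That gives roughly 3.5 runnable or blocked processes per CPU over the last minute, and the load was climbing across all three windows.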

GoogleCodeExporter commented 9 years ago
ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz

fails to open with a message:

Safari can’t open the page “ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz”. The error was: “unknown error” (NSURLErrorDomain:-1) Please choose Report Bugs to Apple from the Safari menu, note the error number, and describe what you did before you saw this message.

Original comment by jurge...@gmail.com on 16 Jan 2009 at 6:20

GoogleCodeExporter commented 9 years ago
Sorry, my bad. Try again.

Original comment by dmitri.m...@gmail.com on 16 Jan 2009 at 7:39

GoogleCodeExporter commented 9 years ago
I get a 60 kb/s trickle from that site....

If this doesn't work: I seem to remember that the reverse direction, copying from tang to the outside, might be more stable.

Original comment by jurge...@gmail.com on 21 Jan 2009 at 12:00

GoogleCodeExporter commented 9 years ago
This is incredibly slow at 42 kb/s. I can't debug it any further from this side; it has to be a Madison-specific problem.

/~/ scp /share/wattos/mr_anno_backup_2009-01-16_unb_.tgz jd@dodos.dyndns.org:/Users/jd
Password:
/Users/jd/workspace34/wattos/scripts/wsetup: No such file or directory.
mr_anno_backup_2009-01-16_unb_.tgz     0% 5856KB  42.3KB/s 6:17:49 ETAKilled by signal 2.

Original comment by jurge...@gmail.com on 21 Jan 2009 at 12:34

GoogleCodeExporter commented 9 years ago
Did you try ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz the second time? It should work now.

Original comment by dmitri.m...@gmail.com on 21 Jan 2009 at 3:36

GoogleCodeExporter commented 9 years ago
It works, but it's slow as molasses: at 40 kb/s just now, it would take 7 hours to get.

Original comment by jurge...@gmail.com on 21 Jan 2009 at 3:54
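The "7 hours" figure follows from the file size and the observed rate. wget and ftp report the size further down this thread as 987,420,915 bytes. Here is a small sketch of that arithmetic. It assumes the quoted "kb/s" means KiB per second, which is the unit scp and wget display. `eta_hours` is a hypothetical helper.

```python
def eta_hours(total_bytes, kib_per_s, done_bytes=0):
    """Hours needed to move the rest of total_bytes at kib_per_s KiB/s."""
    remaining = total_bytes - done_bytes
    return remaining / (kib_per_s * 1024) / 3600


SIZE = 987_420_915  # mr_anno_backup_2009-01-16_unb_.tgz

# 40 and 60 KiB/s are the rates reported in this thread; 2048 KiB/s (2 MiB/s)
# is shown for comparison.
for rate in (40, 60, 2048):
    print(f"{rate:>5} KiB/s -> {eta_hours(SIZE, rate):.2f} h")
```

At 40 KiB/s this comes to about 6.7 hours, which matches the estimate above. At 2 MiB/s the same file would take under ten minutes.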

GoogleCodeExporter commented 9 years ago
OK, but this is unlikely to be on our end: we had a user upload a 1.2 GB time-domain data set to it, and it took about 10 minutes. Also, that machine is directly connected to the net, with no firewall except iptables.

Original comment by dmitri.m...@gmail.com on 21 Jan 2009 at 4:05

GoogleCodeExporter commented 9 years ago
I just tried it with Firefox and ftp from home. Firefox reports 245 Kb/s (what a piece of junk); the Windows built-in ftp client does about 10 MB per minute, which is about as good as I can get over my DSL connection.

Original comment by dmitri.m...@gmail.com on 21 Jan 2009 at 4:15

GoogleCodeExporter commented 9 years ago
It remains a mystery, then. I guess I could let it run overnight...

Good Afternoon jd! Still going strong ?
Today is Wed Jan 21 16:52:45 CET 2009
jd:stella/~/ wget ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz
--16:52:49--  ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz
           => `mr_anno_backup_2009-01-16_unb_.tgz'
Connecting to 128.104.55.250:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/data ... done.
==> PASV ... done.    ==> RETR mr_anno_backup_2009-01-16_unb_.tgz ... done.
Length: 987,420,915 (942M) (unauthoritative)

 5% [===>                                                                         ] 56,599,424    34.92K/s  ETA 6:05:18

Original comment by jurge...@gmail.com on 21 Jan 2009 at 4:15

GoogleCodeExporter commented 9 years ago
I can download at 1 Mb/s in parallel from www.mozilla-europe.org, so it's not my Internet connection here at home.

Original comment by jurge...@gmail.com on 21 Jan 2009 at 4:17

GoogleCodeExporter commented 9 years ago
Try ftp instead of wget, just to make sure.

Original comment by dmitri.m...@gmail.com on 21 Jan 2009 at 4:19

GoogleCodeExporter commented 9 years ago
I use wget.

Original comment by jurge...@gmail.com on 21 Jan 2009 at 4:19

GoogleCodeExporter commented 9 years ago
I get the same speed with every method; I just tried Firefox and ftp:

jd:stella/~/ ftp ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz
Connected to 128.104.55.250.
220 FTP Server ready.
331 Anonymous login ok, send your complete email address as your password
230-
                *** Welcome to BMRB ftp server! ***

     You are user 2 out of a maximum of 10 authorized anonymous logins.
     The current time here is Wed Jan 21 16:24:16 2009.
     If you experience any problems here, contact : web@bmrb.wisc.edu

230 Anonymous login ok, restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
200 Type set to I
250-The subdirectories of this directory contain all the publically 
 available information in BMRB:

 /data is the "main BMRB" data archive.

 /metabolomics is metabolomics data archive.

 /port contains the STAR files for 97 PDB entries that were ported from
 the PDB.

 /software contains software developed at BMRB and third-party programs
 for working with BMRB data in NMR-STAR format (/software/contibuted,
 provided "as is").

 /webdata containd CSV dumps of database tables behind the dynamic pages
 on BMRB website.

 Last updated 29 Aug 2008
250-CWD command successful
 Please read the file README
250    it was last modified on Fri Aug 29 18:10:23 2008 - 144 days ago
250-
 compress: query grid files (http://www.bmrb.wisc.edu/search/query_grid/initial_grid.html).

 nmr-star: published BMRB entries.

 timedomain: NMR time domain data (http://www.bmrb.wisc.edu/data_library/timedomain/).

 export_db: relational database of published BMRB entries (DDL scripts
   and pipe-delimited ascii table dumps).

 (D|R)NA.lib, protein.lib: FASTA libraries of BMRB sequences.
250-CWD command successful
 Please read the file README
250    it was last modified on Sat Nov  3 03:27:39 2007 - 445 days ago
local: mr_anno_backup_2009-01-16_unb_.tgz remote: mr_anno_backup_2009-01-16_unb_.tgz
229 Entering Extended Passive Mode (|||19726|)
150 Opening BINARY mode data connection for mr_anno_backup_2009-01-16_unb_.tgz (987420915 bytes)
  0% |                                                                               |   585 KB   41.81 KB/s  6:24:07 ETA^C
receive aborted. Waiting for remote to finish abort.
426 Transfer aborted. Data connection closed.

Original comment by jurge...@gmail.com on 21 Jan 2009 at 4:25

GoogleCodeExporter commented 9 years ago
Hmm. So wget does ftp properly, unlike Firefox.
I don't know what else we can do. BitTorrent would help if we had the file on multiple servers: then it would download from them in parallel. With only tang as a source, I doubt it'll make any difference.

Original comment by dmitri.m...@gmail.com on 21 Jan 2009 at 5:01

GoogleCodeExporter commented 9 years ago
Trying just now from the lab machine nmr.cmbi.ru.nl, I get a good speed:

jd:nmr/~/ wget  ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz
--09:45:28--  ftp://128.104.55.250/pub/data/mr_anno_backup_2009-01-16_unb_.tgz
           => `mr_anno_backup_2009-01-16_unb_.tgz'
Connecting to 128.104.55.250:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /pub/data ... done.
==> PASV ... done.    ==> RETR mr_anno_backup_2009-01-16_unb_.tgz ... done.
Length: 987,420,915 (942M) (unauthoritative)

19% [====================>                                       ] 194,210,104    2.00M/s    ETA 04:43

Now, how about the same file through scp from tang?
It stops after a few seconds. This is a big problem for tang.

jd:nmr/~/tmp/ scp jurgen@tang.bmrb.wisc.edu:/share/wattos/mr_anno_backup_2009-01-16_unb_.tgz .
mr_anno_backup_2009-01-16_unb_.tgz     0%  848KB  24.9KB/s - stalled -
^CKilled by signal 2.

Original comment by jurge...@gmail.com on 22 Jan 2009 at 8:49

GoogleCodeExporter commented 9 years ago
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23783 tomcat    20   0 1526m 1.1g  16m S    1 27.6 355:00.40 java
2541  mysql     20   0  907m 243m 3280 S    0  6.1 246:45.71 mysqld
15859 docr      20   0  286m 180m 2376 R   52  4.5   0:23.57 python
15943 docr      20   0 1865m 116m  10m R  140  2.9   0:05.21 java

Note the docr processes adding up to 192% of CPU, and that's not the worst: it spikes all the way to 200%. I think your scp simply isn't getting enough CPU to encrypt the data at any reasonable speed.

You could try running scp with "-c blowfish-cbc" (blowfish is the fastest cipher), though it still needs CPU, so I doubt it'll make much difference. Wait for the processing to finish, then try again. If it still doesn't work, we'll know I'm wrong.

Original comment by dmitri.m...@gmail.com on 22 Jan 2009 at 11:46
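The 192% figure is the sum of the %CPU column over the docr rows. In top's default mode, %CPU is measured against a single CPU, so a multithreaded process like the 140% java can exceed 100%. Here is a minimal sketch that totals %CPU per user from a block of top output. It assumes the column layout shown above: a header row with USER and %CPU, and COMMAND as the last column. `cpu_by_user` is a hypothetical helper.

```python
def cpu_by_user(top_text):
    """Sum the %CPU column per USER from a block of top(1) process lines."""
    lines = [line for line in top_text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    user_col = header.index("USER")
    cpu_col = header.index("%CPU")
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        user = fields[user_col]
        totals[user] = totals.get(user, 0.0) + float(fields[cpu_col])
    return totals


SAMPLE = """
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23783 tomcat    20   0 1526m 1.1g  16m S    1 27.6 355:00.40 java
 2541 mysql     20   0  907m 243m 3280 S    0  6.1 246:45.71 mysqld
15859 docr      20   0  286m 180m 2376 R   52  4.5   0:23.57 python
15943 docr      20   0 1865m 116m  10m R  140  2.9   0:05.21 java
"""

print(cpu_by_user(SAMPLE))  # {'tomcat': 1.0, 'mysql': 0.0, 'docr': 192.0}
```

To watch this over time, you could feed it the output of `top -b -n 1` (batch mode, one iteration) after trimming the summary lines above the process header.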

GoogleCodeExporter commented 9 years ago
Now that the load on the machine is nil, this works fine:
jd:stella/~/ scp jurgen@tang.bmrb.wisc.edu:/share/wattos/mr_anno_backup_2009-01-16_unb_.tgz .
mr_anno_backup_2009-01-16_unb_.tgz     9%   90MB 502.9KB/s   28:55 ETA^CKilled by signal 2.

Strange. According to top, scp only uses a few percent of CPU every now and then. Still, it might have blocked earlier on something CPU-intensive; perhaps it was trying to process a larger number of bytes at some point. I don't know.

Thanks for solving this, D!

Original comment by jurge...@gmail.com on 26 Jan 2009 at 1:54

GoogleCodeExporter commented 9 years ago
Sorry, but I still have a huge problem with the BMRB network and tang.

After a couple of successful attempts I get timeouts again. Tang isn't busy. My machine isn't busy...

Dmitri, what solutions do you see?
 - mount tang's partition in Nijmegen temporarily while I need to sync?
 - set up an rsync server on tang?

Currently I use scp from tang, which seems to work:
scp -P 39676 $x/$x.tgz jd@localhost-nmr:/Library/WebServer/Documents/NRG-CING/tangSync
I need this convoluted approach because my Nijmegen machine doesn't allow direct access, so I first set up a proxy (an SSH tunnel):
ssh -L localhost-nmr:39676:nmr.cmbi.umcn.nl:22 jurgend@www.cmbi.ru.nl

It's a weird setup where the source determines what to write...

Anyway this doesn't work:
jd:nmr/~/ wget http://restraintsgrid.bmrb.wisc.edu/servlet_data/NRG_ccpn_tmp/2k0e/2k0e.tgz
--13:52:27--  http://restraintsgrid.bmrb.wisc.edu/servlet_data/NRG_ccpn_tmp/2k0e/2k0e.tgz
           => `2k0e.tgz'
Resolving restraintsgrid.bmrb.wisc.edu... 144.92.167.182
Connecting to restraintsgrid.bmrb.wisc.edu|144.92.167.182|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11,947,417 (11M) [application/x-gzip]

100%[=======================================>] 11,947,417     2.98M/s    ETA 00:00

13:52:34 (1.77 MB/s) - `2k0e.tgz' saved [11947417/11947417]

jd:nmr/~/ wget http://restraintsgrid.bmrb.wisc.edu/servlet_data/NRG_ccpn_tmp/2k0e/2k0e.tgz
--13:52:35--  http://restraintsgrid.bmrb.wisc.edu/servlet_data/NRG_ccpn_tmp/2k0e/2k0e.tgz
           => `2k0e.tgz.1'
Resolving restraintsgrid.bmrb.wisc.edu... 144.92.167.182
Connecting to restraintsgrid.bmrb.wisc.edu|144.92.167.182|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11,947,417 (11M) [application/x-gzip]

75% [=============================>          ] 9,072,304     --.--K/s    ETA 00:08^C

Is it perhaps because of an Apache setting?

Original comment by jurge...@gmail.com on 10 Feb 2009 at 1:20
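The tunnel-plus-push workaround described above comes down to two command lines. The sketch below only builds them and does not run anything. It assumes the host names, user names, port and target directory from the comment; `2k0e/2k0e.tgz` stands in for `$x/$x.tgz` as an example value. The helpers `tunnel_cmd` and `push_cmd` are hypothetical. The `-N` flag, which makes ssh only forward and run no remote command, is an addition not present in the original `ssh` command.

```python
import shlex


def tunnel_cmd(bind_host, bind_port, target_host, target_port, gateway):
    """ssh -L bind_host:bind_port:target_host:target_port gateway

    Connections to bind_host:bind_port on this machine are forwarded through
    `gateway` to target_host:target_port (here: sshd on the Nijmegen machine).
    """
    spec = f"{bind_host}:{bind_port}:{target_host}:{target_port}"
    return ["ssh", "-N", "-L", spec, gateway]


def push_cmd(local_path, user, bind_host, bind_port, remote_dir):
    """scp into the forwarded port; note scp takes the port as capital -P."""
    return ["scp", "-P", str(bind_port), local_path, f"{user}@{bind_host}:{remote_dir}"]


tunnel = tunnel_cmd("localhost-nmr", 39676, "nmr.cmbi.umcn.nl", 22, "jurgend@www.cmbi.ru.nl")
push = push_cmd("2k0e/2k0e.tgz", "jd", "localhost-nmr", 39676,
                "/Library/WebServer/Documents/NRG-CING/tangSync")
print(shlex.join(tunnel))
print(shlex.join(push))
```

Passing the lists to `subprocess.run` would execute them, with the tunnel running in the background for as long as the pushes take. Building them as lists avoids shell-quoting surprises in paths.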

GoogleCodeExporter commented 9 years ago
This is really annoying; the workaround above also fails, because of a timeout, I'm sure:
Doing 1bq0 218
1bq0.tgz     100%  805KB 804.9KB/s   00:00
Doing 1bqx 219
1bqx.tgz     100%  301KB 300.5KB/s   00:00
Doing 1bqz 220
1bqz.tgz     100% 1076KB   1.1MB/s   00:00
Doing 1brv 221
1brv.tgz     100%  452KB 451.7KB/s   00:00
Doing 1brz 222
1brz.tgz     100% 1531KB   1.5MB/s   00:01
Doing 1bsh 223
1bsh.tgz     100% 2236KB   2.2MB/s   00:00
Doing 1bu9 224
1bu9.tgz     100% 2324KB   2.3MB/s   00:00

and now it's hanging....

Original comment by jurge...@gmail.com on 10 Feb 2009 at 1:31

GoogleCodeExporter commented 9 years ago
Try running traceroute 144.92.167.182 and see where it's dropping packets.

Original comment by dmitri.m...@gmail.com on 10 Feb 2009 at 3:00

GoogleCodeExporter commented 9 years ago
jd:stella/NRG-CING/ traceroute 144.92.167.182
traceroute to 144.92.167.182 (144.92.167.182), 64 hops max, 40 byte packets
 1  192.168.2.1 (192.168.2.1)  1.204 ms  0.417 ms  0.464 ms
 2  10.219.160.1 (10.219.160.1)  7.040 ms  6.712 ms  7.761 ms
 3  csw2-vlan202.dbsch1.nb.home.nl (213.51.151.193)  8.506 ms  8.458 ms  7.697 ms
 4  tb-rc0001-cr101-ae5-0.core.as9143.net (213.51.158.94)  7.712 ms  9.079 ms  7.014 ms
 5  asd-lc0006-cr101-ae5-0.core.as9143.net (213.51.158.18)  10.964 ms  10.663 ms  9.353 ms
 6  ams-ix.he.net (195.69.145.150)  18.747 ms  13.090 ms  11.614 ms
 7  10gigabitethernet4-1.core1.nyc4.he.net (216.66.24.153)  96.630 ms  95.742 ms  97.733 ms
 8  10gigabitethernet1-2.core1.chi1.he.net (72.52.92.102)  126.275 ms  119.758 ms  118.581 ms
 9  216.66.3.22 (216.66.3.22)  119.706 ms  119.024 ms  119.479 ms
10  r-uwmadison-isp-xe-2-0-0-3997.wiscnet.net (140.189.8.1)  127.893 ms  131.235 ms  129.984 ms
11  r-peer-xe-1-1-0-945.net.wisc.edu (216.56.60.226)  129.945 ms  131.900 ms  127.137 ms
12  r-cssc-b280c-9-core-vlan-1627.net.wisc.edu (146.151.175.61)  257.549 ms  128.377 ms  129.714 ms
13  r-cssc-b280c-1-core-vlan-2034.net.wisc.edu (146.151.167.6)  130.443 ms  127.908 ms  130.819 ms
14  * * *
15  *^C

Original comment by jurge...@gmail.com on 10 Feb 2009 at 3:16
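Reading a traceroute to see "where it stops" means finding the last hop that answered. Hops printed as `* * *` sent no reply. That can mean a filter drops the probes, not necessarily that the path is broken, as the next comments suggest. Here is a minimal parser sketch. It assumes the usual one-hop-per-line output shown above. `last_responding_hop` is a hypothetical helper.

```python
import re

# A hop line starts with the hop number, then the probe results.
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def last_responding_hop(output):
    """Return (hop_number, first_name_or_address) of the last hop that replied.

    Hops whose probes all timed out appear as '*' and are skipped; a trailing
    '^C' from an interrupted run is ignored.
    """
    last = None
    for line in output.splitlines():
        match = HOP.match(line)
        if not match:
            continue  # header line, or not a hop line
        tokens = match.group(2).replace("^C", "").split()
        names = [tok for tok in tokens if tok != "*"]
        if names:
            last = (int(match.group(1)), names[0])
    return last
```

Applied to the run above, this returns hop 13 (r-cssc-b280c-1-core-vlan-2034.net.wisc.edu), the last campus router before the silent hops. For the Nijmegen run below, it returns hop 1 (d3).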

GoogleCodeExporter commented 9 years ago
That was from my development machine.

From Nijmegen it was much shorter:

jd:nmr/~/ traceroute 144.92.167.182
traceroute to 144.92.167.182 (144.92.167.182), 64 hops max, 40 byte packets
 1  d3 (131.174.146.1)  0.737 ms  0.281 ms  0.237 ms
 2  * * *

Original comment by jurge...@gmail.com on 10 Feb 2009 at 3:18

GoogleCodeExporter commented 9 years ago
The second one is probably a firewall on your end.

The first one looks like it's going all the way to our firewall. I'll check the settings; I thought I had enabled incoming ping/traceroute, but I may be wrong.

I'll kick the DoIT network people again... In the meantime, we have a backup machine in CS. Tell me which files need to be backed up, I'll set up a cron job to push them there, and you can try downloading them from it.

Original comment by dmitri.m...@gmail.com on 10 Feb 2009 at 3:55

GoogleCodeExporter commented 9 years ago
It would be great to rsync them from an rsync server on tang, at BMRB, or in CS.

Shouldn't I simply set up rsync on tang? Can you open the required ports on the firewall?

Otherwise, the partition on tang is:
/big/docr/ccpn_tmp/data/recoord
It's 30 GB; it changes a little every week and completely with every rerun.

Original comment by jurge...@gmail.com on 10 Feb 2009 at 4:13

GoogleCodeExporter commented 9 years ago
In theory we don't need to set up or open anything; just run
rsync -a -e 'ssh -l jurgen' tang:/blah .
or something like that. The problem is that rsync is more sensitive to network glitches than any other protocol, so it's more likely to fail than straight scp.

Original comment by dmitri.m...@gmail.com on 10 Feb 2009 at 5:41

GoogleCodeExporter commented 9 years ago
OK, I'll need the IP addresses of the hosts/networks you'll be using: stella and the ones in Nijmegen.

Original comment by dmitri.m...@gmail.com on 10 Feb 2009 at 5:55

GoogleCodeExporter commented 9 years ago
Yes, but it's very easy to restart, with rsync figuring out what's left to do.

nmr.cmbi.ru.nl gets connections from the outside through www.cmbi.ru.nl, unless I ask our admin to poke a hole.

Let's not bother about my development machine.
Which side would run the daemon, or doesn't it matter?

Thanks

Original comment by jurge...@gmail.com on 10 Feb 2009 at 6:43
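Two points from the comments above combine naturally into a retry loop: rsync is sensitive to network glitches, but it is cheap to restart because it works out what is left to do. Here is a minimal sketch. It assumes a budget of six attempts, echoing the "4 to 6 tries" mentioned earlier. It treats only the listed rsync exit codes as transient, with meanings taken from the rsync manual. `rsync_with_retries` is a hypothetical helper.

```python
import subprocess
import time

# rsync exit codes that usually indicate a network problem rather than a
# usage or file error: 10 socket I/O, 12 protocol data stream,
# 30 timeout in data send/receive, 35 timeout waiting for daemon connection.
TRANSIENT_CODES = {10, 12, 30, 35}


def rsync_with_retries(args, max_tries=6, wait_s=30, run=subprocess.run):
    """Run `rsync *args` until it succeeds; return the number of attempts used.

    Each rerun only transfers what is still missing or changed. `run` is
    injectable so the loop can be tested without a network.
    """
    for attempt in range(1, max_tries + 1):
        code = run(["rsync", *args]).returncode
        if code == 0:
            return attempt
        if code not in TRANSIENT_CODES:
            raise RuntimeError(f"rsync exited with {code}; not retrying")
        if attempt < max_tries:
            time.sleep(wait_s)
    raise RuntimeError(f"rsync still failing after {max_tries} attempts")
```

A possible call is `rsync_with_retries(["-az", "--partial", "-e", "ssh -l jurgen", "tang:/big/docr/ccpn_tmp/data/recoord/", "recoord/"])`. Here `--partial` keeps partially transferred files, so a large file interrupted by a stall need not start from zero.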

GoogleCodeExporter commented 9 years ago
OK, try

rsync -azvv --delete --stats --progress rsync://128.104.55.249/recoord .

It's allowing rsync from 131.174.88.0/24.

The only problem is, I'm copying stuff from tang and it's only down to 2haj now.

Original comment by dmitri.m...@gmail.com on 10 Feb 2009 at 6:55

GoogleCodeExporter commented 9 years ago
Fails from both nmr and www. Can't the rsync daemon run on tang?

I'm shortening the spam list to just the two of us.

jd:nmr/~/ rsync -azvv --delete --stats --progress rsync://128.104.55.249/recoord .
opening tcp connection to 128.104.55.249 port 873
rsync: failed to connect to 128.104.55.249: Connection refused (61)
rsync error: error in socket IO (code 10) at clientserver.c(104) [receiver=2.6.9]

[jurgend@www ~]$ rsync -azvv --delete --stats --progress rsync://128.104.55.249/recoord .
opening tcp connection to 128.104.55.249 port 873
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(614) [receiver=2.6.8]

Original comment by jurge...@gmail.com on 10 Feb 2009 at 7:00

GoogleCodeExporter commented 9 years ago
Sorry, I opened up access in two places, but I had the filters in three and missed one. It should let you connect now.

As for running it on tang, sure, but 128.104.55.249 is outside the campus firewall and tang isn't. If it's the campus firewall that's screwing it up, rsync from tang won't work any better than http or scp.

Let's try one thing at a time: 128.104.55.249 first.

Original comment by dmitri.m...@gmail.com on 10 Feb 2009 at 7:13

GoogleCodeExporter commented 9 years ago
Weird, it works from www but not from nmr. On www I get:
[jurgend@www ~/recoord]$ rsync -azvv --delete --stats --progress rsync://128.104.55.249/recoord .
opening tcp connection to 128.104.55.249 port 873
opening connection using --server --sender -vvlogDtprz . recoord
receiving file list ...
...
but on nmr I get:
jd:nmr/recoord/ rsync -azvv --delete --stats --progress rsync://128.104.55.249/recoord .
opening tcp connection to 128.104.55.249 port 873
rsync: failed to connect to 128.104.55.249: Connection refused (61)
rsync error: error in socket IO (code 10) at clientserver.c(104) [receiver=2.6.9]

What do I tell our sysadmin to open up?

Could you enable the rsync server on tang?

I've got a feeling rsyncd will work from it.

Original comment by jurge...@gmail.com on 10 Feb 2009 at 9:19

GoogleCodeExporter commented 9 years ago
# host nmr.cmbi.ru.nl
nmr.cmbi.ru.nl has address 131.174.88.168

If you're getting connection refused on nmr, run ifconfig and see what IP address it actually has: I allow 131.174.88.1..131.174.88.255.

I set it up on tang, so you can try it. It probably won't work from nmr since the filter is the same, but let me know if you can connect from www.

Original comment by dmitri.m...@gmail.com on 11 Feb 2009 at 12:42

GoogleCodeExporter commented 9 years ago
The tang server works fine. And it hasn't timed out yet....

Sorry: internally, the machine nmr is known as nmr.cmbi.umcn.nl (131.174.146.54), as shown on en0 below.

jd:nmr/~/ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
    inet 127.0.0.1 netmask 0xff000000 
    inet6 ::1 prefixlen 128 
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet6 fe80::21f:5bff:fe3d:bf24%en0 prefixlen 64 scopeid 0x4 
    inet 131.174.146.54 netmask 0xffffff00 broadcast 131.174.146.255
    ether 00:1f:5b:3d:bf:24 
    media: autoselect (100baseTX <full-duplex>) status: active
    supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
en2: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 00:1f:5b:3d:bf:25 
    media: autoselect status: inactive
    supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
en1: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    ether 00:1d:4f:ff:82:52 
    media: autoselect (<unknown type>) status: inactive
    supported media: autoselect
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
    lladdr 00:22:41:ff:fe:ed:71:80 
    media: autoselect <full-duplex> status: inactive
    supported media: autoselect <full-duplex>
en3: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    inet6 fe80::210:32ff:fe54:7698%en3 prefixlen 64 scopeid 0x8 
    inet 10.211.55.2 netmask 0xffffff00 broadcast 10.211.55.255
    ether 00:10:32:54:76:98 
    media: autoselect status: active
    supported media: autoselect
en4: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    inet6 fe80::201:23ff:fe45:6789%en4 prefixlen 64 scopeid 0x9 
    inet 10.37.129.2 netmask 0xffffff00 broadcast 10.37.129.255
    ether 00:01:23:45:67:89 
    media: autoselect status: active
    supported media: autoselect
jd:nmr/~/ 

Original comment by jurge...@gmail.com on 11 Feb 2009 at 8:16
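The "Connection refused" on nmr fits this address mismatch. The server filter allows 131.174.88.0/24, which contains the public address `host` returns for nmr.cmbi.ru.nl. ifconfig, however, shows nmr on 131.174.146.54. Here is a minimal check with Python's standard `ipaddress` module. It assumes the server sees nmr's en0 address as the source, with no address translation in between. `is_allowed` is a hypothetical helper.

```python
import ipaddress

# Filter as first described in this thread: 131.174.88.0/24
ALLOWED = [ipaddress.ip_network("131.174.88.0/24")]


def is_allowed(addr, allowed=ALLOWED):
    """True if addr falls inside any of the allowed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in allowed)


print(is_allowed("131.174.88.168"))  # True: address of nmr.cmbi.ru.nl in DNS
print(is_allowed("131.174.146.54"))  # False: address nmr actually has on en0
```

Adding the single host explicitly, for example `ipaddress.ip_network("131.174.146.54/32")`, to the allowed list makes the second check pass.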

GoogleCodeExporter commented 9 years ago
This is also a problem on tang for incoming traffic. Today the cron job /share/jurgen/BMRB/PdbArchive/rsyncPDB.csh, which Chris and I need for the weekly batch, failed twice.

Even with the rsync I'm running today, there were many outages.

Original comment by jurge...@gmail.com on 11 Feb 2009 at 2:57

GoogleCodeExporter commented 9 years ago

Original comment by jurge...@gmail.com on 11 Feb 2009 at 2:57

GoogleCodeExporter commented 9 years ago
I changed tang setup to allow rsync from 131.174.88.160 and 131.174.146.54

Original comment by dmitri.m...@gmail.com on 11 Feb 2009 at 6:31

GoogleCodeExporter commented 9 years ago
Dmitri, can you remount /share again? This is killing the NRG project.
Why has it dropped for the second time now? Is there a way for me to fix this in the future?

df: `/share': Stale NFS file handle

Original comment by jurge...@gmail.com on 6 Mar 2009 at 8:44

GoogleCodeExporter commented 9 years ago

Original comment by jurge...@gmail.com on 9 Mar 2009 at 12:32

GoogleCodeExporter commented 9 years ago
I am unable to work on issue 187 because rsync drops the connection after a couple of entries, and we need to do 50,000.

Please advise whether we should reboot tang; that worked in the past.
Let me know before you do; I need to check that it comes up OK.

Original comment by jurge...@gmail.com on 9 Mar 2009 at 12:37

GoogleCodeExporter commented 9 years ago
We can try rebooting it; let me know when.
You can also try adding "--bwlimit=NN" to the rsync command line and trying different values of NN: e.g. 80 works between here and CS, and 10 works for large transfers to Osaka (though they take days at that speed).

Original comment by dmitri.m...@gmail.com on 9 Mar 2009 at 5:20
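When picking a value of NN, it helps to know what a given limit costs in wall-clock time. rsync reads the `--bwlimit` number in units of 1024 bytes per second. Here is a sketch of the arithmetic for the roughly 30 GB recoord tree mentioned earlier. It assumes 30 GB means 30 GiB and that the transfer runs at the limit the whole time, which is a best case. `bwlimit_plan` is a hypothetical helper.

```python
def bwlimit_plan(size_bytes, limits_kib=(80, 40, 10)):
    """Pair each candidate --bwlimit value with its best-case duration in days."""
    plan = []
    for limit in limits_kib:
        seconds = size_bytes / (limit * 1024)  # limit is in KiB per second
        plan.append((f"--bwlimit={limit}", seconds / 86400))
    return plan


for flag, days in bwlimit_plan(30 * 1024**3):
    print(f"{flag:<14} {days:5.1f} days")
```

Moving the full tree at 80 works out to about four and a half days, and at 10 to over a month. That fits the remark that throttled transfers "take days". It also suggests throttling mainly the incremental weekly changes rather than a complete rerun.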

GoogleCodeExporter commented 9 years ago
Would tomorrow morning work for you? Or whenever is convenient for you. Just let me know when it's up again and you've checked a couple of things. Then I'll check my things: Tomcat and Apache.

Could you answer my questions in comment 44?

Original comment by jurge...@gmail.com on 9 Mar 2009 at 8:59

GoogleCodeExporter commented 9 years ago
tang's been rebooted, it's back up now

Original comment by dmitri.m...@gmail.com on 10 Mar 2009 at 6:22

GoogleCodeExporter commented 9 years ago
Re #44: it turns out OpenLDAP keeps connections open too long. At some point it ran out of open descriptors and stopped answering auth queries. I restarted it. I still can't quite understand a) how they managed to make it run out again 48 hours later (it was fine for close to a year until now), and b) why it would cause stale NFS handles: NFS is mounted by root, who is not authenticated via LDAP...
But anyway, I added a connection timeout on the LDAP server, so it shouldn't happen again.
Computers that use the automounter weren't affected, but tang doesn't use it, since in Fedora 6 automount was too buggy.

To answer your question: there is no way for you to fix this. Something new always breaks.

Original comment by dmitri.m...@gmail.com on 10 Mar 2009 at 6:35
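Running out of descriptors, as described above for the LDAP server, can be spotted early by comparing a process's open descriptors with its limit. Here is a Linux-only sketch. It assumes access to /proc; root is usually needed to inspect another user's daemon such as slapd, whose PID you could find with `pidof slapd`. The helpers `open_fd_count` and `nofile_soft_limit` are hypothetical.

```python
import os


def open_fd_count(pid):
    """Number of file descriptors process `pid` currently has open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_soft_limit(pid):
    """Soft 'Max open files' limit from /proc/<pid>/limits, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line[len("Max open files"):].split()[0]
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    pid = os.getpid()  # substitute the daemon's PID here
    print(f"{open_fd_count(pid)} open of {nofile_soft_limit(pid)} allowed")
```

Running this periodically from cron and alerting when the count nears the limit would flag a leak like the one above before queries stop being answered.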