Chicago / west-nile-virus-predictions

Algorithm to predict repeated positive results for West Nile Virus for mosquitoes captured in traps across Chicago.
MIT License

WNV updates are failing #37

Closed tomschenkjr closed 7 years ago

tomschenkjr commented 7 years ago

The updates failed on 2017-07-08 and on 2017-07-10. There doesn't seem to be an attempt to run the predictions on 2017-07-09 (a Sunday).

The 2017-07-08 run produced these errors:

FAIL - R/10_calculate_idtable.R
FAIL - R/21_create_features.R
FAIL - R/40_upload_predictions_ROracle.R

And for 2017-07-10:

SUCCESS - R/10_calculate_idtable.R
FAIL - R/21_create_features.R
FAIL - R/40_upload_predictions_ROracle.R

geneorama commented 7 years ago

The main bug right now is that libcurl can't access ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2017/725300-94846-2017.gz. This error occurs in download_noaa_hourly which was introduced in the NOAA download issue discussed in #34.

I think the problem is that the FTP port is blocked, although I'm not sure how to test this. Based on what I've read, FTP mainly uses port 21. I tried manually transferring the files, but ftp isn't installed on the analytics servers.

tomschenkjr commented 7 years ago

Use telnet over port 21 from the command line on the Linux server. If it’s a firewall issue, then you’ll get a network timeout.

geneorama commented 7 years ago

I tried telnet

$ telnet ftp://ftp.ncdc.noaa.gov/ 21
telnet: ftp://ftp.ncdc.noaa.gov/: Name or service not known
ftp://ftp.ncdc.noaa.gov/: Unknown host

tomschenkjr commented 7 years ago

Try without the “ftp://” prefix.

geneorama commented 7 years ago

Tried that too

$ telnet ftp.ncdc.noaa.gov/ 21
telnet: ftp.ncdc.noaa.gov/: Name or service not known
ftp.ncdc.noaa.gov/: Unknown host
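Both attempts above fail at name resolution rather than at the firewall: telnet takes a bare host name followed by a port, so the ftp:// scheme and the trailing slash are treated as part of the name. A minimal sketch of reducing a URL to its host part with POSIX parameter expansion follows; the helper name url_to_host is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Reduce a URL to the bare host name that telnet expects.
# url_to_host is a hypothetical helper name used only for illustration.
url_to_host() {
  host=${1#*://}    # drop a leading scheme such as ftp://
  host=${host%%/*}  # drop the first slash and everything after it
  printf '%s\n' "$host"
}

# Prints: ftp.ncdc.noaa.gov
url_to_host "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/2017/725300-94846-2017.gz"
```

The resulting name can then be passed to telnet together with the port, as in telnet ftp.ncdc.noaa.gov 21.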

geneorama commented 7 years ago

After some extensive troubleshooting, we discovered that it is a firewall issue. It was confusing that telnet ftp.ncdc.noaa.gov 21 succeeded while wget and curl failed. As I now understand it, we have different virtual firewalls, and the telnet command was probably being handled differently (either not blocked, or blocked at a different level).

I will put a firewall request into the Tufin system.
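As general background, and not as the confirmed cause in this thread: a successful connection to port 21 only shows that the FTP control channel is reachable. The file itself travels over a second data connection, and in passive mode the server announces the port for it in a 227 reply (RFC 959) as six comma-separated numbers, where the port is p1 * 256 + p2. A firewall rule that allows port 21 but not that data port could therefore let telnet connect while a download fails. The sketch below decodes such a reply; the helper name pasv_port and the sample reply, which uses a documentation-range address, are made up for illustration.

```shell
#!/bin/sh
# Decode the data port from an FTP passive-mode (227) reply.
# pasv_port is a hypothetical helper name, not part of this repository.
pasv_port() {
  # Keep only the text between the parentheses: h1,h2,h3,h4,p1,p2
  fields=${1#*\(}
  fields=${fields%%\)*}
  # The last two numbers encode the port as p1 * 256 + p2.
  p1=$(printf '%s' "$fields" | cut -d, -f5)
  p2=$(printf '%s' "$fields" | cut -d, -f6)
  echo $(( p1 * 256 + p2 ))
}

# Sample reply with a made-up address and port; prints 50000.
pasv_port "227 Entering Passive Mode (192,0,2,10,195,80)"
```

A data port in this high range is usually different on every transfer, so a firewall request for FTP typically has to cover more than port 21 alone.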

geneorama commented 7 years ago

It took longer to upload the predictions manually.

Some lessons learned:

I ran the data acquisition files locally, uploaded the two .Rds files that are produced, then ran the model and database update code on the analytics servers.

tomschenkjr commented 7 years ago

RODBC or ROracle? I thought you were using the latter.

geneorama commented 7 years ago

I'm using ROracle on the Linux servers.

RODBC is no longer a good way to update the predictions.

Also, it was good to have a dev database to test the update first. When I tried updating the database with RODBC it would drop the tables, but then wouldn't write new ones. So, I would have messed up the prod server (temporarily) if I hadn't had access to the dev database.

geneorama commented 7 years ago

I put in the Tufin request earlier this afternoon.

geneorama commented 7 years ago

Firewall has been opened.