Closed benthestatistician closed 7 years ago
When I log on to the development rather than the staging branch, #2 isn't a problem but I'm still getting hit with this one:
System.Net.WebException: The remote server returned an error: (403) Forbidden.
@markmfredrickson , @josherrickson : It appears that the R server is giving 403's when the front end tries to submit a job. As @alexsmithRTI put it (by email) when I first observed the problem (while trying to submit to the staging instance):
For issue #3, that looks like the R server is responding with the 403, so I’m going to check that we’re generating the request correctly. Hopefully this is just a configuration issue and it is not generating the request correctly. This might also then explain the other issues.
(I think that last "correctly" was meant to be "incorrectly." The other issues are #1 and #2 on the evaluation-engine-website GH project site.)
Potential source of the problem is my deletion of (what I thought were) deprecated files:
@markmfredrickson
Reverting the commit Josh E references alleviates this problem for the development instance but not the staging instance. See additional notes in this private issue.
@markmfredrickson , @josherrickson -- I've been working with Alex to get staging to the point where we can start running tests. This issue seems to be the key one, but Alex doesn't think it's a front-end issue. He emailed me as follows: The error on submission (Step2b) is actually because of the R server not being able to be reached. While a job shows up as started in our database, it hasn’t run because the R server hasn’t done anything with it. Can you look into this from your end? JPEG of error message is attached.
@rstudley: Thanks for relaying this information. I'm working on this today and hopefully will have some follow up soon.
I think I have a reason for the issue. The directory /srv/stats/staging/front-end is not world readable and executable. The directory is owned by Ben and I don't have the ability to change it directly.
@benthestatistician: Can you log into the stats server and run:
chmod a+rx /srv/stats/staging/front-end
Thanks.
Done. (Although maybe I only needed to log in, w/ my cron job doing the rest?)
bhansen.stat@192.168.32.2's password:
Last login: Sat Jan 28 07:26:04 2017 from 192.168.34.123
[bhansen.stat@rserver ~]$ ls -l /srv/stats/staging/front-end
total 24
-rwxrwxrwx. 1 bhansen.stat webadmin 35 Dec 30 13:03 config.ini
-rwxrwxrwx. 1 bhansen.stat webadmin 1950 Dec 30 13:03 job.php
-rwxrwxrwx. 1 bhansen.stat webadmin 397 Dec 30 13:03 start.php
-rwxrwxrwx. 1 bhansen.stat webadmin 310 Dec 30 13:03 status.php
-rwxrwxrwx. 1 bhansen.stat webadmin 200 Dec 30 13:03 test.php
drwxrwsrwx. 2 bhansen.stat webadmin 4096 Dec 30 13:03 tests
[bhansen.stat@rserver ~]$ chmod a+rx /srv/stats/staging/front-end
[bhansen.stat@rserver ~]$ ls -l /srv/stats/staging/front-end
total 24
-rwxrwxrwx. 1 bhansen.stat webadmin 35 Dec 30 13:03 config.ini
-rwxrwxrwx. 1 bhansen.stat webadmin 1950 Dec 30 13:03 job.php
-rwxrwxrwx. 1 bhansen.stat webadmin 397 Dec 30 13:03 start.php
-rwxrwxrwx. 1 bhansen.stat webadmin 310 Dec 30 13:03 status.php
-rwxrwxrwx. 1 bhansen.stat webadmin 200 Dec 30 13:03 test.php
drwxrwsrwx. 2 bhansen.stat webadmin 4096 Dec 30 13:03 tests
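A side note on the listing above: `ls -l` on a directory prints its contents, so the two outputs look identical whether or not the `chmod` changed anything. The directory's own mode shows up with `ls -ld` or `stat`. Below is a minimal sketch of such a check, assuming GNU coreutils `stat` (the `-c` option); the helper name `world_rx` and the throwaway directory are illustrative and not part of the server setup.

```shell
#!/bin/sh
# Sketch: report whether "others" can read and traverse a directory.
# Assumes GNU coreutils stat (-c); BSD/macOS stat uses different options.
world_rx() {
    mode=$(stat -c '%a' "$1") || return 2
    other=${mode#"${mode%?}"}     # last octal digit = permissions for "others"
    # read bit is 4, execute (traverse) bit is 1; both are needed
    [ $((other & 5)) -eq 5 ]
}

# Demonstration on a throwaway directory standing in for the real path.
demo=$(mktemp -d)
chmod 750 "$demo"
if world_rx "$demo"; then echo "before: world r+x"; else echo "before: not world r+x"; fi
chmod a+rx "$demo"
ls -ld "$demo"                    # -d shows the directory's own mode line
if world_rx "$demo"; then echo "after: world r+x"; else echo "after: not world r+x"; fi
rmdir "$demo"
```

On the real server the same check would be `world_rx /srv/stats/staging/front-end`.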
Ok. I think I can tentatively say this is fixed on the stats side. I can start a job via the web server on the R server. I haven't tested the full web front end as I forget my password and I think the reminder email service is a current issue. Can someone validate quickly? You can just submit garbage IDs; as long as you get a message from the R server telling you the job failed there, we know the problem is fixed.
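The validation idea above (any answer other than a 403 means the request is getting through) can be sketched roughly as follows. This is only an illustration: the function name `classify_status`, the status-to-message mapping, and the example URL are assumptions, not the actual endpoint or the front end's behaviour.

```shell
#!/bin/sh
# Sketch: interpret the HTTP status seen when submitting a throwaway job.
# The mapping below is an assumption based on the discussion in this thread.
classify_status() {
    case "$1" in
        403) echo "blocked: the web server refused the request (check directory permissions)" ;;
        000) echo "unreachable: no HTTP response received" ;;
        *)   echo "reached: the server responded with status $1" ;;
    esac
}

# Example call, left commented out because the URL is a placeholder:
# code=$(curl -s -o /dev/null -w '%{http_code}' "https://rserver.example/start.php?id=garbage")
# classify_status "$code"

classify_status 403
classify_status 200
```

curl prints `000` for `%{http_code}` when it receives no HTTP response at all, which is why that value is treated separately.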
I have successfully submitted a job to staging!
Actually, what I should say is that I was able to submit the job without receiving an error message. But it's been several minutes, and the Library page still tells me that the job has been "Created". I would have expected it to be "Running" by now. (I have refreshed the page.)
Also, the three jobs I tried to submit yesterday are still reported as "Created". (I got errors when I submitted these, but they nonetheless showed up as "Created" in the Library.) I wasn't necessarily expecting these to be running now, but I guess I was expecting that at least one of yesterday's or today's jobs would be running...
I was able to create a job on development as well (24166d3e-2ce5-4af6-b751-c74ebe4991ea). Previously I was seeing errors before reaching this point, @rstudley , so it appears that one obstacle has been removed.
@rstudley I think I can deduce the jobGUID for the job you submitted. I just started that directly and it ran to completion. Did you get any notices?
I'll look into why the job didn't start automatically.
@benthestatistician : Yes, same experience here. One hurdle jumped. @markmfredrickson : JobGUID is 2d55374a-06de-46de-878f-c879cf4b0387. And I have been diligently recording all jobs (with their IDs) here: https://docs.google.com/spreadsheets/d/1hvtGxw9KcJFS2cvKz647qtNQoj9mEt8JrLC1L_x3BCc/edit?pli=1#gid=1860647269
@markmfredrickson : The Library indicates that the job you pushed through (which was the correct job) is now done, and I can view the output. I didn't receive any notices, however, but that seems to be a completely separate issue (see #6). Not sure what you want to do about the staging jobs from yesterday, but their JobGUIDs are: c7e4dabb-6df8-4f65-8bc7-ecfb194b1682, 9aab64d2-47c2-49b1-a0bf-9c47653620d6, 68251464-b3c2-49bc-af4c-228adfa6a5a3. Thanks for looking into why the job didn't start automatically. Once that is fixed, I can really start putting the staging instance through its paces.
For my own notes, the issue was that the front-end/job.php file was set up to point to the match_master queue instead of match_staging. I made the change but jobs still aren't getting started in the proper queue. I see some workers running, but something is not quite right yet. Hopefully, once I figure it out, it will be a quick fix.
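A quick way to catch this kind of mismatch is to list the queue names a file mentions and compare them with the one expected for the instance. The sketch below assumes the `match_<instance>` naming seen above; the helper names and the sample file contents are invented for the demonstration and do not reproduce the real job.php.

```shell
#!/bin/sh
# Sketch: list queue names mentioned in a file and compare with the expected one.
# The "match_<instance>" pattern is inferred from the thread; the sample line
# written below is hypothetical.
queues_in() {
    grep -oE 'match_[a-z]+' "$1" | sort -u
}

queue_ok() {   # usage: queue_ok FILE EXPECTED_QUEUE
    [ "$(queues_in "$1")" = "$2" ]
}

demo=$(mktemp)
printf '%s\n' '$queue = "match_master";' > "$demo"   # hypothetical content
if queue_ok "$demo" match_staging; then
    echo "queue matches"
else
    echo "queue mismatch: found $(queues_in "$demo"), expected match_staging"
fi
rm -f "$demo"
```

Because `queue_ok` compares against the full list of names found, a file that mentions both queues also counts as a mismatch.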
Good news: I just submitted a new job and it started running automatically.
For reference, recording @markmfredrickson's email comments from yesterday: ...Ok. I think I got this figured out (we were missing a library). ...I think you should be good to submit jobs (though I've said that before...)
@markmfredrickson : Success! Job started automatically AND ran through to completion; jobGUID: 7ffd388c-3d03-4f71-b695-0d190317a849. This issue can be closed, yes? ( @alexsmithRTI still has to get ITS to restore the email notifications however; issue #6 )
This issue can be closed. Other related issues have different causes.
After working around #2 I completed a job specification but got this on submission. Permissions issue?