Closed szanati closed 6 years ago
Xymon reported that both the thin listener and the thin server went down at 06:10:18 on Sat Jul 14. One package, EXT0SNHTM_47ZCLP, did die with the above error, but packages kept archiving until 13:33:18 on Sat Jul 14, when I guess the description service died. I stopped and restarted DAITSS this morning (around 9 on Jul 16) and reset the packages. It seems to be working except for the occasional dead process.
The first package to die in the above post, EXT0SNHTM_47ZCLP, had the following error, which was different from the other errors:
error while processing 3599(sip-files/00359.txt): bad status http://describe.fda.fcla.edu/describe?location=file:/var/daitss/data/work/EXT0SNHTM_47ZCLP/files/original/3599/data&uri=info%3Afda%2FEXT0SNHTM_47ZCLP%2Ffile%2F3599&originalName=sip-files%2F00359.txt: 502 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
The proxy server received an invalid
response from an upstream server.
The proxy server could not handle the request GET /describe.
Reason: Error reading from remote server
Here is an IEID of a package that has died in the describe step: EBM0ZL3MU_QGUTN1
Okay, I will take a look.
Stephen - are you saying that for EBM0ZL3MU_QGUTN1 the ingest process got a stack trace and died during the description step? I believe that there have been a number of other processes that have died lately well after passing the description step. Can you confirm? That may be a separate issue.
Package EBM0ZL3MU_QGUTN1 did die in the description step. It had the following error:
error while processing 197(sip-files/00007.jpg): bad status http://describe.fda.fcla.edu/describe?location=file:/var/daitss/data/work/EBM0ZL3MU_QGUTN1/files/original/197/data&uri=info%3Afda%2FEBM0ZL3MU_QGUTN1%2Ffile%2F197&originalName=sip-files%2F00007.jpg: 503 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
The other packages that died probably are a different issue.
Jame and I downloaded the problem file and could not reproduce the error. The description service processes the file, 00007.jpg, successfully. This looks like one of the intermittent errors caused by the description service being too busy to serve all the requests.
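If the 502 and 503 responses really are transient, one possible client-side mitigation is to retry the describe request with a growing delay before marking the package as failed. The following is only a minimal sketch in Python, not how DAITSS itself behaves: `call_with_retry`, `TRANSIENT_STATUSES`, and the default attempt count and delay are all hypothetical names and values chosen for illustration, and `request` stands in for whatever code actually issues the GET to the description service.

```python
import time

# Assumption: only the two statuses reported in this thread are treated as transient.
TRANSIENT_STATUSES = {502, 503}


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() until it returns a non-transient status.

    request is any zero-argument callable returning (status, body).
    The wait between attempts doubles each time: base_delay, 2*base_delay, ...
    Returns the last (status, body) seen, even if it is still 502/503.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        if status not in TRANSIENT_STATUSES:
            return status, body
        # Do not sleep after the final attempt; just give up.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The `sleep` parameter is injectable so the behavior can be checked without real waiting. A caller would still need to treat a final 502 or 503 as a failure and reset the package by hand.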
Thanks. Then there seems to be a separate issue with dead processes that result in a stack trace. Stephen, could you create a separate issue for packages that die with a stack trace error? I saw one yesterday that died after the digiprov step, and from our conversation yesterday it sounded like you were seeing multiple packages erroring out with errors other than the unavailability of description service.
From: szanati [mailto:notifications@github.com]
Sent: Wednesday, July 18, 2018 4:20 PM
To: daitss/core <core@noreply.github.com>
Cc: Lydia Motyka <LMotyka@flvc.org>; Comment <comment@noreply.github.com>
Subject: Re: [daitss/core] Description Service Issues (#803)
Package EBM0ZL3MU_QGUTN1 did die in the description step it had the following error:
error while processing 197(sip-files/00007.jpg): bad status http://describe.fda.fcla.edu/describe?location=file:/var/daitss/data/work/EBM0ZL3MU_QGUTN1/files/original/197/data&uri=info%3Afda%2FEBM0ZL3MU_QGUTN1%2Ffile%2F197&originalName=sip-files%2F00007.jpg: 503
Service Temporarily Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Apache Server at describe.fda.fcla.edu Port 80
The two packages at the top of this issue, E1YVHJD3K_3JVRTT and EXT0SNHTM_47ZCLP, both finally archived. I have been searching, and so far I have only found packages that died at the ingest digiprov step. I believe I saw one in the tar step. It's possible that the packages we are thinking of finally archived. I have been looking at resets. Also, when I find one of the ingest digiprov packages, the error message is gone because I reset it. I might have to wait for another dead process before I can put a new issue in. I will continue to hunt for one.
As this does not appear to be an application error, closing for now.
Over the weekend the Description Service went down, and several hundred packages errored with the following message. The only difference between the messages was the (sip-files/) file name, which was different for each package.
error while processing 14(sip-files/0328.alto): bad status http://describe.fda.fcla.edu/describe?location=file:/var/daitss/data/work/E1YVHJD3K_3JVRTT/files/original/14/data&uri=info%3Afda%2FE1YVHJD3K_3JVRTT%2Ffile%2F14&originalName=sip-files%2F0328.alto: 503 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
Service Temporarily Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Apache Server at describe.fda.fcla.edu Port 80
I reset the packages and they are currently ingesting. I also noticed that several of the packages would die in the describe step. I wasn't sure whether the above error, along with the dead packages, had anything to do with the recent upgrade of Jhove.
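Since the error lines quoted in this thread share one shape and differ only in the package, file, and status code, they can be tallied mechanically to separate 502s from 503s across several hundred failures. This is a rough Python sketch that assumes every line follows the format of the messages pasted here; `parse_error` and `count_by_status` are hypothetical helpers, not part of DAITSS.

```python
import re
from collections import Counter

# Pattern inferred from the error messages pasted in this issue (an assumption).
ERROR_LINE = re.compile(
    r"error while processing (?P<file_id>\d+)\((?P<name>[^)]*)\): "
    r"bad status (?P<url>\S+): (?P<status>\d{3})"
)
# The IEID appears as the work directory name inside the describe URL.
IEID_IN_URL = re.compile(r"/work/(?P<ieid>[^/]+)/files/")


def parse_error(line):
    """Return the IEID, file id, original name, and HTTP status, or None."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    ieid = IEID_IN_URL.search(match["url"])
    return {
        "ieid": ieid["ieid"] if ieid else None,
        "file_id": int(match["file_id"]),
        "name": match["name"],
        "status": int(match["status"]),
    }


def count_by_status(lines):
    """Count parsed errors per HTTP status, skipping lines that do not match."""
    parsed = (parse_error(line) for line in lines)
    return Counter(p["status"] for p in parsed if p is not None)
```

Grouping by status would make it easier to tell proxy read failures (502) apart from the service being unavailable (503) when deciding which packages to reset.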