Closed lydiam closed 12 years ago
See https://prb.fcla.edu/rt3/Ticket/Display.html?id=15664 for discussion.
NOTE: a prerequisite for running orphan-remover is to turn the "delete" method on in the target silo(s). Otherwise orphan-remover won't delete.
From Manny's email:
Orphan remover is ready for ops testing on ripple. Here's how to run it:
First, make sure /opt/fda/bin is in your path. You can add this line to your .bashrc file in your home directory:
export PATH=/opt/ruby/bin:/opt/fda/bin:$PATH
To run orphan remover, type: sudo -u daitss orphan-remover. It should run from anywhere, but you might get a permissions error if you run it from a place the daitss user doesn't have read access to, like your home directory.
Usage
The usage printout looks like this:
Usage: orphan-remover [options]
Deletes Orphans from D2 Storage
    --file FILE          Path to file containing list of orphan URLs to delete from D2 storage, either --file or --url required
    --url URL            URL of a single orphan to delete from D2 storage, either --url or --file required
    --username USERNAME  Operations agent username, required
    --password PASSWORD  Operations agent password, required
    --note NOTE          Note (should be in quotes)
    --help
Whatever you type in --note ends up in a new table I added to the storage-master database, called logs, where all deletes are logged.
You can specify --file (path to a file containing a list of URLs) or --url (a single url) but not both.
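The "either --file or --url, but not both" rule can be illustrated with a small sketch. This is not orphan-remover's actual source; it is a Python stand-in that mirrors the flags in the usage text above, and the help strings are abbreviated.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the orphan-remover usage text (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="orphan-remover",
        description="Deletes Orphans from D2 Storage",
    )
    # Exactly one of --file / --url must be given; argparse rejects both or neither.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--file", metavar="FILE", help="file listing orphan URLs")
    target.add_argument("--url", metavar="URL", help="URL of a single orphan")
    parser.add_argument("--username", required=True, metavar="USERNAME")
    parser.add_argument("--password", required=True, metavar="PASSWORD")
    parser.add_argument("--note", default="", metavar="NOTE")
    return parser
```

Passing both --file and --url to this parser exits with a usage error, which matches the behaviour described above.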
On ripple, orphan-remover is not saving the text specified with the --note flag.
My command:
[lydiam@ripple daemons]$ sudo -u daitss orphan-remover --url http://silos.ripple.fcla.edu:70/006/data/EH146IYZR_IEXCMW.001 --username lydia2 --password lydia2 --note "testing orphan-remover"
[sudo] password for lydiam:
2012-01-31 11:21:58 INFO Executing: curl -sv -X DELETE http://silos.ripple.fcla.edu:70/006/data/EH146IYZR_IEXCMW.001 2>&1
2012-01-31 11:21:58 INFO Resource at http://silos.ripple.fcla.edu:70/006/data/EH146IYZR_IEXCMW.001 successfully deleted
The resulting database row:
14 | 2012-01-31 11:21:58 | Delete orphan | lydia2 | URL: http://silos.ripple.fcla.edu:70/006/data/EH146IYZR_IEXCMW.001
Manny is storing the deleted URL in the note field and the note supplied with --note is not stored. I see no notes stored on ripple in the logs table.
I have added a url column to the logs table and now record both the URL and the note. The new code has just been rolled out to ripple. Please test again.
On ripple it appears to be working:
15 | 2012-01-31 14:14:16 | Delete orphan | lydia2 | Will this note be stored in the database? | http://silos.ripple.fcla.edu:70/007/data/EQG0KTVFK_X0LQQN.000
I'm not sure where the following lines in the storage.log are coming from:
Jan 31 14:14:16 ripple SiloPool[22489]: INFO silos.ripple.fcla.edu: Rack: 128.227.228.33 - - [31/Jan/2012 14:14:16] "DELETE /007/data/EQG0KTVFK_X0LQQN.000 " 204 - 0.0554
Jan 31 14:16:35 ripple SiloPool[22489]: WARN silos.ripple.fcla.edu: 404 Not Found - The resource http://silos.ripple.fcla.edu:70/007/data/EQG0KTVFK_X0LQQN.000: does not exist. 128.227.228.184 - - "GET /007/data/EQG0KTVFK_X0LQQN.000: HTTP/1.1"
Jan 31 14:16:35 ripple SiloPool[22489]: WARN silos.ripple.fcla.edu: 404 Not Found - The resource http://silos.ripple.fcla.edu:70/007/data/EQG0KTVFK_X0LQQN.000: does not exist. 128.227.228.184 - - "GET /007/data/EQG0KTVFK_X0LQQN.000: HTTP/1.1"
Jan 31 14:16:35 ripple SiloPool[22489]: INFO silos.ripple.fcla.edu: Rack: 128.227.228.184 - - [31/Jan/2012 14:16:35] "GET /007/data/EQG0KTVFK_X0LQQN.000: " 404 108 0.0559
Jan 31 14:17:48 ripple SiloPool[22489]: INFO silos.ripple.fcla.edu: Rack: 128.227.228.184 - - [31/Jan/2012 14:17:48] "GET /007/data/?search=EQG0KTVFK_X0LQQN&page=1 " 200 903 0.0149
Withdrawal (with delete method turned off) was completed at: withdraw finished Tue Jan 31 2012 02:11:36 PM
It almost seems as if the orphan-remover caused these lines to be added. I'll do a controlled test on another orphan.
I tried to delete a package that is listed in silo /007 but isn't retrieved from the daitss database and got the following error:
[lydiam@ripple web-services]$ sudo -u daitss orphan-remover --url http://silos.ripple.fcla.edu:70/007/data/E20100617_AAAASU.000 --username lydia2 --password lydia2 --note "Will this note be stored in the database as well?"
[sudo] password for lydiam:
/opt/web-services/sites/storage-master/current/tools/orphan-remover:112:in `process_url': undefined local variable or method `name' for #<Object:0x2ab7c62662b0> (NameError)
from /opt/web-services/sites/storage-master/current/tools/orphan-remover:175
daitss_db=> select * from events where package_id='E20100617_AAAASU';
id | name | timestamp | notes | outcome | package_id | agent_id
----+------+-----------+-------+---------+------------+----------
(0 rows)
So maybe this is an "alien". I believe that true orphans always have daitss_db data. This should never be one of our use cases. (There must be cruft in silos left behind when the database was reinitialized at some point.)
The first entry in your log is created by the orphan-remover's call to silo-pool to delete the resource:
Jan 31 14:14:16 ripple SiloPool[22489]: INFO silos.ripple.fcla.edu: Rack: 128.227.228.33 - - [31/Jan/2012 14:14:16] "DELETE /007/data/EQG0KTVFK_X0LQQN.000 " 204 - 0.0554
The entries after that are probably generated by the silo interface.
I believe you're right: those WARN lines were caused by me refreshing the silo display of a package that had been removed. I removed another orphan, ET928BMPD_OUKDG1.000, then searched the silo interface for that package again, and I don't see the WARN in storage.log.
So if we don't want XYMON sending us emails about storage.log WARN messages we should not refresh the detailed display from the silo interface of a package after it's been removed.
I'm satisfied that orphan-remover works correctly on ripple.
Note from my communication with Manny this morning --
Orphan-remover doesn't log to a file; it only writes to standard error. So if operations runs orphan-remover on a batch, it would be best to save the orphan-remover output.
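One way to keep that output is a small wrapper that folds standard error into a saved log. The sketch below is an assumption about how an operator might do this, not part of orphan-remover; the function name run_and_save and the file names in the usage comment are hypothetical.

```python
import subprocess
import sys


def run_and_save(command, log_path):
    """Run command, append its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # still show progress on the terminal
            log.write(line)
    return proc.returncode


# Hypothetical batch invocation; orphans.txt, USER, PASS and the log name are placeholders:
# run_and_save(
#     ["sudo", "-u", "daitss", "orphan-remover", "--file", "orphans.txt",
#      "--username", "USER", "--password", "PASS", "--note", "batch removal"],
#     "orphan-remover.log",
# )
```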
A possible enhancement: when orphan-remover is run against a silo that doesn't have "delete" method turned on it doesn't report lack of success, it simply notes that it's issuing a curl command. It might be best if orphan-remover were to issue a "WARN" or "ERROR" line each time it attempts to remove a package but fails for any reason. For example, when silo /006 doesn't have delete turned on, all I get is:
[lydiam@ripple daitss]$ sudo -u daitss orphan-remover --url http://silos.ripple.fcla.edu:70/006/data/EZG4Z2SG8_WAXAGO.000 --username lydia2 --password lydia2 --note "Will this note be stored in the database as well2?"
[sudo] password for lydiam:
2012-01-31 15:11:23 INFO Executing: curl -sv -X DELETE http://silos.ripple.fcla.edu:70/006/data/EZG4Z2SG8_WAXAGO.000 2>&1
However, the storage.log says:
Jan 31 15:11:23 ripple SiloPool[22479]: ERROR silos.ripple.fcla.edu: 405 Method Not Allowed - DELETEs are not currently allowed on silo 006 - you must enable them to delete EZG4Z2SG8_WAXAGO.000. 128.227.228.33 - - "DELETE /006/data/EZG4Z2SG8_WAXAGO.000 HTTP/1.1"
I can't think of a use case where some orphans would be deleted and others not, except perhaps when some of the listed packages are orphans and some are not, as in this example of a package that isn't an orphan and gives a proper error message:
[lydiam@ripple web-services]$ sudo -u daitss orphan-remover --url http://silos.ripple.fcla.edu:70/008/data/EP5TDN9YF_5Y1RT2.000 --username lydia2 --password lydia2 --note "Will this note be stored in the database as well2?"
[sudo] password for lydiam:
2012-01-31 15:17:11 ERROR Skipping http://silos.ripple.fcla.edu:70/008/data/EP5TDN9YF_5Y1RT2.000: It appears to be in the DAITSS copy table as: http://storage-master.ripple.fcla.edu:70/packages/EP5TDN9YF_5Y1RT2.000
This is the message when the URL isn't found:
[lydiam@ripple web-services]$ sudo -u daitss orphan-remover --url http://silos.ripple.fcla.edu:70/008/data/EP5TDN9YF_5Y1RT2 --username lydia2 --password lydia2 --note "Will this note be stored in the database as well2?"
[sudo] password for lydiam:
2012-01-31 15:19:31 INFO Executing: curl -sv -X DELETE http://silos.ripple.fcla.edu:70/008/data/EP5TDN9YF_5Y1RT2 2>&1
2012-01-31 15:19:31 INFO Resource at http://silos.ripple.fcla.edu:70/008/data/EP5TDN9YF_5Y1RT2 not found, 404 returned when attempting to delete
It might be best if all cases where a package hasn't been deleted included the word ERROR in them, to make it easier to identify all problems in the output log, but we can probably live with the program as is. Collect-fixities would always list all orphans, so if some were missed one day they could be deleted the next.
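Until every failure carries the word ERROR, saved output could be checked along the following lines. This is a sketch that assumes the message wording shown in the transcripts above ("INFO Executing: curl -sv -X DELETE ..." and "... successfully deleted"); find_problems is a hypothetical helper, not part of orphan-remover.

```python
import re

# Patterns copied from the orphan-remover output shown in this ticket.
EXECUTING = re.compile(r"INFO Executing: curl -sv -X DELETE (\S+)")
DELETED = re.compile(r"INFO Resource at (\S+) successfully deleted")


def find_problems(lines):
    """Return (error_lines, unconfirmed_urls) from saved orphan-remover output."""
    errors, attempted, deleted = [], [], set()
    for raw in lines:
        line = raw.strip()
        if " ERROR " in line:
            errors.append(line)
        match = EXECUTING.search(line)
        if match:
            attempted.append(match.group(1))
        match = DELETED.search(line)
        if match:
            deleted.add(match.group(1))
    # A DELETE that was issued but never confirmed is treated as a failure.
    unconfirmed = [url for url in attempted if url not in deleted]
    return errors, unconfirmed
```

A DELETE sent to a silo with deletes turned off shows up in the second list, because the output contains the Executing line but no confirmation.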
The issue was that orphan-remover did not catch the '405 Method Not Allowed' response. I have changed orphan-remover to catch this error.
Also, all 4xx return codes (such as 'not found', 'gone', and 'method not allowed') are now logged as errors.
The enhanced code has been released to ripple; please test again.
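The effect of that change can be sketched as a mapping from the HTTP status of the DELETE request to a log level. This is an illustration in Python rather than the actual orphan-remover code; log_level_for is a hypothetical name, and treating every non-2xx status (not only 4xx) as an error is an assumption.

```python
def log_level_for(status):
    """Map the HTTP status code of a DELETE request to a log level."""
    if 200 <= status < 300:
        # e.g. 204, which storage.log shows for a successful delete
        return "INFO"
    # 4xx codes such as 404 (not found), 405 (method not allowed) and 410 (gone);
    # assumption: any other unexpected status is also reported as an error.
    return "ERROR"
```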
Carol,
It still doesn't seem to catch the 405:
[lydiam@ripple daemons]$ sudo -u daitss orphan-remover --username lydia2 --password lydia2 --url http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4.000 --note "testing new error message"
[sudo] password for lydiam:
2012-02-01 09:26:49 INFO Executing: curl -sv -X DELETE http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4.000 2>&1
From storage.log:
Feb 1 09:26:49 ripple SiloPool[21419]: ERROR silos.ripple.fcla.edu: 405 Method Not Allowed - DELETEs are not currently allowed on silo 006 - you must enable them to delete E9I5FTQ01_XASQA4.000. 128.227.228.33 - - "DELETE /006/data/E9I5FTQ01_XASQA4.000 HTTP/1.1"
Feb 1 09:26:49 ripple SiloPool[21419]: INFO silos.ripple.fcla.edu: Rack: 128.227.228.33 - - [01/Feb/2012 09:26:49] "DELETE /006/data/E9I5FTQ01_XASQA4.000 " 405 126 0.0116
I did see the other error messages:
[lydiam@ripple daemons]$ sudo -u daitss orphan-remover --username lydia2 --password lydia2 --url http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4 --note "testing new error message"
2012-02-01 09:24:04 INFO Executing: curl -sv -X DELETE http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4 2>&1
2012-02-01 09:24:04 ERROR Resource at http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4 not found, 404 returned when attempting to delete
[lydiam@ripple daemons]$ sudo -u daitss orphan-remover --username lydia2 --password lydia2 --url http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4.000/ --note "testing new error message"
[sudo] password for lydiam:
2012-02-01 09:23:52 ERROR Skipping http://silos.ripple.fcla.edu:70/006/data/E9I5FTQ01_XASQA4.000/: It appears to be in the DAITSS copy table as: http://storage-master.ripple.fcla.edu:70/packages/E9I5FTQ01_XASQA4.000
Is there something else that I need to do?
Lydia
Lydia Motyka Manager, Florida Digital Archive (352)392-9020 x328
It turns out there was an extra space character when parsing the 405; my mistake.
I put the fix in on ripple; please test again :)
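The exact parsing code is not shown in this ticket, so the following is only a sketch of whitespace-tolerant status parsing, written in Python for illustration. It relies on the fact that curl -v prefixes response header lines with "<"; status_from_curl_output is a hypothetical helper name.

```python
import re

# In `curl -sv` output the response status line looks like
# "< HTTP/1.1 405 Method Not Allowed"; \s* and \s+ absorb stray spaces.
STATUS_LINE = re.compile(r"^<\s*HTTP/\d+(?:\.\d+)?\s+(\d{3})\b")


def status_from_curl_output(output):
    """Return the last HTTP status code found in `curl -sv` output, or None."""
    status = None
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            status = int(match.group(1))
    return status
```

Comparing the parsed integer, rather than a raw substring of the status line, avoids depending on exact spacing.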
Perfect! I think it's ready to go into production.
Lydia
Production is now able to successfully remove D1 orphans. Issue resolved.
See https://github.com/daitss/store-master/issues/7 for history.