buda-base / drs-deposit

Harvard DRS Deposit base

Pull together script to retry batchbuilds #64

Closed jimk-bdrc closed 6 years ago

jimk-bdrc commented 6 years ago

Sometimes batch builds fail because of a resource error. Rather than rebuilding and recopying, you can just retry after the copies. First, find the builds with non-empty error summaries:

find $PR/20180717 -name errorSummary.txt -maxdepth 3 -not -size 0 -ls > ~/tmp/20180717errors/lst

awk -F' ' '{ print $NF }' ~/tmp/20180717errors/lst | sed 's/-logs.*//'

and you get a list of build paths:

/Volumes/DRS_Staging/DRS/prod/20180716/buildList1.txt.23.33/batchW00EGS1015752-1n
/Volumes/DRS_Staging/DRS/prod/20180716/buildList1.txt.23.33/batchW00KG0624-1n
/Volumes/DRS_Staging/DRS/prod/20180716/buildList2.txt.23.33/batchW1KG14500-1n
/Volumes/DRS_Staging/DRS/prod/20180716/buildList2.txt.23.33/batchW1KG14783-1n

Now look for error summaries that do not contain sequence errors (the most common kind):

while read -r ee ; do grep -i sequence "$ee-logs/errorSummary.txt"; (($? == 1)) && { echo "$ee" >> fixable.lst; } ; done < lst

Now fixable.lst has only the batches we can rebuild in place.

Run fix-one-batch against it.
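
The find-and-filter steps above could be combined into one function. This is only a rough sketch: `find_fixable_batches` is a made-up name, not something in this repo, and it assumes the logs always live at `<batch path>-logs/errorSummary.txt` within three levels of the root, as in the paths listed above.

```shell
#!/usr/bin/env bash
# Sketch only: find_fixable_batches is a hypothetical helper, not part of drs-deposit.
# Assumes each batch has its logs in "<batch path>-logs/errorSummary.txt".

find_fixable_batches() {
    local root="$1"
    local summary batch rc
    # Non-empty errorSummary.txt files, at most three levels below the root.
    find "$root" -maxdepth 3 -name errorSummary.txt ! -size 0 -print |
    while IFS= read -r summary; do
        # Strip the "-logs/errorSummary.txt" suffix to recover the batch path.
        batch="${summary%-logs/errorSummary.txt}"
        rc=0
        grep -qi sequence "$summary" || rc=$?
        # grep exits 1 only when nothing matched; exit 2 (unreadable file) is skipped.
        if (( rc == 1 )); then
            printf '%s\n' "$batch"
        fi
    done
}
```

Something like `find_fixable_batches "$PR/20180717" > fixable.lst` would then produce the same list as the two steps above, without the intermediate lst file.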

jimk-bdrc commented 6 years ago

This won't work. The problem is that you can't restart an arbitrary batch, because project.conf is overwritten between batches. Log it as a fail and try again.