CDLUC3 / ezid

521 refactor batch download use s3 #566

Closed jsjiang closed 4 months ago

jsjiang commented 4 months ago

@rushirajnenuji @sfisher Hi Scott and Rushiraj, here are the major changes for refactoring batch download to use an S3 bucket instead of local disk:

To test in the UI (with VPN or on the UCOP network):

  1. Log on to http://ezid-dev.cdlib.org/ as 'apitest' or with your own account.
  2. Go to the MANAGE IDs tab.
  3. Click the DOWNLOAD ALL button. EZID displays the file download notification on the screen.
  4. Start the proc-download job on the ezid-dev server if it is not already running.
  5. Download the report file using the provided URL. Note: replace https with http when testing on the ezid-dev server, for example:

https://ezid-dev.cdlib.org/s3_download/QEr6L4MN3Mvv5eY1.zip => http://ezid-dev.cdlib.org/s3_download/QEr6L4MN3Mvv5eY1.zip
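
The scheme swap in step 5 can also be done in the shell rather than by hand. This is a minimal sketch, not part of EZID or ezid-client-tools; `to_http` is a hypothetical helper name. It rewrites only a leading `https://`, so any later occurrence of "https" in the URL is left alone.

```shell
# Hypothetical helper (not part of EZID): turn a leading "https://" into
# "http://" so a download link works against ezid-dev. Other URLs pass
# through unchanged.
to_http() {
  case "$1" in
    https://*) printf 'http://%s\n' "${1#https://}" ;;
    *)         printf '%s\n' "$1" ;;
  esac
}

# Example with the link from the notification above.
to_http "https://ezid-dev.cdlib.org/s3_download/QEr6L4MN3Mvv5eY1.zip"
```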

File download notification:

Your download request is being processed and will be available in a few minutes.

When it is ready to download, an email with the download link will be sent to the email address affiliated with your EZID account: ezid@ucop.edu.

After you receive the email, you may also download a .csv file of the requested identifiers using the link below:

https://ezid-dev.cdlib.org/s3_download/QEr6L4MN3Mvv5eY1.zip

The download link will expire in 1 week.

To test using the client batch-download tool (with VPN or on the UCOP network):

  1. Download ezid-client-tools from GitHub.
  2. Modify the batch-download.sh script:
     a. Replace ezid-prd with ezid-dev: on line 6, change url="https://ezid.cdlib.org/download_request" to url="http://ezid-dev.cdlib.org/download_request".
     b. Replace "https" with "http" in the $url parameter before the "curl -f -O -s $url" command is called, by adding the two lines marked "add this line":

url=${url/https/http}   # add this line
echo "$url"             # add this line

status=22
while [ $status -eq 22 ]; do
  echo -n "."
  sleep 5
  curl -f -O -s $url
  status=$?
done
  3. Run the batch download script with the appropriate parameters, for example:
    ./batch-download-dev.sh apitest apitest format=csv column=_id column=_mapperTitle notify=your-email
  4. Start the proc-download job on the ezid-dev server if it is not already running.
  5. Download the report file using the provided URL, for example:

http://ezid-dev.cdlib.org/s3_download/iQyba5CT17K6EkLk.csv.gz
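
The polling loop in the script retries for as long as curl exits with status 22 (an HTTP error from `-f`, such as the file not existing yet), so it never ends if the proc-download job is not running. For testing, a bounded variant may be handy. The sketch below is an assumption-laden illustration, not code from ezid-client-tools: `poll_download` and its `max_tries` and `delay` parameters are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: poll_download, max_tries and delay are hypothetical names,
# not part of ezid-client-tools.
# Retries while curl exits with 22 (HTTP error, e.g. file not ready yet)
# and gives up after max_tries attempts instead of looping forever.
poll_download() {
  local url=$1
  local max_tries=${2:-60}   # default: 60 attempts
  local delay=${3:-5}        # default: 5 seconds between attempts
  local tries=0
  local status
  while [ "$tries" -lt "$max_tries" ]; do
    status=0
    curl -f -O -s "$url" || status=$?
    if [ "$status" -ne 22 ]; then
      # 0 means the file was saved; any other code is a non-HTTP failure.
      return "$status"
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  echo "gave up after $max_tries attempts: $url" >&2
  return 22
}
```

For example, `poll_download "http://ezid-dev.cdlib.org/s3_download/iQyba5CT17K6EkLk.csv.gz" && gzip -t iQyba5CT17K6EkLk.csv.gz` would wait for the report and then check that the downloaded archive is intact.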

Please let me know if you have questions.

Thank you

Jing