StanfordBioinformatics / pulsar_lims

A LIMS for ENCODE submitting labs.

Pulsar Data Management: transfer data from Nathan's bucket to SU #917

Closed twang15 closed 2 years ago

twang15 commented 2 years ago

We have three buckets already created on AWS:

            [pulsar-encode-assets-main](https://s3.console.aws.amazon.com/s3/buckets/pulsar-encode-assets-main?region=us-west-1)
            [pulsar-encode-assets-staging](https://s3.console.aws.amazon.com/s3/buckets/pulsar-encode-assets-staging?region=us-west-1)
            [pulsar-encode-assets-test](https://s3.console.aws.amazon.com/s3/buckets/pulsar-encode-assets-test?region=us-west-1)

(Nathan and I gave this a try before)

Tao is a member of the AWS account that controls these buckets: gbsc-aws-prj-encode. He can go to:

            http://aws-console-idg.stanford.edu/

and get to the console for that account.

twang15 commented 2 years ago

Good to hear from y’all! Tao - I’d use the buckets that Keith pointed out. The key here is to connect Pulsar to them with the IAM user. There are instructions on the Pulsar wiki so give that a try and see how far you can get, then I can join in on the fun too over Zoom if you get stuck : >)

twang15 commented 2 years ago

login via: http://aws-console-idg.stanford.edu/

and then go to the link for all buckets: https://s3.console.aws.amazon.com/s3/home?region=us-east-1

twang15 commented 2 years ago

Steps:

  1. Find out where all the data were previously stored
  2. Copy them over to our new buckets
  3. Set up the Pulsar server to access the new buckets
twang15 commented 2 years ago

In principle, it should not affect Elasticsearch; I cannot think of a way it would.

The next step should be to copy the data in your bucket over to our new bucket. Then the image paths should be patched to reflect this change.

twang15 commented 2 years ago

from pulsar-encode-assets to pulsar-encode-assets-main
from staging-pulsar-encode-assets to pulsar-encode-assets-staging
from test-pulsar-encode-assets to pulsar-encode-assets-test
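The renames can be sketched as a lookup table plus a small rewrite helper. This is only an illustration: `BUCKET_RENAMES` and `rename_bucket` are hypothetical names, and the `amazonaws.com` URL in the usage line is an assumed example of a stored path. The prefixed names are checked first because "pulsar-encode-assets" is a substring of the staging and test names.

```ruby
# Old-to-new bucket names, as listed in the comment above.
# Prefixed names come first because "pulsar-encode-assets" is a
# substring of the staging and test bucket names.
BUCKET_RENAMES = {
  "staging-pulsar-encode-assets" => "pulsar-encode-assets-staging",
  "test-pulsar-encode-assets"    => "pulsar-encode-assets-test",
  "pulsar-encode-assets"         => "pulsar-encode-assets-main"
}.freeze

# Hypothetical helper: rewrite the bucket host in a stored URL.
# URLs that reference none of the old buckets are returned unchanged.
def rename_bucket(url)
  BUCKET_RENAMES.each do |old_name, new_name|
    old_host = "#{old_name}.s3"
    return url.sub(old_host, "#{new_name}.s3") if url.include?(old_host)
  end
  url
end

# Assumed example URL, not taken from the database.
puts rename_bucket("https://staging-pulsar-encode-assets.s3.amazonaws.com/a.png")
```

Because the new names end in "-main.s3", "-staging.s3" and "-test.s3", an already rewritten URL no longer matches any old host, so running the helper twice should be harmless.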

twang15 commented 2 years ago

heroku config

Replace AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with the new keys sent by Keith.
Replace the S3_BUCKET value with the new bucket name: pulsar-encode-assets-main

config/initializers/aws.rb

replace

region: 'us-east-1',

with

region: 'us-west-2',

Then run git push prod-heroku master to deploy a new release.
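As a rough sketch of how an initializer could gather these values in one place, the helper below reads them from the environment. It is not the actual content of config/initializers/aws.rb: `s3_settings` is a hypothetical name, `AWS_REGION` is an assumed extra variable, and the region default simply mirrors the value above.

```ruby
# Hypothetical helper: collect S3 settings from the environment.
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and S3_BUCKET are the Heroku
# config vars named above; AWS_REGION is an assumed extra variable.
def s3_settings(env = ENV)
  {
    access_key_id:     env.fetch("AWS_ACCESS_KEY_ID"),
    secret_access_key: env.fetch("AWS_SECRET_ACCESS_KEY"),
    bucket:            env.fetch("S3_BUCKET"),
    # Falls back to the region set in the initializer above.
    region:            env.fetch("AWS_REGION", "us-west-2")
  }
end
```

Using `fetch` without a default makes a missing key or bucket name fail at boot with a KeyError rather than later, on the first upload.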

create bucket on AWS

pulsar-encode-assets-main
pulsar-encode-assets-staging
pulsar-encode-assets-test

Note: when creating each bucket, ACLs need to be enabled and "Block public access" should be turned off.

open a new tab to access pulsar

twang15 commented 2 years ago

Hi Nathan,

There is an error message when I tried to transfer the data from your bucket to mine:

[taowang9@smsh11dsu-srcf-d15-36 ~]$ aws s3 sync s3://staging-pulsar-encode-assets s3://pulsar-encode-assets-staging
fatal error: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

What is the public access permission on your side?

Best, Tao

twang15 commented 2 years ago

Here is a page from Amazon that we may try

https://aws.amazon.com/premiumsupport/knowledge-center/copy-s3-objects-account/
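For reference, the cross-account approach generally relies on the owner of the source bucket attaching a policy that lets the other account list and read objects. The sketch below only builds such a policy as JSON. `cross_account_read_policy` is a hypothetical helper and the principal ARN is a placeholder, not a real account or user.

```ruby
require "json"

# Hypothetical helper: a read-only policy for the source bucket that lets
# a principal from another AWS account list and fetch its objects.
def cross_account_read_policy(bucket, principal_arn)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        # Listing applies to the bucket itself.
        "Effect" => "Allow",
        "Principal" => { "AWS" => principal_arn },
        "Action" => ["s3:ListBucket"],
        "Resource" => "arn:aws:s3:::#{bucket}"
      },
      {
        # Reading applies to the objects inside the bucket.
        "Effect" => "Allow",
        "Principal" => { "AWS" => principal_arn },
        "Action" => ["s3:GetObject"],
        "Resource" => "arn:aws:s3:::#{bucket}/*"
      }
    ]
  }
end

# Placeholder principal ARN for illustration only.
puts JSON.pretty_generate(
  cross_account_read_policy("staging-pulsar-encode-assets",
                            "arn:aws:iam::111122223333:user/example-user")
)
```

The `s3:ListBucket` statement is the one relevant to the AccessDenied on ListObjects reported above.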

twang15 commented 2 years ago

Hi Keith and Nathan,

I found a solution: using local storage as a stepping stone for the file transfers between our old and new buckets.

Now it is time to fix the links in the database. After that, the old buckets should be ready to be reclaimed.

Best, Tao

twang15 commented 2 years ago

aws s3 sync s3://staging-pulsar-encode-assets /home/taowang9/pulsarpy/pulsarpy/scripts/staging
aws s3 sync s3://pulsar-encode-assets main
aws s3 sync s3://test-pulsar-encode-assets ./test

aws s3 sync /home/taowang9/pulsarpy/pulsarpy/scripts/staging s3://pulsar-encode-assets-staging
aws s3 sync /home/taowang9/pulsarpy/pulsarpy/scripts/main s3://pulsar-encode-assets-main
aws s3 sync /home/taowang9/pulsarpy/pulsarpy/scripts/test s3://pulsar-encode-assets-test
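The download-then-upload pattern above could be generated per bucket rather than typed by hand. The following is a minimal sketch: `relay_sync_commands` is a hypothetical helper, and it only builds and prints the `aws s3 sync` commands instead of running them.

```ruby
require "shellwords"

# Hypothetical helper: build the two `aws s3 sync` commands that move one
# bucket's contents through a local directory (download, then upload).
def relay_sync_commands(old_bucket, new_bucket, local_dir)
  [
    ["aws", "s3", "sync", "s3://#{old_bucket}", local_dir],
    ["aws", "s3", "sync", local_dir, "s3://#{new_bucket}"]
  ]
end

# Print the commands; pass each array to system(*cmd) to actually run them.
relay_sync_commands("staging-pulsar-encode-assets",
                    "pulsar-encode-assets-staging",
                    "./staging").each do |cmd|
  puts cmd.shelljoin
end
```

Each half presumably has to run with credentials that can read the old bucket or write the new one, respectively.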

twang15 commented 2 years ago

Hi Nathan,

This is the latest error message regarding Elasticsearch. Do we need any specific reconfiguration?

2022-03-15T23:30:27.179855+00:00 app[web.1]: Completed 500 Internal Server Error in 51ms (ActiveRecord: 5.3ms)
2022-03-15T23:30:27.180315+00:00 app[web.1]:
2022-03-15T23:30:27.180328+00:00 app[web.1]: Elasticsearch::Transport::Transport::Errors::NotFound ([404] {"error":{"root_cause":[{"type":"document_missing_exception","reason":"[gel_image][423]: document missing","index_uuid":"9S71jMvaSJqKIpwArsgDhg","shard":"0","index":"gel_images"}],"type":"document_missing_exception","reason":"[gel_image][423]: document missing","index_uuid":"9S71jMvaSJqKIpwArsgDhg","shard":"0","index":"gel_images"},"status":404}):
2022-03-15T23:30:27.180329+00:00 app[web.1]: app/controllers/api/gel_images_controller.rb:41:in `update'
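The response body appears to say that the document for gel image 423 is missing from the gel_images index, so the update has nothing to modify. A small sketch for pulling the index and id out of such a body, which might help list records that need re-indexing, is shown below. `missing_document` is a hypothetical helper, not part of Pulsar or the Elasticsearch client.

```ruby
require "json"

# Hypothetical helper: extract the index name and document id from an
# Elasticsearch "document_missing_exception" response body.
# Returns nil for any other error type.
def missing_document(body)
  error = JSON.parse(body).fetch("error")
  return nil unless error["type"] == "document_missing_exception"

  # The reason looks like "[gel_image][423]: document missing".
  id = error["reason"][/\]\[(\d+)\]: document missing/, 1]
  { index: error["index"], id: id && id.to_i }
end

# Body trimmed from the log line above.
body = '{"error":{"type":"document_missing_exception",' \
       '"reason":"[gel_image][423]: document missing",' \
       '"index":"gel_images"},"status":404}'
p missing_document(body)
```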

twang15 commented 2 years ago

grant public access:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html

twang15 commented 2 years ago

tao1@Taos-MacBook-Pro initializers % heroku apps --all
=== taowang9@stanford.edu Apps
tao-ror1

=== Collaborated Apps
pulsar-encode          encode-project@herokumanager.com
staging-pulsar-encode  encode-project@herokumanager.com

twang15 commented 2 years ago
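
# Point staging gel image URLs at the new staging bucket.
gls = GelImage.all()
for gl in gls do
  if not gl.image.nil? and gl.image.include? "staging-pulsar-encode-assets.s3"
    # String#[]= replaces the first occurrence of the old host in place.
    gl.image["staging-pulsar-encode-assets.s3"] = "pulsar-encode-assets-staging.s3"
    gl.save
  end
end
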
twang15 commented 2 years ago

Run a Ruby script on the Heroku Rails console

https://spin.atomicobject.com/2017/01/18/rails-script-heroku/

cat bucket-rename-main.rb | heroku run rails console --app=staging-pulsar-encode --no-tty

twang15 commented 2 years ago

Silencing an exception in Ruby

https://til.codes/which-is-the-shortest-way-to-silently-ignore-a-ruby-exception/

gls = GelImage.all()

def ignore_exception
   begin
     yield
   rescue Exception
   end
end

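# ignore_exception above is a hand-rolled equivalent of suppress.
# Suppress per record so that one failed save does not end the loop.
# Note: "pulsar-encode-assets.s3" also matches staging- and test- URLs.
for gl in gls do
 suppress(Exception) do
  if not gl.image.nil? and gl.image.include? "pulsar-encode-assets.s3"
   gl.image["pulsar-encode-assets.s3"] = "pulsar-encode-assets-main.s3"
   gl.save
   #File.write("gel_ids.txt", gl.id.to_s()+"\n", mode: "a")
   #puts gl.id
  end
 end
end
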
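When errors are suppressed around an entire loop, the first record that raises ends the loop and later records are never touched. Rescuing inside the loop lets the remaining records proceed and makes it possible to report which ones failed. The sketch below uses a stand-in Struct rather than the real GelImage model. `FakeImage` and `patch_images` are hypothetical names, and the URLs in any usage are placeholders.

```ruby
# Stand-in for an ActiveRecord model; save raises for flagged records to
# mimic a failing callback. Hypothetical scaffolding, not Pulsar code.
FakeImage = Struct.new(:id, :image, :fail_on_save) do
  def save
    raise "indexing failed for #{id}" if fail_on_save
    true
  end
end

# Update each matching record, rescuing per record so that later records
# still run. Returns the ids whose save raised.
def patch_images(records, old_host, new_host)
  failed = []
  records.each do |rec|
    next if rec.image.nil? || !rec.image.include?(old_host)
    begin
      rec.image = rec.image.sub(old_host, new_host)
      rec.save
    rescue StandardError
      failed << rec.id
    end
  end
  failed
end
```

Returning the failed ids instead of discarding them gives a list to revisit once the underlying error is fixed.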
twang15 commented 2 years ago

submission sheet

srs = SequencingRequest.all()

def ignore_exception
   begin
     yield
   rescue Exception
   end
end

# Suppress per record so that one failed save does not end the loop.
# Check the staging host first: it contains the main host as a substring.
for sr in srs do
 suppress(Exception) do
  if not sr.submission_sheet.nil?
   if sr.submission_sheet.include? "staging-pulsar-encode-assets.s3"
     sr.submission_sheet["staging-pulsar-encode-assets.s3"] = "pulsar-encode-assets-staging.s3"
   elsif sr.submission_sheet.include? "pulsar-encode-assets.s3"
     sr.submission_sheet["pulsar-encode-assets.s3"] = "pulsar-encode-assets-main.s3"
   end

   sr.save
   #File.write("sr_ids.txt", sr.id.to_s()+"\n", mode: "a")
   #puts sr.id
  end
 end
end

cat bucket-rename-staging-submission-sheet.rb | heroku run rails console --app=staging-pulsar-encode --no-tty

twang15 commented 2 years ago

sample sheet

srs = SequencingRequest.all()

def ignore_exception
   begin
     yield
   rescue Exception
   end
end

# Suppress per record so that one failed save does not end the loop.
# Check the staging host first: it contains the main host as a substring.
for sr in srs do
 suppress(Exception) do
  if not sr.sample_sheet.nil?
   if sr.sample_sheet.include? "staging-pulsar-encode-assets.s3"
     sr.sample_sheet["staging-pulsar-encode-assets.s3"] = "pulsar-encode-assets-staging.s3"
   elsif sr.sample_sheet.include? "pulsar-encode-assets.s3"
     sr.sample_sheet["pulsar-encode-assets.s3"] = "pulsar-encode-assets-main.s3"
   end

   sr.save
   #File.write("sr_ids.txt", sr.id.to_s()+"\n", mode: "a")
   #puts sr.id
  end
 end
end

cat bucket-rename-staging-sample-sheet.rb | heroku run rails console --app=staging-pulsar-encode --no-tty

twang15 commented 2 years ago

Gel Image

gls = GelImage.all()

def ignore_exception
   begin
     yield
   rescue Exception
   end
end

# Suppress per record so that one failed save does not end the loop.
# Check the staging and test hosts first: both contain the main host
# as a substring.
for gl in gls do
 suppress(Exception) do
  if not gl.image.nil?
   if gl.image.include? "staging-pulsar-encode-assets.s3"
    gl.image["staging-pulsar-encode-assets.s3"] = "pulsar-encode-assets-staging.s3"
   elsif gl.image.include? "test-pulsar-encode-assets.s3"
    gl.image["test-pulsar-encode-assets.s3"] = "pulsar-encode-assets-test.s3"
   elsif gl.image.include? "pulsar-encode-assets.s3"
    gl.image["pulsar-encode-assets.s3"] = "pulsar-encode-assets-main.s3"
   end

   gl.save
   #File.write("gel_ids.txt", gl.id.to_s()+"\n", mode: "a")
   #puts gl.id
  end
 end
end

cat bucket-rename-gel-images.rb | heroku run rails console --app=staging-pulsar-encode --no-tty

twang15 commented 2 years ago

Update production server

cd /Users/tao1/Documents/2020Spring/Job/RailApps/encode/staging-upgrade/staging-pulsar-encode

# copy batch update code to production site
cp bucket-rename-gel-images.rb /Users/tao1/Documents/2020Spring/Job/RailApps/encode/pulsar_lims
cp  bucket-rename-staging-sample-sheet.rb /Users/tao1/Documents/2020Spring/Job/RailApps/encode/pulsar_lims
cp bucket-rename-staging-submission-sheet.rb /Users/tao1/Documents/2020Spring/Job/RailApps/encode/pulsar_lims

# update
cat bucket-rename-staging-submission-sheet.rb | heroku run rails console --app=pulsar-encode --no-tty
cat bucket-rename-staging-sample-sheet.rb | heroku run rails console --app=pulsar-encode --no-tty
cat bucket-rename-gel-images.rb | heroku run rails console --app=pulsar-encode --no-tty
twang15 commented 2 years ago

The production and testing sites work properly.
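One way to double-check a migration like this is to look for stored URLs that still reference an old bucket host. The sketch below works on a plain array of strings. `OLD_HOSTS` and `stale_urls` are hypothetical names, and feeding it something like `GelImage.pluck(:image)` assumes the image attribute is a plain string column.

```ruby
# Old bucket hosts that should no longer appear in stored URLs.
OLD_HOSTS = %w[
  staging-pulsar-encode-assets.s3
  test-pulsar-encode-assets.s3
  pulsar-encode-assets.s3
].freeze

# Hypothetical helper: return the URLs that still point at an old bucket.
# nil entries (records without a file) are ignored.
def stale_urls(urls)
  urls.compact.select { |url| OLD_HOSTS.any? { |host| url.include?(host) } }
end
```

An empty result for each patched attribute (image, sample_sheet, submission_sheet) would suggest that the old buckets are no longer referenced.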