andersen-lab / bjorn_utils

Key utils for sequence processing moved to a new home from bjorn
GNU General Public License v3.0

Automate upload to GISAID, Github, Google Drive, SD County #5

Closed kramesh1 closed 1 year ago

kramesh1 commented 3 years ago

We need to automate the upload of files to each of these destinations:

GISAID

I will review each of these pieces with Mark and Karthik G. before creating a branch for them.

kramesh1 commented 3 years ago

Edit: I am separating the NCBI upload pipeline into its own issue, since that infrastructure has not been built at all, whereas the destinations listed here all have existing pipelines that only need streamlining.

kramesh1 commented 3 years ago

Specific todo list here of upload features

kramesh1 commented 2 years ago

Laying out the structure here to be built in a branch:

From any location, a user runs a bjorn_utils "upload" command. This command performs the following steps:

  1. Copies the sequences, BAM files, and metadata to /alfheim/hcov-19-genomics/
  2. Uploads the data to GISAID from alfheim using gisaid_uploader, located in /code, with the output files also saved in hcov-19-genomics. For example:
     ./gisaid_uploader CoV upload --fasta /asgard/2021-10-25_release/msa/consensus_sequences/2021-10-25_release_unaligned_combined.fa --csv /asgard/2021-10-25_release/gisaid_metadata.csv --failedout /asgard/2021-10-25_release/gisaid_failed_metadata.csv
  3. Uploads the BAM files and sequences to Google Drive via gsutil
  4. Transfers the sequences and relevant metadata to the HCoV-19-Genomics repo, updates the README, and pushes them
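
A rough sketch of how the steps above might be wired together, not the actual bjorn_utils implementation. It only builds the command lines and prints them in dry-run mode. The function names, the `bucket` and `repo_dir` parameters, the use of `rsync`, and the assumption that the staged copy on alfheim mirrors the /asgard release layout are all hypothetical; only the `gisaid_uploader` flags and the /alfheim and /code paths come from this thread. The README update is left out because its format is not described here.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

ALFHEIM_ROOT = Path("/alfheim/hcov-19-genomics")  # destination named in this issue
CODE_DIR = Path("/code")                          # location of gisaid_uploader per this issue


def build_upload_plan(release_dir: str, bucket: str, repo_dir: str) -> List[List[str]]:
    """Return the commands for one release as argv lists, in execution order.

    `bucket` and `repo_dir` are hypothetical parameters for illustration.
    """
    release = Path(release_dir)
    name = release.name                 # e.g. "2021-10-25_release"
    staged = ALFHEIM_ROOT / name        # assumed staging location on alfheim

    # Assumption: the staged copy keeps the same layout as the example release.
    fasta = staged / "msa" / "consensus_sequences" / f"{name}_unaligned_combined.fa"
    metadata = staged / "gisaid_metadata.csv"
    failed = staged / "gisaid_failed_metadata.csv"

    return [
        # 1. Copy sequences, BAM files, and metadata to alfheim.
        ["rsync", "-a", f"{release}/", f"{staged}/"],
        # 2. Upload to GISAID (flags taken from the example command in this issue).
        [str(CODE_DIR / "gisaid_uploader"), "CoV", "upload",
         "--fasta", str(fasta), "--csv", str(metadata), "--failedout", str(failed)],
        # 3. Upload the staged files with gsutil (bucket name is a placeholder).
        ["gsutil", "-m", "cp", "-r", str(staged), f"gs://{bucket}/"],
        # 4. Commit and push to the HCoV-19-Genomics repo; this assumes the
        #    sequences and metadata were already copied into repo_dir.
        ["git", "-C", repo_dir, "add", "-A"],
        ["git", "-C", repo_dir, "commit", "-m", f"Add {name}"],
        ["git", "-C", repo_dir, "push"],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_plan(build_upload_plan("/asgard/2021-10-25_release",
                               "example-bucket", "/path/to/HCoV-19-Genomics"))
```

Keeping the plan as data makes it easy to review the exact commands with Mark and Karthik G. before anything is uploaded, and `check=True` means a failed GISAID upload would stop the run before the later steps.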