PhyloGrok / VCFgenerator

Automated variant calling app for NextGen evolutionary genomics
GNU General Public License v3.0

Get SRA metadata along with the reference and SRA run downloads #19

Status: Open. PhyloGrok opened this issue 1 year ago.

PhyloGrok commented 1 year ago

Currently, SraRunTable.txt comes from the "Metadata" tab of the NCBI SRA Run Table web GUI. We'll need a way to download this table automatically.

PhyloGrok commented 1 year ago

Actually, esearch can download an SraRunTable.txt equivalent: pipe its results into efetch and request the "runinfo" format.
https://www.biostars.org/p/9545261/

esearch -db sra -query {BioprojectID} | efetch -format runinfo > RunInfo.txt

Next, we need to modify step1 of the workflow so that it downloads this file concurrently with the SRA data. RunInfo.txt will then serve as the metadata source for the BioProject and its associated BioSample information when that information is loaded into the SQLite database.
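A minimal sketch of how step1 might fetch RunInfo.txt concurrently with the SRA downloads, assuming the workflow is a bash script and uses sra-tools' `prefetch` for the downloads. The function name `step1`, its argument order, and the run-accession list are hypothetical stand-ins, not the workflow's actual code. The metadata pipeline runs as a background job, and `wait` surfaces its exit status so a failed fetch is not silently ignored.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical step1 sketch: fetch run metadata (RunInfo.txt) in the
# background while the SRA runs download, then wait for the fetch to finish.
# Usage: step1 BIOPROJECT_ID OUTDIR RUN_ACCESSION...
step1() {
  local bioproject="$1" outdir="$2"
  shift 2
  mkdir -p "$outdir"

  # Same esearch | efetch pipeline as above, started as a background job.
  esearch -db sra -query "$bioproject" \
    | efetch -format runinfo > "$outdir/RunInfo.txt" &
  local meta_pid=$!

  # Stand-in for the existing SRA download logic (assumes sra-tools' prefetch).
  local run
  for run in "$@"; do
    prefetch "$run" -O "$outdir"
  done

  # wait returns the background pipeline's exit status; with set -e a
  # failed metadata fetch stops the step here.
  wait "$meta_pid"
}
```

For example, `step1 PRJNA123456 data SRR0000001 SRR0000002` (placeholder accessions) would write `data/RunInfo.txt` while prefetching both runs into `data`. Running the fetch in the background keeps the small metadata request from adding to the download time.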

PhyloGrok commented 1 year ago

We solved this step as part of Nhi's annotation workflow: the metadata from RunInfo.txt is sent to the SQLite database.

nluu1 commented 1 year ago

Issue re-opened: 07/28/2023

In reply to: "We solved this set as part of Nhi's Annotation workflow, the metadata from RunInfo.txt gets sent to the SQLite database."

I have an R script that imports the information in RunInfo.txt into the SQLite database, but I have not incorporated this code snippet:

esearch -db sra -query {BioprojectID} | efetch -format runinfo > RunInfo.txt

into the workflow yet. I believe this code snippet can be incorporated into Lloyd's current workflow while the reference genome files are being downloaded. Could you or @LloydJonesIII add it to the current reference retrieval workflow? Here {BioprojectID} is the placeholder for the BioProject ID.
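One way the snippet could slot into the reference retrieval step, sketched under the assumption that the workflow is a bash script: the `{BioprojectID}` placeholder becomes a function argument and the output path is passed in. `fetch_runinfo` is a hypothetical name, and the check for at least one run row is an added safeguard, not something described in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper for the reference retrieval step: the {BioprojectID}
# placeholder becomes the first argument, the output path the second.
# Usage: fetch_runinfo BIOPROJECT_ID path/to/RunInfo.txt
fetch_runinfo() {
  local bioproject_id="${1:?missing BioProject ID}"
  local out="${2:?missing output path}"

  mkdir -p "$(dirname "$out")"
  esearch -db sra -query "$bioproject_id" | efetch -format runinfo > "$out"

  # Require a header plus at least one run row (counting non-blank lines),
  # so an empty result fails here instead of reaching the SQLite import.
  if [ "$(grep -c . "$out")" -lt 2 ]; then
    echo "No SRA runs found for $bioproject_id" >&2
    return 1
  fi
}
```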

LloydJonesIII commented 1 year ago

08-01-23

The code has been added to the workflow. Within the workflow, the output file is written to:
../../media/volume/sdb/$1/assembly/reference/RunInfo.txt
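A small sketch for locating and sanity-checking that output file before the SQLite import. `runinfo_path` rebuilds the path from the workflow's `$1` (the thread does not say what that argument names, so it is passed through unchanged), and `has_metadata_columns` assumes the comma-separated runinfo header contains `BioProject` and `BioSample` columns. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rebuild the RunInfo.txt path used by the workflow.
# $1 is whatever the workflow script receives as its first argument.
runinfo_path() {
  printf '../../media/volume/sdb/%s/assembly/reference/RunInfo.txt\n' \
    "${1:?missing workflow argument}"
}

# Hypothetical check: succeed only if the header line names both the
# BioProject and BioSample columns that feed the SQLite database.
has_metadata_columns() {
  local header
  header="$(head -n 1 "$1")"
  case ",$header," in
    *,BioProject,*) ;;
    *) return 1 ;;
  esac
  case ",$header," in
    *,BioSample,*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_metadata_columns "$(runinfo_path "$1")"` inside the workflow script would confirm the expected columns before the import step runs. Matching whole comma-delimited fields avoids false hits on column names that merely contain "BioSample" as a substring.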

nluu1 commented 1 year ago

Thanks, I saw the file. I'll incorporate the script that loads RunInfo.txt into the SQLite database shortly.