FDA-ARGOS / data.argosdb

Mazumder, Crandall, Pond - Planned data release for Feb 22nd (V1.44-1.45) #211

Closed steph-sing closed 1 year ago

steph-sing commented 1 year ago

Push is rescheduled for Wednesday Feb 28th due to changes in datasets.

steph-sing commented 1 year ago

Primary and Secondary Data Release Notes Draft:

Input by: Steph
Description: 6 tables were combined into 3 larger datasets (BCOs and recipes updated). 1 new fasta record with 2 fasta inside (new BCO and recipe). 5 datasets updated.
Location: Server: Dev (Output: /reviewed folder in Dev)

New Tables (Primary Release):
ARGOS_000025 - ngsQC_Crandall.tsv absorbed into ARGOS_000019 - ngsQC_HL.tsv. New BCO and File name = ARGOS_000019 - ngsQC_HIVE.tsv
assemblyQC-Crandall.tsv absorbed into ARGOS_000012 - assemblyQC_HL.tsv. New BCO and File name = ARGOS_000012 - assemblyQC_HIVE.tsv
ARGOS_000028 - biosampleMeta-Crandall.tsv absorbed into ARGOS_000020 - biosampleMeta_HL.tsv. New BCO and File name = ARGOS_000020 - biosampleMeta_HIVE.tsv
ARG_000010 - reference-guided_genome_assemblies_HIVE-Hexagon.fasta

Updated Tables (Primary Data Release): ARGOS_000011 - siteQC_HIVE.tsv

Updated Tables (Secondary Data Release):
ARGOS_000038 - assemblyQC_NCBI
ARGOS_000009 - ngsQC_NCBI
ARGOS_000018 - ngs_id_list.tsv
ARGOS_000010 - biosampleMeta_ncbi.tsv (previously SRA_biosample.tsv; table was updated but BCO still needs to be updated)

Remaining Tables in /reviewed:
ARG_000001 - NC_045512_SARS-CoV-2_Wuhan.fasta
ARGOS_000014 - sars-cov-2_lineage_mutations.tsv
ARGOS_000015 - property_definition.tsv
ARGOS_000016 - core_property_list.tsv
ARGOS_000017 - non-core_property_list.tsv

Cards to Archive in the DB (Will Request Space from Robel - still currently in /review - no action):
ARGOS_000013 - HIV1_UP000002241_proteome_metadata.csv
ARGOS_000029 - UP000180448_33727_DNA.fasta
ARGOS_000030 - UP000180448_33727.fasta
ARGOS_000031 - uniprot-proteome_UP000180448.csv
ARGOS_000032 - UP000001014_99287_DNA.fasta
ARGOS_000033 - UP000001014_99287.fasta
ARGOS_000034 - uniprot-proteome_UP000001014.csv
ARGOS_000001 - UP000009255_211044_DNA.fasta
ARGOS_000002 - UP000009255_211044.fasta
ARGOS_000003 - UP000009255_proteome_genome_metadata.csv
ARGOS_000004 - UP000464024_2697049_DNA.fasta
ARGOS_000005 - UP000464024_2697049.fasta
ARGOS_000006 - uniprot-proteome_UP000464024.csv
ARGOS_000008 - PRJNA231221_AssemblyUpdated.tsv

All BCOs and recipes passed sanity checks.
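
For readers unfamiliar with what "absorbed into" means for the tables above, the following is a minimal sketch of one way to append the rows of one TSV to another while keeping the union of their columns. It is an illustration only, not the procedure used for this release: the function name `absorb_table` and the sample columns are hypothetical, and it assumes well-formed TSV text with a header row.

```python
import csv
import io


def absorb_table(base_tsv, extra_tsv):
    """Append the rows of extra_tsv to base_tsv (hypothetical helper).

    Columns are the base columns followed by any columns that appear
    only in the absorbed table; missing values are written as empty cells.
    """
    base = csv.DictReader(io.StringIO(base_tsv), delimiter="\t")
    extra = csv.DictReader(io.StringIO(extra_tsv), delimiter="\t")
    base_rows, extra_rows = list(base), list(extra)

    # Union of the two headers, preserving the base table's column order.
    columns = list(base.fieldnames or [])
    for name in extra.fieldnames or []:
        if name not in columns:
            columns.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(base_rows + extra_rows)
    return out.getvalue()


# Illustrative data only; these are not the real ARGOS columns.
merged = absorb_table("id\tscore\n1\t10\n", "id\tlab\n2\tCrandall\n")
```

A merge like this changes the header of the combined file, so any BCO, recipe, or data dictionary entry that names the columns would need the same update.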

steph-sing commented 1 year ago

@rykahsay It seemed like all instances went blank when trying to push V1.43, so we had to roll things back to V1.42. I've attached a screenshot and draft release notes to assist in our troubleshooting, since neither Jonathon nor I have experienced this issue before and we were unable to resolve it ourselves. Can you please assist? (Draft Release Notes above - happy to explain any changes)

[Screenshot]

rykahsay commented 1 year ago

The script "object-maker/make-siteqcdb.py" is failing, and I have modified it so that it outputs the right error message (see screenshot below). The problem is that it expects the field "genomic_coordinates_start", but what you have in the siteQC_HIVE.tsv and core_property_list.tsv files is "genomic_coordinate_start". To fix this, you need to make the following change in both siteQC_HIVE.tsv and core_property_list.tsv:

From "genomic_coordinate_start" to "genomic_coordinates_start"
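
A minimal sketch of that rename, assuming the field name always occupies a whole tab-separated cell (a column header in one file, a cell value in the other). The function name `rename_field` is hypothetical and is not part of object-maker; reading and writing the actual files is left out.

```python
OLD_FIELD = "genomic_coordinate_start"
NEW_FIELD = "genomic_coordinates_start"


def rename_field(tsv_text, old=OLD_FIELD, new=NEW_FIELD):
    """Replace every cell that is exactly `old` with `new` (hypothetical helper).

    Only whole-cell matches are changed, so longer names that merely
    contain `old` are left alone. Line endings are preserved.
    """
    fixed = []
    for line in tsv_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        cells = [new if cell == old else cell for cell in body.split("\t")]
        fixed.append("\t".join(cells) + ending)
    return "".join(fixed)


# Illustrative input only; the real files have more columns.
sample = "id\tgenomic_coordinate_start\n1\t100\n"
fixed_sample = rename_field(sample)
```

Matching whole cells rather than doing a plain text substitution avoids touching other property names that happen to share the prefix.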

[Screenshot]

steph-sing commented 1 year ago

@rykahsay I am not sure why this field keeps being changed back or why it wasn't caught in our pre-release QC scripts for the data dictionary. I will address this, thank you.
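
One way a pre-release check could catch this kind of mismatch is to compare each table header against the field names the loader requires and flag near misses. The sketch below is an assumption about how such a check might look, not the existing QC script; `check_header` and the field lists are hypothetical.

```python
import difflib


def check_header(header, expected_fields):
    """Return {missing_field: [similar names found in header]} (hypothetical helper).

    A missing field with a close match in the header usually points to a
    spelling drift rather than a column that was dropped on purpose.
    """
    present = set(header)
    problems = {}
    for field in expected_fields:
        if field not in present:
            problems[field] = difflib.get_close_matches(
                field, header, n=1, cutoff=0.8
            )
    return problems


# Illustrative lists only; the real header and required fields are longer.
report = check_header(
    ["id", "genomic_coordinate_start"],
    ["id", "genomic_coordinates_start"],
)
```

Running the same check against both the data dictionary and the tables would also show when the two disagree with each other, independent of what the loader expects.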

Separately, I am seeing another issue: even after the rollback to V1.42, I still see the changes I was intending to make with V1.43. It seems like the previous version of 1.42 was erased and replaced with V1.43. Do you think this will be fixed automatically when we push V1.43?

Example: in V1.43 I absorbed Crandall's ngsQC table into ours and changed the name of the file to ngsQC-HIVE (Crandall's old ngsQC table no longer exists in the DB). I also see this reflected in the Release History for V1.41-V1.42.

[Screenshots]

rykahsay commented 1 year ago

Looks like Jonathon created 1.42 yesterday and nobody touched that release after that. Can you focus on pushing 1.43 correctly and not worry about what happened to 1.42?

[Screenshot]

rykahsay commented 1 year ago

@kee007ney --> since you are the one creating the push, can you please try to make 1.43 now? Or you can do a new one (1.44)?

Thanks

kee007ney commented 1 year ago

Done. 1.44 pushed to tst/beta/prd.

HadleyKing commented 1 year ago

@rykahsay and @steph-sing The field changes between genomic_coordinates_start and genomic_coordinate_start were due to me. The last few times I received the data dictionary it was throwing errors, so I changed it. I made the changes to siteQC_HIVE.tsv and core_property_list.tsv based on the data dictionary, assuming that it was the curated "source of truth".

I thought that I had reported those changes in a way that would let everyone know what happened in the release documentation.

We will fix that breakdown in communication for this release.

steph-sing commented 1 year ago

Second push moved to March Task. Closing this ticket.