FDA-ARGOS / data.argosdb

MIT License

Mazumder - Generate an additional 10 graphs in Observable #134

Closed: steph-sing closed this issue 1 year ago

steph-sing commented 1 year ago

This work should include write-ups on results and initial conclusions on the standards a given dataset must meet to be considered 'regulatory-grade'. This work will also help build use cases for the DB view and prepare for the paper.

JingyueWu commented 1 year ago

Status update: I have created a fork of ARGOS NGS QC data exploration notebook and am currently working on recreating the ngsQC graph.

steph-sing commented 1 year ago

(steph) Please see my comments below per your suggestions - thanks and nice work. Please update new slides in tandem - but do not overwrite Joe's work. Please duplicate the slides in this QC Visualization Folder, add your initials and the version at the end of the slide deck name (e.g. _jw2), and then update that slide deck. Please also update any applicable figure legends throughout the slide deck. Please put a link to your new slides in the corresponding GitHub ticket.

Graph 1 -- In the phylogenetic tree, add new organisms and their respective strains (i.e. E. coli, Sudan ebolavirus, and Yersinia)
Table/data used: ngsQC Organism Tracking and ngsID List, Selection Criteria, and Master Table

Graph 2 -- A series of graphs of average GC content from the ngsQC protocol
Old version: Salmonella enterica, MERS, Influenza A virus, SARS-CoV-2, Marburg marburgvirus, and Monkeypox
Proposed new version: all of the above, plus 1) Yersinia, 2) E. coli, and 3) Sudan ebolavirus
Table/data used: the most recently updated ngsQC_HL (*note: Joe is currently QC'ing Yersinia, and will update in bulk once done)
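For reference, the per-organism average behind Graph 2 could be computed along these lines. This is only a minimal sketch, not the notebook's actual code: it assumes each ngsQC_HL row has already been reduced to an (organism, GC percent) pair, and the function names `gc_content` and `average_gc_by_organism` are hypothetical.

```python
from collections import defaultdict


def gc_content(sequence):
    """Return the percentage (0-100) of G and C bases in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def average_gc_by_organism(records):
    """Average GC percent per organism.

    `records` is an iterable of (organism, gc_percent) pairs; this
    layout is an assumption, not the real ngsQC_HL schema.
    """
    totals = defaultdict(lambda: [0.0, 0])  # organism -> [sum, count]
    for organism, gc_percent in records:
        totals[organism][0] += gc_percent
        totals[organism][1] += 1
    return {org: total / count for org, (total, count) in totals.items()}
```

The resulting dictionary maps each organism to one value, which is the shape a per-organism bar or dot chart would need.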

Graph 3 -- Average quality score per T, G, C, and A for different organisms
Old version: Average T, G, C, A quality for just Salmonella enterica samples by PacBio RS and PacBio SMRT sequencing platforms
Proposed new version: Generate average T, G, C, A quality for more organisms

Graph 4 -- Dot plot showing average phred score vs. average read length
Old version: Average phred score vs. average read length for Salmonella enterica and SARS-CoV-2
Proposed new version: Generate average phred score vs. average read length graphs for more organisms
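As an illustration of the two quantities plotted in Graph 4, the sketch below derives an average phred score and an average read length from raw reads. It assumes Phred+33 quality encoding and reads supplied as (sequence, quality string) pairs; the ngsQC tables may already store these averages, and `summarize_reads` is a hypothetical name.

```python
def summarize_reads(reads):
    """Return (mean phred score per base, mean read length).

    `reads` is an iterable of (sequence, quality_string) pairs.
    Quality characters are assumed to be Phred+33 encoded.
    """
    total_bases = 0
    total_quality = 0
    n_reads = 0
    for sequence, quality in reads:
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        total_bases += len(sequence)
        # Phred+33: score = ASCII code of the character minus 33
        total_quality += sum(ord(ch) - 33 for ch in quality)
        n_reads += 1
    if n_reads == 0 or total_bases == 0:
        return 0.0, 0.0
    return total_quality / total_bases, total_bases / n_reads
```

Each sample would then contribute one (read length, phred score) point to the dot plot.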

Section 2: New Graphs Proposed by Jingyue

(AssemblyQC) Not Approved, Comments below in blue:
X Graph 5 -- Create a bar graph showing the relationships between N50, N75, and N90 with different organisms
Table/data used: the most recently updated assemblyQC_HL
X Graph 6 -- Create a bar graph showing the relationships between L50 and L75 with different organisms
Table/data used: the most recently updated assemblyQC_HL
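For readers unfamiliar with the metrics named in Graphs 5 and 6, here is a minimal sketch of the usual Nx/Lx definitions computed from a list of contig lengths. It is an illustration only: assemblyQC_HL presumably already contains these values, and `nx_lx` is a hypothetical helper name.

```python
def nx_lx(contig_lengths, x):
    """Return (Nx, Lx) for a percentage x such as 50, 75 or 90.

    Nx is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches x% of the
    assembly size. Lx is the number of contigs needed to get there.
    """
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold
        if running * 100 >= total * x:
            return length, count
    raise ValueError("contig_lengths must not be empty")
```

For example, `nx_lx(lengths, 50)` gives N50 and L50 together, so one call per threshold supplies the values for both proposed bar graphs.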

Graph 7 -- Create a Principal Component Analysis (PCA) plot showing the relationships between % unaligned and different organisms, using different sequencing platforms (e.g. Illumina vs. PacBio)
Table/data used: the most recently updated assemblyQC_HL

Let's come back to these after you complete the NGS and Assembly Graphs:
Graph 8 -- Create a Manhattan plot showing single nucleotide polymorphisms between different biological samples at specific sites of interest.
Table/data used: the most recently updated siteQC_HL
Graph 9 -- Create a dot plot showing the relationship between mutant variants at targeted positions and their corresponding clinical significance

steph-sing commented 1 year ago

@JingyueWu can you provide a status update? I will continue this ticket in Jan 2023.

steph-sing commented 1 year ago

@Jgergely11 please work on updating/generating the Observable graphs based on our current data, and the information I've provided above. Please update/add to your slides and send to me for review by the end of the month. thanks

Jgergely11 commented 1 year ago

Unlikely to complete 10 new graphs, but will provide at least 2-3.

steph-sing commented 1 year ago

@Jgergely11 The new graphs are outlined for Assembly QC above + the updates suggested. Those can all count as new graphs, just FYI

Jgergely11 commented 1 year ago

https://observablehq.com/collection/@mazumderlab/january-2023-argos-figures

I was unable to complete 10 graphs, but I have added 3 to the link provided above. I started working on the Assembly QC related graphs but those will have to be pushed to February.

The graphs provided are still drafts, but the overall content won't change - I'll just need to make tweaks to labels, axis, etc. There is an option to leave comments within the notebooks for any required feedback.

steph-sing commented 1 year ago

will review tomorrow, 02/02/2023

steph-sing commented 1 year ago

@Jgergely11 will use this as a base for the graphs we choose for the QC paper

steph-sing commented 1 year ago

Per meeting on 03/24:

New slides for QC paper (3/24/2023). Meet again in 2 weeks (04/07/2023). QC Visualization Folder

steph-sing commented 1 year ago

Items for QC paper were moved to the QC paper ticket. Considering this ticket complete.