Construct dictionary of paired gene-tissue data from gene-tissue file
YES
Q1B
Comments
YES
Q2A
Construct dictionary of tissues and corresponding sample IDs from sample attribute file
YES
Q2B
Comments
YES
Q3A
Retrieve sample IDs from expression data file
YES
Q3B
Comments
YES
Q4A
Subset tissue-associated sample IDs to include only those present in expression data
YES
Q4B
Comments
YES
Q5A
Identify tissue types with the most samples
YES
Q5B
Comments
YES
Q5C
Identify tissue types with the fewest samples
YES
Q5D
Comments
YES
Q6
Detailed comments on Q1-Q5 (see "Comments" components for each question above)
YES
Q7A
Combine tissue names and gene IDs
YES
Q7B
Log-transform expression data with a pseudocount of 1
YES
Q7C
Switch violin plot axes so categories are on y-axis
YES
Q7D
Axis labels
YES
Q7E
Discussion of results
YES
Q5B, Q5D: Please add comments to your code explaining how it identifies the tissues with the most and fewest samples.
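For Q5B and Q5D, the level of commenting I have in mind looks roughly like the sketch below. It is an illustration only, not the expected solution: it assumes your Q4 result is a dictionary mapping each tissue to its list of sample IDs, and the names `tissue_samples` and `rank_tissues_by_sample_count`, as well as the data, are made up for the example.

```python
# Hypothetical stand-in for the Q4 result:
# tissue name -> sample IDs that are also present in the expression data.
tissue_samples = {
    "Pancreas": ["S1", "S2", "S3"],
    "Stomach": ["S4", "S5"],
    "Liver": ["S6"],
    "Lung": ["S7", "S8", "S9", "S10"],
}


def rank_tissues_by_sample_count(tissue_to_samples):
    """Return (tissue, n_samples) pairs sorted from most to fewest samples."""
    # Count how many sample IDs are associated with each tissue.
    counts = {tissue: len(samples) for tissue, samples in tissue_to_samples.items()}
    # Sort on the count (second element of each pair), largest first.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


ranked = rank_tissues_by_sample_count(tissue_samples)

# The top of the ranking holds the tissues with the most samples,
# and the bottom holds the tissues with the fewest.
most_samples = ranked[:2]
fewest_samples = ranked[-2:]

print("Most samples:", most_samples)
print("Fewest samples:", fewest_samples)
```

One or two comment lines per step, as above, is enough to show what each part of the code does.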
Q7E: You seem to be considering the variability of gene expression within each tissue and then comparing those patterns across tissues. That's a fair interpretation of the question, but I'd argue it was meant to have you compare general trends in expression variance across tissues. For example, pancreas genes overall have lower variance, whereas stomach genes have higher variance. You could make the same comparison for the other tissues and then consider why some tissues have more variable gene expression than others. That said, the question was a bit unclear, so I'll mark it as complete and copy this note into the grading rubric for reference.
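To make the across-tissue comparison in the Q7E note concrete, here is a minimal sketch of one way to summarize variance per tissue. It rests on assumptions of mine: expression values are keyed by a (tissue, gene ID) pair, the log transform uses base 2 (the rubric only specifies a pseudocount of 1), and the median of the per-gene variances serves as the tissue-level summary. The name `median_variance_by_tissue` and all of the numbers are hypothetical and are not taken from your data.

```python
import math
from collections import defaultdict
from statistics import median, variance

# Hypothetical expression values: (tissue, gene_id) -> per-sample expression.
expression = {
    ("Pancreas", "geneA"): [10.0, 11.0, 10.5, 10.2],
    ("Pancreas", "geneB"): [100.0, 105.0, 98.0, 102.0],
    ("Stomach", "geneC"): [1.0, 50.0, 5.0, 200.0],
    ("Stomach", "geneD"): [0.0, 30.0, 300.0, 3.0],
}


def median_variance_by_tissue(expr):
    """Summarize each tissue by the median variance of its genes' log expression."""
    per_tissue = defaultdict(list)
    for (tissue, _gene), values in expr.items():
        # Log-transform with a pseudocount of 1 (base 2 is an assumption).
        logged = [math.log2(v + 1) for v in values]
        # Sample variance of this gene across samples.
        per_tissue[tissue].append(variance(logged))
    # One number per tissue, so tissues can be compared directly.
    return {tissue: median(vs) for tissue, vs in per_tissue.items()}


summary = median_variance_by_tissue(expression)
for tissue, med_var in sorted(summary.items(), key=lambda pair: pair[1]):
    print(f"{tissue}: median variance of log expression = {med_var:.3f}")
```

Ranking tissues by a single summary value like this gives the "general trend" the question was after, and the violin plot from Q7C shows the same information graphically.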
==========
DAY 4 AFTERNOON
Exercise
Description
Completion
Q1.1
Load DNM data
YES
Q1.2
Tabulate paternally and maternally inherited DNMs per proband
YES
Q1.3
Load age data into df
YES
Q1.4
Combine age and DNM data
YES
Q2.1.1
Plot maternal DNMs vs. maternal age
YES
Q2.1.2
Plot paternal DNMs vs. paternal age
YES
Q2.2.0
Fit linear regression model to data
YES
Q2.2.1
Discuss "size" of relationship
YES
Q2.2.2
Discuss significance of relationship
YES
Q2.3.0
Fit linear regression model to data
YES
Q2.3.1
Discuss "size" of relationship
YES
Q2.3.2
Discuss significance of relationship
YES
Q2.4
Predict number of paternal DNMs for a proband whose father was 50.5 years old at the time of birth
YES
Q2.5.1
Histogram of maternal DNMs per proband
YES
Q2.5.2
Histogram of paternal DNMs per proband
YES
Q2.6.0
Test significance between # of maternally vs. paternally inherited DNMs per proband
[Updated 20240911]
==========
DAY 4 LUNCH
==========
DAY 4 AFTERNOON