cat-e-r / qbb2023-answers


Week 6 Feedback #6

dtaylo95 opened this issue 9 months ago

dtaylo95 commented 9 months ago

README.md with commands and analyses

2/2

| Exercise | Points Possible | Grade |
|---|---|---|
| Commands for Step 1.1 | 0.33 | 0.33 |
| Commands for Step 2.1 | 0.33 | 0.33 |
| Commands for Step 3.1 | 0.33 | 0.33 |
| Answer to Step 3.4 | 1 | 1 |

plotting.py script to produce plots

3.5/4

| Exercise | Points Possible | Grade |
|---|---|---|
| Code to produce step 1.2 PC plot | 1 | 1 |
| Code to produce step 2.2 AFS plot | 1 | 1 |
| Code to produce step 3.2 Manhattan plots | 1 | 0.75 |
| Code to produce step 3.3 effect size boxplot | 1 | 0.75 |

Very minor issue, but it looks like you're plotting ALL of your associations in your Manhattan plots, rather than just the genotype associations. To clarify: when you run your GWAS, you include the top PCs as covariates in the regression (this is correct). But this means you also get regression results for the covariates, not just for the variants you're testing. Take a look at the TEST column in the .assoc.linear output file(s) from the plink --linear command to figure out which results you want to keep and plot.
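A minimal sketch of that filtering step, written in plain Python so it runs without pandas. It assumes plink 1.9's whitespace-delimited .assoc.linear layout (header `CHR SNP BP A1 TEST NMISS BETA STAT P`), where rows with TEST equal to `ADD` carry the additive genotype effect and rows with other TEST values carry the covariate terms. `additive_rows` is a hypothetical helper name, not something from plotting.py.

```python
import math


def additive_rows(lines):
    """Yield (chrom, bp, -log10 p) for rows whose TEST column is ADD.

    Assumes a plink 1.9 .assoc.linear file: whitespace-delimited, with a
    header line. Columns are looked up by name from that header rather
    than by position, so extra or reordered columns do not break it.
    """
    it = iter(lines)
    header = next(it).split()
    col = {name: i for i, name in enumerate(header)}
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[col["TEST"]] != "ADD":
            continue  # covariate rows (e.g. the PC terms), not the variant effect
        p = fields[col["P"]]
        if p == "NA":
            continue  # plink writes NA when the test could not be computed
        yield fields[col["CHR"]], int(fields[col["BP"]]), -math.log10(float(p))
```

For example, `list(additive_rows(open(path)))` on an output file gives one point per tested variant, ready to plot as position versus -log10 p. Looking columns up by header name is a design choice that keeps the sketch robust if plink is run with options that add columns.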

Also, your boxplot doesn't look quite right. Since you're plotting the variant with the strongest association, I'd expect a clear trend across genotypes in the boxplot. I believe what's going on is that when you read in the phenotype values from GS451_IC50.txt, you're shifting the column indices by 7 to match the genotype indices in the VCF file.

https://github.com/cat-e-r/qbb2023-answers/blob/ab7ad1210cf88ef5e3e5f3935b892d50387824c5/week6-homework/plotting.py#L87-L93

BUT the VCF has 9 fixed fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT) before the genotypes start, not 7, so I think you need to shift by 9.
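An alternative to hard-coding the offset is to look up each sample's column by name from the VCF's `#CHROM` header line and then join phenotypes on sample ID. A minimal sketch, assuming a tab-delimited VCF whose FORMAT field includes GT (which the VCF spec requires to be the first subfield when present). The helper names are hypothetical, and since the layout of GS451_IC50.txt isn't shown here, it is also an assumption that its sample IDs match the VCF sample names exactly.

```python
# The 9 fixed VCF columns that precede the per-sample genotype columns.
VCF_FIXED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def sample_columns(header_line):
    """Map each sample ID in the VCF '#CHROM' header line to its column index."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[:9] != VCF_FIXED:
        raise ValueError("unexpected VCF header: " + " ".join(fields[:9]))
    # Genotypes start after the 9 fixed fields, so the k-th sample
    # (0-based) sits at column index k + 9.
    return {sample: i for i, sample in enumerate(fields[9:], start=9)}


def genotypes_by_sample(record_line, columns):
    """Return {sample_id: GT string} for one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    # Assumes GT is present, so it is the first ':'-separated subfield.
    return {s: fields[i].split(":")[0] for s, i in columns.items()}
```

With a genotype dictionary keyed by sample ID, each phenotype can be paired with its genotype by looking up the same ID, so no index arithmetic is needed at all.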

Pretty plots

4/4

| Exercise | Points Possible | Grade |
|---|---|---|
| Step 1.2 PC plot | 1 | 1 |
| Step 2.2 AFS plot | 1 | 1 |
| Step 3.2 Manhattan plots | 1 | 1 |
| Step 3.3 effect size boxplot | 1 | 1 |

Grade

Total: 9.5/10

Nice work!

dtaylo95 commented 9 months ago

Changes look great!

New grade: 10/10

Feel free to close this issue.