dtaylo95 opened 8 months ago
Re-uploaded. I tried to output the file for the exercise with DESeq2 (you'll see it in my code), but it didn't work, so I'm taking the L. Thanks!
Python script to run DE analysis
6/6
Exercise | Points Possible | Grade |
---|---|---|
Implementation of manual DE test | 1 | 1 |
Manual DE test on all genes | 1 | 1 |
Code to do FDR correction | 1 | 1 |
Running PyDESeq2 on all genes | 1 | 1 |
Overlap between methods | 1 | 1 |
Code to produce volcano plot | 1 | 1 |
You're getting a slightly different answer from me for the Jaccard index. I think part of this is how you're defining the "significant" genes from DESeq2. You're defining them as genes with `padj < 0.1` and `abs(log2FC) > 1`, which is how we wanted you to color points for the volcano plot, but for the Jaccard index we were just expecting you to use the FDR cutoff. Interestingly, when I run your code, I get 77%, not 97%, so I'm not sure what happened there. But none of this is a big deal (i.e., no points off or anything). I just wanted to let you know why your answer might not match other people's.
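To make the distinction concrete, here is a minimal sketch of the two definitions side by side. It assumes a pandas DataFrame laid out like PyDESeq2's results table (gene IDs as the index, `log2FoldChange` and `padj` columns); the gene names and values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for a PyDESeq2 results table (assumed layout: gene IDs as the
# index, "log2FoldChange" and "padj" columns). Values are made up.
results = pd.DataFrame(
    {
        "log2FoldChange": [2.5, 0.4, -1.8, -0.2, 1.1],
        "padj": [0.001, 0.03, 0.2, 0.08, 0.05],
    },
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)

# Jaccard index: "significant" means passing the FDR cutoff only.
sig_for_jaccard = set(results.loc[results["padj"] < 0.1].index)

# Volcano plot coloring: FDR cutoff AND |log2FC| > 1.
# abs() wraps the fold change itself, not the comparison.
volcano_mask = (results["padj"] < 0.1) & (results["log2FoldChange"].abs() > 1)
sig_for_volcano = set(results.loc[volcano_mask].index)

print(sorted(sig_for_jaccard))  # ['geneA', 'geneB', 'geneD', 'geneE']
print(sorted(sig_for_volcano))  # ['geneA', 'geneE']
```

The volcano set is always a subset of the FDR-only set, so using it for the Jaccard index shrinks the DESeq2 side and changes the overlap you report.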
README.md with answers to questions
1/1
Exercise | Points Possible | Grade |
---|---|---|
Jaccard index overlap between methods | 1 | 1 |
Output text files
1.75/2
Exercise | Points Possible | Grade |
---|---|---|
List of DE genes in manual test | 1 | 1 |
List of DE genes in PyDESeq2 test | 1 | 0.75 |
I can see where you tried to do this in your code, but honestly, I think you can just copy the code you used to write out the DE genes from the homemade analysis: https://github.com/Joseph-Dixon1/qbb2023-answers/blob/1152d9bd7c0c4d35c28230df9cbd1bf08ba53d6d/Week9Whatever/homework_script.py#L62-L68
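As a rough illustration of what that copy might look like, here is a minimal sketch that writes one gene ID per line for genes passing the FDR cutoff. It assumes the DESeq2 results are a pandas DataFrame with gene IDs as the index and a `padj` column (as in the `results_df` that PyDESeq2's `DeseqStats` fills in after `summary()`); the `write_sig_genes` helper and the output filename are hypothetical, not taken from your script.

```python
import numpy as np
import pandas as pd


def write_sig_genes(results, out_path, padj_col="padj", alpha=0.1):
    """Write one gene ID per line for genes with padj below alpha.

    Assumes gene IDs are the DataFrame index (as in PyDESeq2's results_df).
    NaN adjusted p-values compare as False, so those genes are skipped.
    Returns the number of genes written.
    """
    sig_genes = results.loc[results[padj_col] < alpha].index
    with open(out_path, "w") as fh:
        for gene in sig_genes:
            fh.write(f"{gene}\n")
    return len(sig_genes)


# Toy results table; in the real script this would be the DESeq2 results.
toy_results = pd.DataFrame(
    {"log2FoldChange": [1.5, -0.3, 2.0], "padj": [0.01, 0.5, np.nan]},
    index=["geneA", "geneB", "geneC"],
)
# Hypothetical output filename.
n_written = write_sig_genes(toy_results, "deseq2_de_genes.txt")
print(n_written)  # 1
```

The same helper works for the homemade results too, as long as the adjusted p-values live in a column you pass as `padj_col`.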
Pretty plots
1/1
Exercise | Points Possible | Grade |
---|---|---|
Exercise 2 Volcano plot | 1 | 1 |
Total: 9.75/10
Really great work, Joe. You're welcome to try to get that output list from DESeq2 working and resubmit for the full 10/10, but no pressure.
Python script to run DE analysis
4.5/6
Missing your code to do FDR correction for the homemade approach. I see the corresponding line in the code, but you need to actually run that function on the `pval` column of your `data_store` dataframe and then store the results in a new column of your dataframe. That's how you'll pick the significant genes.
Your calculation of the Jaccard index is a bit funky:
`gtex_whole_blood_counts_formatted.txt`. What you want to be comparing is the homemade results to the DESeq2 results (i.e., the sig genes in each approach).
`n_shared / (n_from_homemade + n_from_deseq2)`. This is close, but the way you have the denominator, you're double counting any genes that are shared between the two approaches. You want to calculate `n_shared / n_total`, where `n_total` is the number of genes significant in either approach. You might want to check out the `.union()` function.
README.md with answers to questions
1/1
Even with the errors above, I don't think you'd get a Jaccard index of 0, so I'm not sure what happened.
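Putting the FDR and Jaccard fixes together, a rough sketch of the intended pipeline might look like the following. Benjamini-Hochberg is written out by hand here so the example has no extra dependency; if your script already uses statsmodels' `multipletests(pvals, method="fdr_bh")`, its second return value should be the same adjusted p-values. The `data_store` name comes from the feedback; the `padj` column name, the `bh_adjust` and `jaccard` helpers, and all numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values.

    Assumes there are no missing p-values (drop NaNs before calling).
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value down keeps the result monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def jaccard(a, b):
    """Shared genes divided by genes significant in either set."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


# Hypothetical homemade results: one raw p-value per gene.
data_store = pd.DataFrame(
    {"pval": [0.001, 0.01, 0.03, 0.04, 0.5]},
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)
# Run the correction on the pval column and keep it in a new column.
data_store["padj"] = bh_adjust(data_store["pval"])
homemade_sig = set(data_store.loc[data_store["padj"] < 0.1].index)

# Hypothetical DESeq2 adjusted p-values for the same genes.
deseq2_results = pd.DataFrame(
    {"padj": [0.002, 0.3, 0.04, 0.2, 0.06]},
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)
deseq2_sig = set(deseq2_results.loc[deseq2_results["padj"] < 0.1].index)

print(round(jaccard(homemade_sig, deseq2_sig), 2))  # 0.4
```

With the original denominator, the same toy sets would give 2 / (4 + 3) ≈ 0.29, because geneA and geneC are counted once in each term.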
Output text files
0/2
Please upload the list of significant DE genes from both analyses.
Pretty plots
1/1
Grade
Total: 6.5/10
This is very close to being complete. Here's what we still need:
padj < 0.1
)