Joseph-Dixon1 / qbb2023-answers

Week 9 Feedback #10

Open dtaylo95 opened 8 months ago

dtaylo95 commented 8 months ago

Python script to run DE analysis

4.5/6

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Implementation of manual DE test | 1 | 1 |
| Manual DE test on all genes | 1 | 1 |
| Code to do FDR correction | 1 | 0 |
| Running PyDESeq2 on all genes | 1 | 1 |
| Overlap between methods | 1 | 0.5 |
| Code to produce volcano plot | 1 | 1 |

Missing your code to do FDR correction for the homemade approach. I see the corresponding line in the code, but you need to actually run that function on the pval column of your data_store dataframe, and then store the results in a new column of your dataframe. That's how you'll pick the significant genes.
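
To illustrate what "run the correction and store the result" could look like, here is a minimal sketch of Benjamini-Hochberg FDR correction in plain Python. The function name `bh_adjust` and the gene names and p-values are hypothetical; this is not your script's code. In practice a library routine such as `fdrcorrection` from `statsmodels.stats.multitest` does the same job, and its adjusted values would go into a new column of the dataframe.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvals)
    # Indices of the p-values, smallest p-value first
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the ones above it (this keeps the adjusted values monotone).
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values keyed by gene name
pvals_by_gene = {"geneA": 0.001, "geneB": 0.04, "geneC": 0.03, "geneD": 0.8}

genes = list(pvals_by_gene)
padj = dict(zip(genes, bh_adjust([pvals_by_gene[g] for g in genes])))

# Significant genes at a 10% FDR cutoff
sig_genes = {g for g, q in padj.items() if q < 0.1}
```

The key point is that the significance cutoff is applied to the adjusted values, not to the raw p-values.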

Your calculation of Jaccard index is a bit funky:

  1. It looks like you're comparing your homemade results to the full list of genes in gtex_whole_blood_counts_formatted.txt. What you want to be comparing is the homemade results to the DESeq2 results (i.e. the sig genes in each approach)
  2. Assuming you make that switch, when you're calculating the Jaccard index, it looks like you're calculating it as n_shared / (n_from_homemade + n_from_deseq2). This is close but the way you have the denominator, you're double counting any genes that are shared between the two approaches. You want to calculate n_shared / (n_total). You might want to check out the .union() function.
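
As a small sketch of the difference between the two denominators, using Python sets (the gene names and the `jaccard` helper are hypothetical, not taken from your script):

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a.union(b)
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a.intersection(b)) / len(union)


# Hypothetical significant-gene sets from each approach
homemade_sig = {"geneA", "geneB", "geneC"}
deseq2_sig = {"geneB", "geneC", "geneD", "geneE"}

n_shared = len(homemade_sig & deseq2_sig)

# Double-counting denominator: shared genes are counted once per list
double_counted = n_shared / (len(homemade_sig) + len(deseq2_sig))

# Union denominator: every gene is counted exactly once
correct = jaccard(homemade_sig, deseq2_sig)
```

Here the two sets share 2 genes and their union holds 5, so the index is 2/5, while the double-counted version gives 2/7.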

README.md with answers to questions

1/1

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Jaccard index overlap between methods | 1 | 1 |

Despite the errors above, I think you still wouldn't get a Jaccard index of 0... not sure what happened

Output text files

0/2

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| List of DE genes in manual test | 1 | 0 |
| List of DE genes in PyDESeq2 test | 1 | 0 |

Please upload the list of significant DE genes from both analyses

Pretty plots

1/1

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Exercise 2 Volcano plot | 1 | 1 |

Grade

Total: 6.5/10

This is very close to being complete. Here's what we still need:

  1. Calculate FDR-corrected p-values for your home-made DE approach
  2. Use these values to identify significant DE genes in the homemade approach (padj < 0.1)
  3. Output the list of DE genes from the homemade approach to a file uploaded with the assignment
  4. Output the list of DE genes from the DESeq2 approach to a file uploaded with the assignment
  5. Properly compare the lists of DE genes from both approaches using the Jaccard index

Joseph-Dixon1 commented 8 months ago

Re-uploaded, I tried to output the file for exercise with DeSeq2, you'll see it in my code, but it didn't work, so I'm taking an L. Thanks!

dtaylo95 commented 8 months ago

Python script to run DE analysis

6/6

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Implementation of manual DE test | 1 | 1 |
| Manual DE test on all genes | 1 | 1 |
| Code to do FDR correction | 1 | 1 |
| Running PyDESeq2 on all genes | 1 | 1 |
| Overlap between methods | 1 | 1 |
| Code to produce volcano plot | 1 | 1 |

You're getting a slightly different answer from me for the Jaccard index. I think part of this is because of how you're defining the "significant" genes from DESeq2. You're defining them as genes with padj < 0.1 and abs(log2FC) > 1, which is how we wanted you to color points for the volcano plot, but for the Jaccard index, we were just expecting you to use the FDR cutoff. Interestingly, when I run your code, I get 77%, not 97%, so not sure what happened there. But none of this is a big deal (i.e. no points off or anything). Just wanted to let you know why your answer might not be the same as other people's.
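
To make the two definitions of "significant" concrete, here is a small sketch with made-up values (the gene names, padj values, and log2 fold changes are all hypothetical):

```python
# Hypothetical DESeq2-style results: gene -> (padj, log2FoldChange)
results = {
    "geneA": (0.01, 2.3),
    "geneB": (0.05, 0.4),
    "geneC": (0.02, -1.7),
    "geneD": (0.50, 3.0),
}

# FDR cutoff only: what was expected for the Jaccard index
fdr_only = {g for g, (padj, lfc) in results.items() if padj < 0.1}

# FDR cutoff plus effect-size cutoff: what was asked for when coloring the volcano plot
fdr_and_lfc = {
    g for g, (padj, lfc) in results.items() if padj < 0.1 and abs(lfc) > 1
}
```

The second set is always a subset of the first, so using it for the overlap shrinks the DESeq2 list and changes the Jaccard index.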

README.md with answers to questions

1/1

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Jaccard index overlap between methods | 1 | 1 |

Output text files

1.75/2

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| List of DE genes in manual test | 1 | 1 |
| List of DE genes in PyDESeq2 test | 1 | 0.75 |

I do see in your code where you tried to do this, but honestly, I think you can essentially just copy the code you used to write the DE genes from the homemade analysis: https://github.com/Joseph-Dixon1/qbb2023-answers/blob/1152d9bd7c0c4d35c28230df9cbd1bf08ba53d6d/Week9Whatever/homework_script.py#L62-L68
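
For reference, a generic sketch of writing a gene list to a text file, one gene per line. This is not a copy of the linked lines; the `write_gene_list` helper, the gene names, and the output filename are hypothetical, and the temp directory is used only so the example runs anywhere.

```python
import os
import tempfile


def write_gene_list(genes, path):
    """Write one gene name per line, sorted so the output is reproducible."""
    with open(path, "w") as out:
        for gene in sorted(genes):
            out.write(gene + "\n")


# Hypothetical set of significant genes from the DESeq2 run
deseq2_sig = {"geneC", "geneA", "geneB"}

out_path = os.path.join(tempfile.gettempdir(), "deseq2_sig_genes.txt")
write_gene_list(deseq2_sig, out_path)
```

The same helper works for either approach, since it only needs an iterable of gene names.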

Pretty plots

1/1

| Exercise | Points Possible | Grade |
| --- | --- | --- |
| Exercise 2 Volcano plot | 1 | 1 |

Grade

Total: 9.75/10

Really great work, Joe. You're welcome to try to get that output list from DESeq2 working and resubmit again for the full 10/10, but no pressure.