Closed cansavvy closed 3 months ago
Thank you for doing all of the simplifications! Given there are so many more steps in the original than this, what's the best way to know if this simplified product recapitulates the main steps of the original analysis? I think I found the RDS file that was saved from the original analysis and can try to compare to that, but wanted to check if you have anything else in mind?
And when I tried to run the code as specified in the header (but without the filter function), I got this error
Error in `dplyr::left_join()`:
! Join columns in `x` must be present in
the data.
✖ Problem with `gene1_symbol`.
---
Backtrace:
▆
1. ├─gimap_dataset %>% gimap_annotate()
2. ├─global gimap_annotate(.)
3. │ └─... %>% ...
4. ├─dplyr::left_join(...)
5. ├─dplyr::left_join(., depmap_cn, by = c(gene1_symbol = "genes"))
6. └─dplyr:::left_join.data.frame(., depmap_cn, by = c(gene1_symbol = "genes"))
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
And when I tried to run the code as specified in the header (but without the filter function), I got this error
Error in `dplyr::left_join()`: ! Join columns in `x` must be present in the data. ✖ Problem with `gene1_symbol`. --- Backtrace: ▆ 1. ├─gimap_dataset %>% gimap_annotate() 2. ├─global gimap_annotate(.) 3. │ └─... %>% ... 4. ├─dplyr::left_join(...) 5. ├─dplyr::left_join(., depmap_cn, by = c(gene1_symbol = "genes")) 6. └─dplyr:::left_join.data.frame(., depmap_cn, by = c(gene1_symbol = "genes")) Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
Ok, this is because the annotation_df
is empty for me. Even though I'm using the updated get_example_data()
function and have the text file it's loading (and it's non-empty). So I'll look into this
And when I tried to run the code as specified in the header (but without the filter function), I got this error
Error in `dplyr::left_join()`: ! Join columns in `x` must be present in the data. ✖ Problem with `gene1_symbol`. --- Backtrace: ▆ 1. ├─gimap_dataset %>% gimap_annotate() 2. ├─global gimap_annotate(.) 3. │ └─... %>% ... 4. ├─dplyr::left_join(...) 5. ├─dplyr::left_join(., depmap_cn, by = c(gene1_symbol = "genes")) 6. └─dplyr:::left_join.data.frame(., depmap_cn, by = c(gene1_symbol = "genes")) Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
Ok, this is because the
annotation_df
is empty for me. Even though I'm using the updatedget_example_data()
function and have the text file it's loading (and it's non-empty). So I'll look into this
Once I manually loaded the annotation_df
and reran the rest of the steps piecemeal, it worked
Just from looking at the tables, I'm observing slight mismatches in the TPM and CN values. I'll try to make some scatter plots to try to get a better global view of this.
Just from looking at the tables, I'm observing slight mismatches in the TPM and CN values. I'll try to make some scatter plots to try to get a better global view of this.
Also compared which pgRNAs had missing values for the TPM and CN annotations
Which comparison | gimap # NA | GI Mapping # NA | # overlap based on pgRNA ID (using intersect ) |
---|---|---|---|
Gene1 log2 tpm | 906 | 8769 | 625 |
Gene 2 log2 tpm | 874 | 750 | 532 |
Gene 1 log2 CN | 906 | 8697 | 601 |
Gene 2 log2 CN | 874 | 564 | 500 |
Hope this was helpful -- Let me know what I can do to help follow-up.
Thank you for doing all of the simplifications! Given there are so many more steps in the original than this, what's the best way to know if this simplified product recapitulates the main steps of the original analysis? I think I found the RDS file that was saved from the original analysis and can try to compare to that, but wanted to check if you have anything else in mind?
I think comparing the results is a good strategy. And included in that investigating the differences to see if we think the differences in results are better, same, or worse. If results are same or "better" then the differences are okay.
Can you give me the exact code you ran locally? For example I'm running the code in the examples so:
devtools::load_all()
gimap_dataset <- get_example_data("gimap") %>%
gimap_filter() %>%
gimap_annotate()
Removing the filter step or adding it in doesn't make a difference at this point because its more or less an empty function at this point.
Just from looking at the tables, I'm observing slight mismatches in the TPM and CN values. I'll try to make some scatter plots to try to get a better global view of this.
This was really great you ran through this comparison! Thanks for doing this!
At this point in time however we're not applying a filter so these results will be affected by that.
However the CN and TPM values are from the public repository DepMap so those values should be the same. Can you share the code you used to make these plots so I can dig into that?
Can you give me the exact code you ran locally? For example I'm running the code in the examples so:
devtools::load_all() gimap_dataset <- get_example_data("gimap") %>% gimap_filter() %>% gimap_annotate()
Removing the filter step or adding it in doesn't make a difference at this point because its more or less an empty function at this point.
I've uploaded an .Rmd
and its rendered .html
in a stacked branch on top of this in PR #32
Just from looking at the tables, I'm observing slight mismatches in the TPM and CN values. I'll try to make some scatter plots to try to get a better global view of this.
This was really great you ran through this comparison! Thanks for doing this!
At this point in time however we're not applying a filter so these results will be affected by that.
However the CN and TPM values are from the public repository DepMap so those values should be the same. Can you share the code you used to make these plots so I can dig into that?
Also in the .Rmd
and its rendered .html
in a stacked branch on top of this in PR #32
Also compared which pgRNAs had missing values for the TPM and CN annotations
Which comparison gimap # NA GI Mapping # NA # overlap based on pgRNA ID (using
intersect
) Gene1 log2 tpm 906 8769 625 Gene 2 log2 tpm 874 750 532 Gene 1 log2 CN 906 8697 601 Gene 2 log2 CN 874 564 500
These differences are almost certainly because we are using a more recent release of DepMap data than the original code did. The R package used by the original code was downloading 19Q3 whereas now we download DepMap 22Q2 from https://figshare.com/articles/dataset/DepMap_22Q2_Public/19700056
We should probably double check that there are no differences between these datasets that would make us for sure want to return to 19Q3.
Going to merge this and we can continue to iterate this in a new PR.