Open SebastianHollizeck opened 2 years ago
If the default values in the chosen_K column are NA, the user needs to spot check and manually fill these in. Are you still getting an error when chosen_K is fully set?
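A quick way to spot unset entries before collecting chains, sketched on a toy table. The column names mirror the set_k_choices output shown in this thread; the toy values and the rule used for the manual fill are illustrative assumptions, not PICTograph output or a recommendation from the maintainers.

```r
# Toy stand-in for set_k_choices; all values are made up for illustration.
set_k_choices <- data.frame(
  set_name_bin = c("001111", "010101", "111111"),
  min_BIC      = c(3, 2, 4),
  elbow        = c(3, NA, 4),
  knee         = c(3, 3, 4),
  chosen_K     = c(3, NA, 4)
)

# Rows whose chosen_K still needs a manual decision.
needs_review <- which(is.na(set_k_choices$chosen_K))
print(set_k_choices[needs_review, ])

# After inspecting the BIC plot for each flagged set, fill the value in by hand.
# Assumption for this sketch: the min_BIC choice was judged acceptable.
set_k_choices$chosen_K[needs_review] <- set_k_choices$min_BIC[needs_review]

# Fail early if anything is still unset.
stopifnot(!anyNA(set_k_choices$chosen_K))
```

The final stopifnot() makes a leftover NA fail immediately rather than surfacing later in the chain-merging step.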
Yes. I tried both leaving the selected K at its default and using the minimum BIC column to select the chains, and the error appeared either way, even though the chosen K was 3:
> set_k_choices[47,]
# A tibble: 1 × 5
  set_name_bin min_BIC elbow  knee chosen_K
  <chr>          <dbl> <dbl> <dbl>    <dbl>
1 001111             3     3     3        3
I then passed the K values, computed the same way as in the chain-merging step, to the chain selection. They all match chosen_K except for set 47, which is 2 instead of 3.
best_K_vals <- unname(sapply(best_set_chains, function(x) max(x$z_chain$value)))
> which(best_K_vals!=set_k_choices$chosen_K)
[1] 47
> best_set_chains <- collectBestKChains(all_set_results, chosen_K = best_K_vals)
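One possible reading of the mismatch, offered as an assumption and not confirmed anywhere in this thread: a chain run with K = 3 might never assign any mutation to the third cluster, in which case the largest label in z_chain would be 2. The toy example below only illustrates that effect. The Parameter and value columns follow the names visible in the output in this thread; the data are invented.

```r
# Toy z_chain: two mutations, three MCMC draws each, from a run with K = 3.
# No draw ever uses cluster label 3.
K_requested <- 3
z_chain <- data.frame(
  Parameter = rep(c("z[1]", "z[2]"), each = 3),
  value     = c(1, 1, 2, 2, 2, 1)
)

# Same rule as best_K_vals above: the largest cluster label seen in z_chain.
K_from_z <- max(z_chain$value)

# Cluster labels that were requested but never assigned to any mutation.
clusters_used  <- sort(unique(z_chain$value))
empty_clusters <- setdiff(seq_len(K_requested), clusters_used)

cat("K requested:", K_requested, " K inferred from z:", K_from_z, "\n")
cat("Clusters with no assigned mutations:", empty_clusters, "\n")
```

If this is what happens for set 47, comparing the number of clusters in w_chain against the labels actually used in z_chain would show it directly.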
But when I then merged the set chains, there were multiple warnings:
> chains <- mergeSetChains(best_set_chains, input_data)
Warning messages:
1: Problem while computing `Mutation_index = as.numeric(...)`.
ℹ NAs introduced by coercion
2: Problem while computing `s = as.numeric(...)`.
ℹ NAs introduced by coercion
When I plotted the cluster assignment, it looked okay (although it is too big to be plotted sensibly).
So I wanted to get the mutation assignment, but it errored:
> writeClusterAssignmentsTable(chains$z_chain)
Error in `mutate()`:
! Problem while computing `Mut_ID = Mut_ID`.
✖ `Mut_ID` must be size 1, not 2692.
ℹ The error occurred in group 1: Parameter = z[1].
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/dplyr:::mutate_error>
Error in `mutate()`:
! Problem while computing `Mut_ID = Mut_ID`.
✖ `Mut_ID` must be size 1, not 2692.
ℹ The error occurred in group 1: Parameter = z[1].
---
Backtrace:
1. pictograph::writeClusterAssignmentsTable(chains$z_chain)
6. dplyr:::mutate.data.frame(., Mut_ID = Mut_ID, Cluster = value)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/dplyr:::mutate_error>
Error in `mutate()`:
! Problem while computing `Mut_ID = Mut_ID`.
✖ `Mut_ID` must be size 1, not 2692.
ℹ The error occurred in group 1: Parameter = z[1].
---
Backtrace:
▆
1. ├─pictograph::writeClusterAssignmentsTable(chains$z_chain)
2. │ └─... %>% arrange(Cluster)
3. ├─dplyr::arrange(., Cluster)
4. ├─dplyr::select(., Mut_ID, Cluster)
5. ├─dplyr::mutate(., Mut_ID = Mut_ID, Cluster = value)
6. ├─dplyr:::mutate.data.frame(., Mut_ID = Mut_ID, Cluster = value)
7. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
8. │ ├─base::withCallingHandlers(...)
9. │ └─mask$eval_all_mutate(quo)
10. ├─dplyr:::dplyr_internal_error(...)
11. │ └─rlang::abort(class = c(class, "dplyr:::internal_error"), dplyr_error_data = data)
12. │ └─rlang:::signal_abort(cnd, .file)
13. │ └─base::signalCondition(cnd)
14. └─dplyr `<fn>`(`<dpl:::__>`)
15. └─rlang::abort(...)
Is this still connected to the original bug, or is this a different issue?
Thanks for bringing this to our attention. The method hasn't been tested on datasets as large as yours. Could you share the input data via email?
While this is human data, it is only a downsampled version of all somatic variants called in these samples, so I can share it here. The original dataset contains about 60K variants; the ones I used as input are protein-altering variants for which copy number information was available.
The following link points to the RDS file containing the input list I used.
https://cloudstor.aarnet.edu.au/plus/s/efWBYj3wlD3Wanq
This link will only be accessible for a week.
Hi Sebastian, sorry for the slow response. We've had some personnel turnover on our end and were only now able to look at your problem. We discovered that in your input file, while the dimensions for y, n, and tcn are 2692 x 6, the dimensions for m are 2735 x 6. To fix the error, you would need to make the dimensions consistent. We have also made adjustments to the code to handle samples with a very large number of mutations. If you fix the input and update to the latest version of PICTograph (v1.2.0.1), you should be able to run your job without any problems.
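A minimal sketch of a pre-flight check for the mismatch described above. The helper check_input_dims() is hypothetical and not part of PICTograph. It assumes the input is a list holding matrices named y, n, tcn, and m, as mentioned in this thread, and the toy matrices only reproduce the reported shapes.

```r
# Hypothetical helper: report the dimensions of each matrix in the input list
# and whether they all agree.
check_input_dims <- function(input_data, fields = c("y", "n", "tcn", "m")) {
  # One column per field, holding that matrix's (rows, cols).
  dims <- vapply(fields, function(f) dim(input_data[[f]]), integer(2))
  rownames(dims) <- c("rows", "cols")
  consistent <- all(dims["rows", ] == dims["rows", 1]) &&
    all(dims["cols", ] == dims["cols", 1])
  list(dims = dims, consistent = consistent)
}

# Toy input reproducing the reported shapes (contents are placeholders).
input_data <- list(
  y   = matrix(0, nrow = 2692, ncol = 6),
  n   = matrix(0, nrow = 2692, ncol = 6),
  tcn = matrix(0, nrow = 2692, ncol = 6),
  m   = matrix(0, nrow = 2735, ncol = 6)
)

result <- check_input_dims(input_data)
print(result$dims)
print(result$consistent)
```

The check only detects the inconsistency. Which rows of m are surplus depends on how the input was built, so that has to be resolved against the original variant table.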
Hi,
I just tried to run PICTograph on my dataset with about 2300 variants in 6 samples, and everything went fine until the merging of the chains at the end.
I left everything at the default settings when choosing the K for the sets:
I did some digging, and it looks like the best_K_vals calculated in the merge, which uses the z_chain, has value 2 where the w_chain has value 3.
I can obviously set the one set to 2 instead of 3 manually, because this is the BIC plot in question
But I think this qualifies as a bug.
Sadly, I don't know how to even address the issue otherwise.
Thank you for your help, Sebastian