Closed ewafula closed 1 year ago
Thanks to @zoomzoom1011, the methylation files are updated in the S3 bucket with the BS_ID naming.
@ewafula, can you go and run the QC? Thanks!
7a01f4945f47cc225aa27feb2446e7e1 methyl-beta-values.rds
f6757946fd9221ae353c680b308c0be5 methyl-cn-values.rds
2f4b2e319bc43ddf8b311fd1312c7795 methyl-m-values.rds
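For anyone pulling these files, the checksums above can be compared against a local download. The following is a minimal sketch, assuming the three files sit in one local directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of any OpenPedCan tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums as posted in this comment.
EXPECTED_MD5 = {
    "methyl-beta-values.rds": "7a01f4945f47cc225aa27feb2446e7e1",
    "methyl-cn-values.rds": "f6757946fd9221ae353c680b308c0be5",
    "methyl-m-values.rds": "2f4b2e319bc43ddf8b311fd1312c7795",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch), or None (file not found)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == checksum) if path.is_file() else None
    return results
```

Calling `verify_downloads(".")` in the download directory returns one entry per file, so a mismatch or a missing file is easy to spot before running QC.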
@zoomzoom1011, @zhangb1, thank you for updating the methyl matrices. Most samples in the matrices that had not been assigned a BS_ID now have one. However, there are 11 duplicate samples in the matrices. This is likely because either you preprocessed some arrays for samples from the previous v11 batch, or the v12 batch has arrays for samples already in the v11 batch. The 11 BS_IDs already in the v11 matrices need to be dropped from the v12 preprocessed batch before merging.
See QC results here: methyl-beta-values-samples-missing-in-histologies.tsv methyl-cn-values-samples-missing-in-histologies.tsv methyl-m-values-samples-missing-in-histologies.tsv
@zoomzoom1011, this issue was previously discussed in ticket https://github.com/PediatricOpenTargets/ticket-tracker/issues/430#issuecomment-1405149566, and you need to follow the same process you used then to exclude them.
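To illustrate the exclusion described above, here is a minimal sketch of splitting the v12 sample list into IDs to keep and IDs already present in v11. It is not necessarily the process used in the linked ticket; the function name `split_duplicates` and the example BS_IDs are hypothetical, and it assumes the sample IDs of each batch are available as plain lists.

```python
def split_duplicates(v11_ids, v12_ids):
    """Return (v12 IDs to keep, v12 IDs already in v11), preserving v12 order."""
    already_released = set(v11_ids)
    keep = [sample for sample in v12_ids if sample not in already_released]
    dropped = [sample for sample in v12_ids if sample in already_released]
    return keep, dropped


# Hypothetical BS_IDs, for illustration only.
v11 = ["BS_AAAA0001", "BS_AAAA0002"]
v12 = ["BS_AAAA0002", "BS_BBBB0003"]

keep, dropped = split_duplicates(v11, v12)
print("keep:", keep)
print("drop before merging:", dropped)
```

Only the columns named in `keep` would then be retained from the v12 preprocessed batch before merging with the v11 matrices.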
b6c598e07118c7bd79f634519106fd97 methyl-beta-values.rds
01d29ea521300a5a92f964a28f672b65 methyl-cn-values.rds
038268f53c81bde1ac49c75432938e5b methyl-m-values.rds
@ewafula, the methyl files have been updated. chuwei removed the duplicated samples.
This now looks good. Thank you, @zoomzoom1011 and @zhangb1!
What data file(s) does this issue pertain to?
methyl-beta-values.rds methyl-cn-values.rds methyl-m-values.rds
What release are you using?
OPC v12 data release on s3 bucket
Put your question or report your issue here.
@zoomzoom1011, just a friendly reminder to update these files, since I recall you worked on creating the methyl matrices. All methylation matrices have some samples that use IDAT array file names as sample names instead of registered BS_IDs. The number of such samples corresponds to the number of BS_IDs in the histologies base that are missing from the matrices.
Here is the QC report list for each methyl matrix: methyl-beta-values-samples-missing-in-histologies.tsv methyl-cn-values-samples-missing-in-histologies.tsv methyl-m-values-samples-missing-in-histologies.tsv
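The kind of check behind these reports can be sketched as a two-way set comparison. This is only an illustration, not the actual QC script: the function name `qc_sample_ids` and all sample names are hypothetical, and it assumes the matrix column names and the methylation BS_IDs from the histologies base have already been read into lists.

```python
def qc_sample_ids(matrix_samples, histology_ids):
    """Compare matrix sample names with the BS_IDs expected from the histologies base.

    Returns (names in the matrix that are not registered BS_IDs,
             BS_IDs in the histologies base that are missing from the matrix).
    """
    matrix_set = set(matrix_samples)
    histology_set = set(histology_ids)
    unregistered = sorted(matrix_set - histology_set)
    missing = sorted(histology_set - matrix_set)
    return unregistered, missing


# Hypothetical sample names, for illustration only.
matrix_samples = ["BS_AAAA0001", "200000000001_R01C01"]
histology_ids = ["BS_AAAA0001", "BS_BBBB0002"]

unregistered, missing = qc_sample_ids(matrix_samples, histology_ids)
print("not registered BS_IDs:", unregistered)
print("missing from matrix:", missing)
```

When array file names stand in for BS_IDs, the two lists have the same length, which matches the correspondence in counts described above.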
Cc @zhangb1, @chinwallaa