d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Update BS_IDs in methyl matrices #524

Closed ewafula closed 1 year ago

ewafula commented 1 year ago

What data file(s) does this issue pertain to?

methyl-beta-values.rds methyl-cn-values.rds methyl-m-values.rds

What release are you using?

OPC v12 data release on s3 bucket

Put your question or report your issue here.

@zoomzoom1011, just a friendly reminder to update these, since I recall you worked on creating the methyl matrices. All methylation matrices have some samples labeled with IDAT array file names instead of registered BS_IDs. The number of such samples corresponds to the number of BS_IDs in the histologies base that are missing from the matrices.

Here is the QC report list for each methyl matrix: methyl-beta-values-samples-missing-in-histologies.tsv methyl-cn-values-samples-missing-in-histologies.tsv methyl-m-values-samples-missing-in-histologies.tsv
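For readers following along, the kind of check behind these QC lists can be sketched roughly as follows. This is only an illustration, not the actual QC script used for the release: it assumes the sample names of a matrix and the BS_IDs of the histologies base have already been read into plain lists, and the function name and the IDs are hypothetical.

```python
def samples_missing_in_histologies(matrix_sample_ids, histology_bs_ids):
    """Return matrix sample names with no matching BS_ID, in input order."""
    known = set(histology_bs_ids)
    return [sample for sample in matrix_sample_ids if sample not in known]


# Hypothetical IDs for illustration only; the second entry mimics an
# IDAT array file name that was never replaced by a registered BS_ID.
matrix_columns = ["BS_AAAA1111", "203456780001_R01C01", "BS_BBBB2222"]
histologies_ids = ["BS_AAAA1111", "BS_BBBB2222", "BS_CCCC3333"]

missing = samples_missing_in_histologies(matrix_columns, histologies_ids)
print(missing)
```

Running the same comparison in the other direction (histologies IDs not found in the matrix) would give the count of BS_IDs missing from the matrices.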

Cc @zhangb1, @chinwallaa

zhangb1 commented 1 year ago

Thanks to @zoomzoom1011, the methylation files in the s3 bucket are now updated with BS_ID naming.

@ewafula, can you run the QC? Thanks!

7a01f4945f47cc225aa27feb2446e7e1  methyl-beta-values.rds
f6757946fd9221ae353c680b308c0be5  methyl-cn-values.rds
2f4b2e319bc43ddf8b311fd1312c7795  methyl-m-values.rds
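A minimal sketch of how a downloaded copy could be checked against a checksum listing like the one above. It assumes the listing is in plain `md5sum`-style text (checksum, whitespace, file name) and that the files sit in the current directory; the helper names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse md5sum-style lines into a {file name: checksum} mapping."""
    expected = {}
    for line in text.strip().splitlines():
        checksum, filename = line.split(maxsplit=1)
        expected[filename.strip()] = checksum
    return expected


listing = """
7a01f4945f47cc225aa27feb2446e7e1  methyl-beta-values.rds
f6757946fd9221ae353c680b308c0be5  methyl-cn-values.rds
2f4b2e319bc43ddf8b311fd1312c7795  methyl-m-values.rds
"""
expected = parse_md5_listing(listing)

# With the files downloaded locally, each one could be compared like this:
# for name, checksum in expected.items():
#     print(name, "OK" if md5_of_file(name) == checksum else "MISMATCH")
```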

ewafula commented 1 year ago

@zoomzoom1011, @zhangb1, thank you for updating the methyl matrices. Most samples in the matrices that previously lacked a BS_ID now have one, but there are 11 duplicate samples in the matrices. Most likely, either you preprocessed some arrays for samples from the previous v11 batch, or the v12 batch contains arrays for samples already in the v11 batch. The 11 BS_IDs already in the v11 matrices need to be dropped from the v12 preprocessed batch before merging.

See QC results here: methyl-beta-values-samples-missing-in-histologies.tsv methyl-cn-values-samples-missing-in-histologies.tsv methyl-m-values-samples-missing-in-histologies.tsv

@zoomzoom1011, this issue was previously discussed in ticket https://github.com/PediatricOpenTargets/ticket-tracker/issues/430#issuecomment-1405149566, and you need to follow the same process you used then to exclude them.
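The drop-before-merge step requested above might look roughly like the sketch below. It is not the process from the linked ticket, just an illustration under simplifying assumptions: each matrix is represented as a dict mapping sample ID to a list of values, and all names and IDs are hypothetical.

```python
def drop_already_released(new_batch, released_ids):
    """Split a new batch into samples to keep and IDs already released."""
    released = set(released_ids)
    kept = {s: v for s, v in new_batch.items() if s not in released}
    dropped = [s for s in new_batch if s in released]
    return kept, dropped


def merge_batches(previous, new_batch):
    """Merge a new batch into the previous matrix, keeping prior samples as-is."""
    kept, dropped = drop_already_released(new_batch, previous.keys())
    merged = dict(previous)
    merged.update(kept)
    return merged, dropped


# Hypothetical sample IDs and values for illustration only.
v11 = {"BS_AAAA1111": [0.12, 0.80], "BS_BBBB2222": [0.45, 0.33]}
v12_batch = {"BS_BBBB2222": [0.46, 0.31], "BS_CCCC3333": [0.90, 0.05]}

merged, dropped = merge_batches(v11, v12_batch)
print(dropped)
print(sorted(merged))
```

Reporting the dropped IDs alongside the merged result makes it easy to confirm that exactly the expected duplicates were excluded.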

zhangb1 commented 1 year ago

b6c598e07118c7bd79f634519106fd97  methyl-beta-values.rds
01d29ea521300a5a92f964a28f672b65  methyl-cn-values.rds
038268f53c81bde1ac49c75432938e5b  methyl-m-values.rds

@ewafula, the methyl files have been updated. chuwei removed the duplicated samples.

ewafula commented 1 year ago

This now looks good. Thank you, @zoomzoom1011 and @zhangb1!