Closed: jharenza closed this issue 2 years ago.
@zhangb1 can you take a look at this please
Hmm, I just checked the v11 folder:
s3://d3b-openaccess-us-east-1-prd-pbta/open-targets/v11/snv-dgd.maf.tsv.gz
This file is gzipped. Or am I missing something?
It does have the .gz extension, yes, but when you download it, it saves as a plain TSV. What I meant, though, was that we need to update the two BS_ids above. Can you check on that, please?
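The extension alone does not tell you whether the bytes on disk are still compressed. A minimal Python sketch (the helper names are hypothetical) that checks for the two-byte gzip magic number (0x1f 0x8b, defined in RFC 1952) and opens the file correctly either way:

```python
import gzip

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def open_maybe_gzipped(path: str):
    """Open a TSV for text reading, whether or not it is actually compressed."""
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "rt")
```

Checking content rather than the name means a downloaded copy that arrived decompressed but still carries a .gz suffix (or the reverse) is read correctly.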
^ @HuangXiaoyan0106
@jharenza
I have checked bix_workflows.dgd_genomics_file_manifest; these two BS_ids are still in the table, and I didn't see any updated IDs for these two MAF files (ET_6TJ718RG_DGD.vep.maf, ET_2VMXCM6Y_DGD.vep.maf). Or did I run the wrong check?
SELECT * FROM bix_workflows.dgd_genomics_file_manifest WHERE biospecimen_id='BS_YWAMZMGF';
SELECT * FROM bix_workflows.dgd_genomics_file_manifest WHERE biospecimen_id='BS_NEW113J5';
or, by file name:
SELECT * FROM bix_workflows.dgd_genomics_file_manifest WHERE file_name='ET_6TJ718RG_DGD.vep.maf';
SELECT * FROM bix_workflows.dgd_genomics_file_manifest WHERE file_name='ET_2VMXCM6Y_DGD.vep.maf';
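The four lookups above can be folded into one parameterized query. This is a sketch only: it uses Python's sqlite3 as a stand-in, because the warehouse's actual engine and driver are not stated in this thread. The table and the biospecimen_id / file_name columns come from the queries above, and find_manifest_rows is a hypothetical helper. Other DB-API drivers may use a different placeholder style (for example %s instead of ?).

```python
import sqlite3
from typing import Iterable


def find_manifest_rows(conn: sqlite3.Connection,
                       biospecimen_ids: Iterable[str],
                       file_names: Iterable[str]) -> list:
    """Return manifest rows matching any given biospecimen ID or file name."""
    ids, names = list(biospecimen_ids), list(file_names)
    clauses, params = [], []
    if ids:
        clauses.append("biospecimen_id IN (" + ", ".join(["?"] * len(ids)) + ")")
        params += ids
    if names:
        clauses.append("file_name IN (" + ", ".join(["?"] * len(names)) + ")")
        params += names
    if not clauses:
        raise ValueError("pass at least one biospecimen ID or file name")
    sql = ("SELECT * FROM bix_workflows.dgd_genomics_file_manifest WHERE "
           + " OR ".join(clauses))
    # Values are bound as parameters, never interpolated into the SQL string.
    return conn.execute(sql, params).fetchall()
```

For a local trial, an in-memory database attached under the name bix_workflows reproduces the schema-qualified table name.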
Hmm, @nicholasvk, can you check why these two are in the file but not in the data warehouse view, please?
There are two GENIE records that were not mapped at the time of our workflow development efforts with DGD. They use the old external sample ID format (C ID plus a sequential number) rather than the new format, in which we associated DGD clinical assays with diagnoses captured in the DGD REDCap project. They must not have mapped at the time, and we would need to revisit them to see whether they can be mapped. Until they are mapped, I think it makes sense to exclude them from the PBTA / OT workflow. They are already excluded from the histologies file; I'm not sure what the implications of removing them from the MAF file would be. Would GENIE researchers be using this?
OK, no problem, we can remove them from the MAF. @runjin326, can you do this, please?
I don't know whether GENIE users are using this at all, so it's fine to exclude them. Thanks for looking into this, @nicholasvk.
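The thread does not say how the rows were removed. One minimal way to do it is sketched below, under stated assumptions: the file is a gzipped tab-separated MAF, the BS_ids appear in the standard MAF column Tumor_Sample_Barcode, and any leading "#" lines (such as a MAF version line) should be kept verbatim. drop_samples_from_maf is a hypothetical helper, not the project's actual script.

```python
import gzip
from typing import AbstractSet


def drop_samples_from_maf(src: str, dst: str, samples: AbstractSet[str],
                          sample_col: str = "Tumor_Sample_Barcode") -> int:
    """Copy a gzipped MAF TSV from src to dst, skipping rows whose
    sample_col value is in `samples`. Returns the number of rows removed.
    sample_col defaults to an assumed column name."""
    removed = 0
    col = None  # index of sample_col, set once the header row is read
    with gzip.open(src, "rt", newline="") as fin, \
         gzip.open(dst, "wt", newline="") as fout:
        for line in fin:
            if col is None:
                if line.startswith("#"):
                    fout.write(line)  # keep comment/version lines unchanged
                    continue
                header = line.rstrip("\r\n").split("\t")
                col = header.index(sample_col)  # ValueError if column absent
                fout.write(line)
                continue
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > col and fields[col] in samples:
                removed += 1
                continue
            fout.write(line)
    return removed
```

Streaming line by line keeps memory flat for a large MAF, and writing kept lines unmodified avoids reformatting any other column.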
@jharenza, I updated the file and uploaded it to S3, and also updated md5sum.txt.
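After the file is replaced, its checksum entry has to change too. A minimal sketch, assuming md5sum.txt uses the two-column output format of GNU md5sum ("<hex digest>  <file name>", or "<hex digest> *<file name>" in binary mode) and lists files by bare name; the thread does not show the file's layout, and update_md5_manifest is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def update_md5_manifest(manifest: Path, data_file: Path) -> None:
    """Replace data_file's entry in an md5sum-style manifest, or append one."""
    name = data_file.name
    new_line = f"{md5_of(data_file)}  {name}"
    lines = manifest.read_text().splitlines() if manifest.exists() else []
    out, replaced = [], False
    for line in lines:
        parts = line.split(maxsplit=1)  # digest, then file name
        # A leading "*" marks binary mode in md5sum output; ignore it when matching.
        if len(parts) == 2 and parts[1].lstrip("*") == name:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    manifest.write_text("\n".join(out) + "\n")
```

Rewriting only the matching line leaves the other entries untouched, so checksums for files that did not change stay as published.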
Thanks @runjin326 !
What data file(s) does this issue pertain to?
snv-dgd.maf.tsv
What release are you using?
v11