Closed jharenza closed 3 years ago
@jharenza There might be some md5sum mismatches in the v3 release that could be fixed in the v4 release as well.
When downloading the v3 files using the most recent OT/OpenPBTA-analysis `download-data.sh`, I got the following md5sum mismatches:
Checking MD5 hashes...
gtex-gene-expression-rsem-tpm-collapsed.polya.rds: FAILED
gtex-histologies.tsv: OK
intersect_cds_lancet_strelka_mutect_WGS.bed: FAILED
intersect_strelka_mutect_WGS.bed: OK
kfnbl-fusion-arriba.tsv.gz: OK
kfnbl-fusion-starfusion.tsv.gz: OK
kfnbl-gene-counts-rsem-expected_count.stranded.rds: FAILED
kfnbl-gene-counts-rsem-expected_count-collapsed.stranded.rds: OK
kfnbl-gene-expression-kallisto.stranded.rds: OK
kfnbl-gene-expression-rsem-fpkm-collapsed.stranded.rds: OK
kfnbl-gene-expression-rsem-fpkm.stranded.rds: FAILED
kfnbl-gene-expression-rsem-tpm-collapsed.stranded.rds: OK
kfnbl-gene-expression-rsem-tpm.stranded.rds: FAILED
kfnbl-histologies.tsv: OK
kfnbl-isoform-counts-rsem-expected_count.stranded.rds: OK
kfnbl-isoform-expression-rsem-tpm.stranded.rds: OK
kfnbl-snv-lancet.vep.maf.gz: OK
kfnbl-snv-mutect2.vep.maf.gz: OK
kfnbl-snv-strelka2.vep.maf.gz: OK
kfnbl-snv-vardict.vep.maf.gz: OK
release-notes.md: OK
target-gene-expression-rsem-tpm-collapsed.rds: FAILED
target-histologies.tsv: OK
tcga-gene-expression-rsem-tpm-collapsed.rds: FAILED
tcga-histologies.tsv: OK
WGS.hg38.lancet.300bp_padded.bed: OK
WGS.hg38.lancet.unpadded.bed: OK
WGS.hg38.mutect2.vardict.unpadded.bed: OK
WGS.hg38.strelka2.unpadded.bed: OK
WGS.hg38.vardict.100bp_padded.bed: OK
md5sum: WARNING: 7 computed checksums did NOT match
For example, the md5sum of the downloaded `target-gene-expression-rsem-tpm-collapsed.rds` is `4ea6cb1a65e5c9698b07b7408529d308`, whereas the md5sum in the v3 release `md5sum.txt` is `1a2444fde3b488168e0d3958a2d1b937`.
I re-downloaded the v3 release and got the same md5sum mismatches.
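For anyone who wants to reproduce this check outside of `download-data.sh`, here is a minimal Python sketch. It assumes `md5sum.txt` uses the usual `<hash>  <filename>` layout and that the release files sit in a single directory; the function names are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_lines(lines):
    """Parse md5sum-style lines ('<hash>  <filename>') into {filename: hash}."""
    expected = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*' on the name.
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def find_mismatches(data_dir, expected):
    """Return {filename: actual_md5_or_None} for every file that fails."""
    mismatches = {}
    for name, checksum in expected.items():
        path = Path(data_dir) / name
        # A missing file is reported with None instead of a digest.
        actual = md5_of_file(path) if path.is_file() else None
        if actual != checksum:
            mismatches[name] = actual
    return mismatches
```

Calling `find_mismatches("data/v3", parse_md5sum_lines(open("data/v3/md5sum.txt")))` would list the same files that `md5sum -c` reports as `FAILED`, along with the digest that was actually computed.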
`readRDS` in the docker RStudio gives the following error:
> readRDS('data/v3/gtex-gene-expression-rsem-tpm-collapsed.polya.rds')
Error in readRDS("data/v3/gtex-gene-expression-rsem-tpm-collapsed.polya.rds") :
unknown input format
@logstar can you delete those files and try to run the download script again? sometimes if they download partially and give a mismatch, you need to delete so they can re-download. let me know!
my release file for `target-gene-expression-rsem-tpm-collapsed.rds` matches that in the release `md5sum.txt`
harenzaj@38f9d38f36c9 v3 % md5sum target-gene-expression-rsem-tpm-collapsed.rds
1a2444fde3b488168e0d3958a2d1b937 target-gene-expression-rsem-tpm-collapsed.rds
I deleted `target-gene-expression-rsem-tpm-collapsed.rds` and reran `bash download-data.sh`, but the md5sum still mismatches.
I think the files with md5sum mismatches are not available at https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3 . For example, https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/target-gene-expression-rsem-tpm-collapsed.rds links to an error saying "The specified key does not exist".
The matched files are available. For example, https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/tcga-histologies.tsv links to a file for downloading, and its md5sum matches the v3 release `md5sum.txt`.
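To go from the check output to the URLs worth inspecting, something like the following could be used. This is only a sketch: it assumes the `name: FAILED` line format shown earlier in this thread and the v3 base URL quoted above, and the helper names are made up for illustration.

```python
BASE_URL = (
    "https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3"
)


def failed_files(check_output):
    """Return file names whose status is exactly FAILED in md5sum -c output."""
    failed = []
    for line in check_output.splitlines():
        # Split on the last ': ' so the trailing WARNING summary is ignored.
        name, sep, status = line.rpartition(": ")
        if sep and status.strip() == "FAILED":
            failed.append(name)
    return failed


def release_urls(names, base_url=BASE_URL):
    """Build the expected download URL for each release file name."""
    return [f"{base_url.rstrip('/')}/{name}" for name in names]


if __name__ == "__main__":
    sample = (
        "gtex-histologies.tsv: OK\n"
        "tcga-gene-expression-rsem-tpm-collapsed.rds: FAILED\n"
        "md5sum: WARNING: 7 computed checksums did NOT match\n"
    )
    for url in release_urls(failed_files(sample)):
        print(url)
```

Opening each printed URL in a browser (or requesting it with any HTTP client) shows whether the object exists in the bucket or returns the missing-key error.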
Also adding to this that we need to add count matrices for GTEX, TARGET, and TCGA per #27
Thank you for the note.
`target-gene-expression-rsem-tpm-collapsed.rds` is available at https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/target-gene-expression-rsem-tpm-collapsed.rds now, and its md5sum matches the v3 release `md5sum.txt`.
The following files are still not available, as their URLs still link to error messages.
- `gtex-gene-expression-rsem-tpm-collapsed.polya.rds`: https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/gtex-gene-expression-rsem-tpm-collapsed.polya.rds
- `intersect_cds_lancet_strelka_mutect_WGS.bed`: https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/intersect_cds_lancet_strelka_mutect_WGS.bed
- `kfnbl-gene-counts-rsem-expected_count.stranded.rds`: https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/kfnbl-gene-counts-rsem-expected_count.stranded.rds
- `kfnbl-gene-expression-rsem-fpkm.stranded.rds`: https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/kfnbl-gene-expression-rsem-fpkm.stranded.rds
- `kfnbl-gene-expression-rsem-tpm.stranded.rds`: https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/kfnbl-gene-expression-rsem-tpm.stranded.rds
- `tcga-gene-expression-rsem-tpm-collapsed.rds`: https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/open-targets/v3/tcga-gene-expression-rsem-tpm-collapsed.rds
ok, it looks like these may have been deleted, but i just put them back - can you try again?
@jharenza Thank you for the quick reply.
All files are available now, and their md5sums all match the v3 release `md5sum.txt`.
Great!
Hi @jharenza . Thank you for preparing the v4 release.
I was wondering which column of `v4/histologies.tsv` matches the column names of `v4/gtex_target_tcga-gene-counts-rsem-expected_count-collapsed.rds`.
Following are the last 10 rows of `v4/histologies.tsv`:
Kids_First_Biospecimen_ID | sample_id | aliquot_id | Kids_First_Participant_ID | experimental_strategy | sample_type | composition | tumor_descriptor | primary_site | age_bracket | reported_gender | race | ethnicity | diagnosis_type | diagnosis_category | age_at_diagnosis_days | pathology_diagnosis | RNA_library | EFS_days | OS_days | OS_status | PFS_days | cohort | age_last_update_days | seq_center | parent_aliquot_id | previous_parent_aliquot_id | cancer_predispositions | previous_cancer_predispositions | pathology_free_text_diagnosis | cohort_participant_id | germline_sex_estimate | extent_of_tumor_resection | normal_fraction | tumor_fraction | tumor_ploidy | CNS_region | molecular_subtype | integrated_diagnosis | Notes | harmonized_diagnosis | broad_histology | short_histology | cancer_group | gtex_group | gtex_subgroup |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GTEX-15UF6-1426-SM-AIGJD | AIGJD | NA | 15UF6 | RNA-Seq | Normal | Solid Tissue | NA | Spleen | NA | Female | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-15UF6 | NA | NA | NA | NA | NA | NA | NA | NA | 2 pieces, moderate congestion, fragmented tissue | NA | NA | NA | NA | Spleen | Spleen |
GTEX-11PRG-0426-SM-4XJA1 | 4XJA1 | NA | 11PRG | RNA-Seq | Normal | Solid Tissue | NA | Testis | NA | Male | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-11PRG | NA | NA | NA | NA | NA | NA | NA | NA | 2 pieces, moderately autolyzed, spermatogenesis is present | NA | NA | NA | NA | Testis | Testis |
GTEX-1LGRB-2626-SM-B4CPO | B4CPO | NA | 1LGRB | RNA-Seq | Normal | Solid Tissue | NA | Muscle - Skeletal | NA | Female | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-1LGRB | NA | NA | NA | NA | NA | NA | NA | NA | 2 pieces, skeletal muscle with small portion of attached and internal fat, scattered fibers with degenerative change | NA | NA | NA | NA | Muscle | Muscle - Skeletal |
GTEX-ZT9X-0626-SM-4UJTS | 4UJTS | NA | ZT9X | RNA-Seq | Normal | Solid Tissue | NA | Esophagus - Muscularis | NA | Male | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-ZT9X | NA | NA | NA | NA | NA | NA | NA | NA | 5 pieces, all muscularis | NA | NA | NA | NA | Esophagus | Esophagus - Muscularis |
GTEX-XOTO-1926-SM-42ZNC | 42ZNC | NA | XOTO | RNA-Seq | Normal | Solid Tissue | NA | Artery - Coronary | NA | Male | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-XOTO | NA | NA | NA | NA | NA | NA | NA | NA | 2 pieces. 2x2 & 2x2mm, calcified (delineated) atherosclerosis with ~60% occlusion | NA | NA | NA | NA | Artery | Artery - Coronary |
GTEX-13O21-1026-SM-59K21 | 59K21 | NA | 13O21 | RNA-Seq | Normal | Solid Tissue | NA | Small Intestine - Terminal Ileum | NA | Male | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-13O21 | NA | NA | NA | NA | NA | NA | NA | NA | 6 pieces, 10% lymphoid aggregates, delineated, best for LCM or TMA studies | NA | NA | NA | NA | Small Intestine | Small Intestine - Terminal Ileum |
GTEX-W5X1-2926-SM-3C8J7 | 3C8J7 | NA | W5X1 | RNA-Seq | Normal | Solid Tissue | NA | Artery - Tibial | NA | Female | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-W5X1 | NA | NA | NA | NA | NA | NA | NA | NA | 2 pieces, 5x1 & 5x3.5mm, heavily calcified, half of 1 piece without lesion | NA | NA | NA | NA | Artery | Artery - Tibial |
GTEX-WFG7-1226-SM-3BRN1 | 3BRN1 | NA | WFG7 | RNA-Seq | Normal | Solid Tissue | NA | Esophagus - Gastroesophageal Junction | NA | Male | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-WFG7 | NA | NA | NA | NA | NA | NA | NA | NA | 6 pieces ~8x4mm. All muscle, no mucosa, excellent specimens | NA | NA | NA | NA | Esophagus | Esophagus - Gastroesophageal Junction |
GTEX-XV7Q-2726-SM-7LDFM | 7LDFM | NA | XV7Q | RNA-Seq | Normal | Solid Tissue | NA | Nerve - Tibial | NA | Female | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-XV7Q | NA | NA | NA | NA | NA | NA | NA | NA | 2 pieces, 7.5x5 & 8x6mm, ~1-15.mm rim of discontinuous flat, rep areas delineated | NA | NA | NA | NA | Nerve | Nerve - Tibial |
GTEX-ZDTS-0826-SM-4S8WD | 4S8WD | NA | ZDTS | RNA-Seq | Normal | Solid Tissue | NA | Skin - Not Sun Exposed (Suprapubic) | NA | Male | NA | NA | NA | NA | NA | NA | poly-A | NA | NA | DECEASED | NA | GTEx | NA | Broad Institute of MIT and Harvard | NA | NA | NA | NA | NA | GTEX-ZDTS | NA | NA | NA | NA | NA | NA | NA | NA | 6 pieces, many hair follicles, (avoid pubic hair), well trimmed of subcutaneous fat | NA | NA | NA | NA | Skin | Skin - Not Sun Exposed (Suprapubic) |
Following are typical column names of `v4/gtex_target_tcga-gene-counts-rsem-expected_count-collapsed.rds`:
1 SRR1068687
2 SRR1068788
3 SRR1068808
4 SRR1068832
5 SRR1068855
6 SRR1068880
7 SRR1068905
8 SRR1068929
9 SRR1068953
10 SRR1068977
11 SRR1068999
...
8361 b2cb10b8-0554-4ffa-ace1-b00f83c2fa0f
8362 b31102b7-d51b-423d-aa5f-7ca3404dc5c0
8363 b3301ee7-7b60-44a0-b03f-64e01b56d873
8364 b39c65c2-a792-4933-888f-2b4f58140808
8365 b3aa1087-e311-48c3-9739-f78164639f05
8366 b3fdff01-8c52-4947-a117-d79170ec4309
...
19889 ffe5563c-6b36-4ad9-a61e-127b6c053727
19890 fff736ee-ac9b-4bfb-87f0-03cd8e940e9e
19891 fffd244f-bd0f-4d09-8ab3-43bc0b20e297
I also cannot find `SRR1068687` anywhere in `v4/histologies.tsv`.
This matching is relevant to the histology z-score analysis and NBL vs GTEX stably expressed gene analysis.
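The column check described above can be sketched roughly as follows. It assumes the matrix column names have already been exported to a plain list (for example from R with `colnames()`), and that the histologies file is tab-separated with a header row; `read_tsv` and `match_counts` are illustrative names.

```python
import csv


def read_tsv(path):
    """Load a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def match_counts(matrix_columns, histology_rows, candidate_columns):
    """Count how many matrix column names appear in each histology column."""
    wanted = set(matrix_columns)
    counts = {}
    for column in candidate_columns:
        values = {row.get(column) for row in histology_rows}
        counts[column] = len(wanted & values)
    return counts


if __name__ == "__main__":
    # One row and one column name taken from the examples in this thread.
    rows = [
        {
            "Kids_First_Biospecimen_ID": "GTEX-15UF6-1426-SM-AIGJD",
            "sample_id": "AIGJD",
        }
    ]
    print(match_counts(["SRR1068687"], rows, ["Kids_First_Biospecimen_ID", "sample_id"]))
```

A histology column whose count equals the number of matrix columns would be the identifier to join on; a count of zero for every candidate column is what the `SRR*` names produce here.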
@logstar thanks for finding this issue. I just filed #31 for this - will see if @ewafula or @komalsrathi has a mapping file you can use temporarily. Sorry about that!
@jharenza Thank you for the quick reply. No worries at all. I can imagine the complexity of integrating multiple datasets. Also, I can use dummy data to work on the analysis procedures before the real data become available.
Interesting - the file that I had uploaded i.e. gtex-histologies.tsv should have this sample:
grep SRR1068687 ~/Projects/PediatricOpenTargets/OpenPBTA-analysis/analyses/rnaseq-batch-correct/input/gtex-histologies.tsv
54 unavailable unavailable Normals GTEX-XXEK dead GTEX-XXEK-0526-SM-4BRWD male Esophagus Esophagus - Gastroesophageal Junction poly-A SRR1068687 GTEx
Oh but you need a mapping file - let me check!
gtex_mapping.txt @logstar I have this - let me know if this does not work for you.
@komalsrathi @jharenza Perfect, I needed the same files, Thank you.
@komalsrathi Thank you for the quick reply. The mapping files work for me.
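In case it helps others, applying such a mapping to the matrix column names could look like the sketch below. The layout of `gtex_mapping.txt` is not shown in this thread, so the code simply assumes the mapping has already been loaded into a dict from run ID to biospecimen ID; `rename_columns` is a hypothetical helper.

```python
def rename_columns(columns, mapping):
    """Translate column names through a mapping, tracking the unmapped ones.

    Returns (renamed, unmapped): names missing from the mapping are kept
    unchanged in `renamed` and also listed in `unmapped` for review.
    """
    renamed, unmapped = [], []
    for name in columns:
        if name in mapping:
            renamed.append(mapping[name])
        else:
            renamed.append(name)
            unmapped.append(name)
    return renamed, unmapped


if __name__ == "__main__":
    # This pair comes from the grep output earlier in the thread.
    mapping = {"SRR1068687": "GTEX-XXEK-0526-SM-4BRWD"}
    columns = ["SRR1068687", "b2cb10b8-0554-4ffa-ace1-b00f83c2fa0f"]
    renamed, unmapped = rename_columns(columns, mapping)
    print(renamed)
    print(unmapped)
```

Keeping the unmapped names in place, rather than dropping them, makes it easy to see which columns (here the TCGA/TARGET UUID-style names) still need their own mapping.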
thanks @komalsrathi !
ahh, we updated to all 17k samples for re-processing, which did not include SRR from their portal.
thanks @komalsrathi! @jharenza, do we need to have an additional column in the histologies.txt file with sample ids used in the expression RDS files such as SRR*?
@ewafula no, we will update the RSEM files to your IDs
Thank you again for the mapping files. @komalsrathi
I found that some TCGA `sample_barcode`s are mapped to multiple `sample_id`s, and some of their expected count sums are different in `gtex_target_tcga-gene-counts-rsem-expected_count-collapsed.rds`, as shown in the following table.
@jharenza I was wondering if these duplicates will be resolved in future releases. For now, I will only keep one of the samples that have the same `rsem_expected_cnt_colSum`s.
sample_barcode | sample_id | rsem_expected_cnt_colSum |
---|---|---|
TCGA-37-4132-01A-01R-1100-07 | 44d729b9-be8f-4ae3-ae4d-b71c9d8463f4 | 59037096.86 |
TCGA-37-4132-01A-01R-1100-07 | f3f5bd65-72fc-4cee-befb-2b6446be005e | 59037096.86 |
TCGA-37-4133-01A-01R-1100-07 | 4f8ed570-458a-45c5-ad5c-c254812a4b40 | 63576672.73 |
TCGA-37-4133-01A-01R-1100-07 | ecb95d43-cada-4159-aeca-197dccd8fcfd | 63576672.73 |
TCGA-38-4625-01A-01R-1206-07 | 619f167f-d0c0-468f-b6d1-801bf6c0b017 | 82024039.49 |
TCGA-38-4625-01A-01R-1206-07 | d73deb65-5df1-4fee-bcc6-56b301d77595 | 39274.99 |
TCGA-A2-A0EM-01A-11R-A034-07 | 2aaed860-0ed5-4c5e-9842-60aa13785b58 | 62357901.34 |
TCGA-A2-A0EM-01A-11R-A034-07 | c90db6f3-03b4-4092-8efb-4925a423130d | 62357901.34 |
TCGA-A6-2672-01B-03R-2302-07 | d32d94d4-0b92-4dc1-b3e9-750071f2d7e0 | 21980139.14 |
TCGA-A6-2672-01B-03R-2302-07 | f489a86d-ab77-442c-a89d-ee734faf9caf | 23162032.49 |
TCGA-A6-5661-01B-05R-2302-07 | 298bf211-99f7-44dd-802e-3f1aa3682681 | 7300714.54 |
TCGA-A6-5661-01B-05R-2302-07 | bfb2cdbd-42f2-4784-9450-81236eb39c85 | 37621239.56 |
TCGA-A6-5665-01B-03R-2302-07 | 373b0a02-88d2-49fb-8580-e9933a35c7b8 | 17227691.85 |
TCGA-A6-5665-01B-03R-2302-07 | bdf3b4b3-1619-4ced-8e73-ebde54838291 | 29470181.82 |
TCGA-A7-A0DC-01A-11R-A00Z-07 | 1b000b10-0b4c-4b56-bfb6-950889775865 | 108936872.87 |
TCGA-A7-A0DC-01A-11R-A00Z-07 | 5748a77b-0ea0-487b-982f-ccc2f8bab563 | 72140078.13 |
TCGA-A7-A0DC-01A-11R-A00Z-07 | 98a04bbb-0a89-4809-ad5b-9289a0cdf517 | 72140078.13 |
TCGA-A7-A0DC-01B-04R-A22O-07 | 4b66df1d-9408-4e25-aaf7-2825809baa0c | 48575580.67 |
TCGA-A7-A0DC-01B-04R-A22O-07 | b043709e-acfb-4034-838e-768c894f54dc | 31444946.25 |
TCGA-A7-A0DC-01B-04R-A22O-07 | e59455b1-4bfa-45c2-be8a-1cd8f3a26c0e | 25507397.89 |
TCGA-A7-A0DC-11A-41R-A089-07 | 9ba3e93f-6b7f-464b-9012-7e26592216d3 | 50908011.83 |
TCGA-A7-A0DC-11A-41R-A089-07 | e90e4988-b2fb-4aa0-864e-84320abba5c0 | 50908011.83 |
TCGA-A7-A13G-01A-11R-A13Q-07 | 8e41a6eb-590f-4c75-ba43-33883f6402bb | 80857822.17 |
TCGA-A7-A13G-01A-11R-A13Q-07 | 9e8d6775-5f94-4a53-aed6-08a18b33a701 | 111945370.88 |
TCGA-A7-A13G-01B-04R-A22O-07 | 5195e2af-0a23-40e3-a13c-69ea0b306287 | 27946848.41 |
TCGA-A7-A13G-01B-04R-A22O-07 | 9612e42a-5464-4b48-b7df-819a791bf598 | 50825075.61 |
TCGA-A7-A13G-01B-04R-A22O-07 | b80cbb60-df1a-4b4a-bf1b-0b1a04e64078 | 21767339.23 |
TCGA-A7-A26F-01A-21R-A169-07 | 0aa1751e-1661-4b8b-9e5a-cc7edf6f8a8c | 98378280.28 |
TCGA-A7-A26F-01A-21R-A169-07 | 3d2a9024-b722-4092-989e-bbc1d1332be9 | 125851647.25 |
TCGA-A7-A26F-01B-04R-A22O-07 | 0c30f5b1-5ed5-48d1-b8df-df4611761dd3 | 52560800.24 |
TCGA-A7-A26F-01B-04R-A22O-07 | 822d3bb2-b845-4073-a5fb-7ecf9e5c10a1 | 23021884.5 |
TCGA-A7-A26F-01B-04R-A22O-07 | d7f11b34-6b38-4a7a-80b1-fae9b7243caf | 27747876.78 |
TCGA-A7-A26I-01A-11R-A169-07 | 0752afb5-f92b-4a55-921c-2f36674e8689 | 82498866.41 |
TCGA-A7-A26I-01A-11R-A169-07 | 2c8cb5e5-dd58-47f1-9d4e-bdf7559efc6e | 140091800.87 |
TCGA-A7-A26I-01B-06R-A22O-07 | 373504b8-416a-4bae-9e62-ebf13d22a337 | 58832935.61 |
TCGA-A7-A26I-01B-06R-A22O-07 | a965f4a8-076e-460e-91bb-f821711a63e5 | 11715966.18 |
TCGA-A7-A26I-01B-06R-A22O-07 | f7e4051b-a5f1-4be0-b139-43e5ef75644e | 29322709.95 |
TCGA-AC-A2QH-01A-11R-A18M-07 | 8fe05303-7df7-48a3-862c-e17b39a670c2 | 30632024.4 |
TCGA-AC-A2QH-01A-11R-A18M-07 | b334f93f-48ff-434a-b6bc-8bc0920a5ce8 | 132806590.93 |
TCGA-AC-A2QH-01B-04R-A22O-07 | 6c008c2f-bcc4-4497-962e-8e76879b3ecb | 55621677.64 |
TCGA-AC-A2QH-01B-04R-A22O-07 | 75c3e5e4-ef05-467f-9e7e-d3003db187b1 | 35090851.18 |
TCGA-AC-A2QH-01B-04R-A22O-07 | 7de0e920-e411-4994-a6d8-460cfcef9bdd | 22928152.09 |
TCGA-AC-A3OD-01A-11R-A21T-07 | e9de5496-4486-4ceb-b3b3-30a53b2c52f6 | 105998355.69 |
TCGA-AC-A3OD-01A-11R-A21T-07 | e9f7caba-a833-4c3c-82f2-11d7a233974d | 112332806.62 |
TCGA-AC-A3OD-01B-06R-A22O-07 | 84a6ed5e-ad73-4399-ac37-381721f3b4e8 | 32717482.06 |
TCGA-AC-A3OD-01B-06R-A22O-07 | 99ce385c-e0e2-41c2-a868-40a31eac1e50 | 49766248.16 |
TCGA-AC-A3OD-01B-06R-A22O-07 | a03f6b87-d762-479a-9132-aa42563bacc4 | 90946986.92999999 |
TCGA-AC-A3QQ-01A-11R-A22K-07 | 32911f1f-0202-4bb2-be86-4b68e5afcc00 | 125602035.27 |
TCGA-AC-A3QQ-01A-11R-A22K-07 | 81d1e125-7e15-4563-87c0-b0bfa9dfcad7 | 52836757.19 |
TCGA-AC-A3QQ-01B-06R-A22O-07 | 7af8074e-e82b-4e9e-a66c-3bb0097e1a5b | 30938541.95 |
TCGA-AC-A3QQ-01B-06R-A22O-07 | 7d9d1e8c-710d-4b17-a3cd-8cd6208d3c08 | 13001988.01 |
TCGA-AK-3425-01A-02R-1277-07 | 211b37ea-f8ba-4023-b622-15d3e2543505 | 38348466.47 |
TCGA-AK-3425-01A-02R-1277-07 | 5f4cf18f-5c97-4ee9-adb4-6bc4658ed671 | 52358704 |
TCGA-AK-3426-01A-02R-1325-07 | 45b541d8-86f8-42b2-81a3-b41cefd02b87 | 62668358.35 |
TCGA-AK-3426-01A-02R-1325-07 | a9ad735c-7c05-4efd-9922-3d25c4ec67c0 | 39301779.87 |
TCGA-AK-3453-01A-02R-1277-07 | 440fa965-8e33-4c48-89e6-0b6518e01c44 | 92771830.38 |
TCGA-AK-3453-01A-02R-1277-07 | d032ba59-d306-4a51-aa77-a9356bfefd09 | 34056834.22 |
TCGA-AK-3454-01A-02R-1277-07 | 1b34fe44-0666-49a9-8074-27814eaa1bb6 | 84594722.51 |
TCGA-AK-3454-01A-02R-1277-07 | 4c2fbb0c-1d39-4a36-919c-0f1574b17761 | 23568316.23 |
TCGA-BH-A0B2-01A-11R-A10J-07 | 2daa45eb-0e9d-4b80-9baf-b3b145cc6a98 | 57136914.31 |
TCGA-BH-A0B2-01A-11R-A10J-07 | 750199c0-a667-421a-a329-4521ccf45e01 | 57136914.31 |
TCGA-BR-4255-01A-01R-1131-13 | 41f331b7-7730-4b05-adae-73271dbbd343 | 112032624.76 |
TCGA-BR-4255-01A-01R-1131-13 | e114a9c9-10c6-48d2-ba51-f54be6b7b082 | 65833358.38 |
TCGA-HC-7740-01B-04R-2302-07 | 1ebeb668-2997-4955-9577-1d968df6b51c | 33871400.01 |
TCGA-HC-7740-01B-04R-2302-07 | 758a6e7b-6799-4958-b68c-7bf886ed4876 | 29568416.89 |
TCGA-HC-8258-01B-05R-2302-07 | 05bba2ee-831f-4b65-9f52-5e5e7749b732 | 24161119.78 |
TCGA-HC-8258-01B-05R-2302-07 | d65b7aef-64bd-45f1-8537-03251ad412c1 | 15709507.87 |
TCGA-HC-8261-01B-05R-2302-07 | 547de96d-9fdd-4cd6-a1e0-3dd2fc244cdf | 101590414.04 |
TCGA-HC-8261-01B-05R-2302-07 | 9715ca16-390b-4164-a9f3-6212371b2aea | 28499930.63 |
TCGA-HC-8265-01B-04R-2302-07 | 2b0e3e79-3fdd-4e31-b872-fbd0c2023e74 | 22130941.26 |
TCGA-HC-8265-01B-04R-2302-07 | c7b4ecf6-1c8c-4719-bee3-c7221f9544e5 | 39724297.01 |
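The interim handling described above could be sketched like this. It is one possible reading, not a settled rule: a barcode is kept only when all of its `sample_id`s share the same column sum (within a small tolerance, an assumption made here because the sums are floating point), and any barcode with differing sums is set aside for review. Note that equal column sums suggest, but do not prove, identical count columns.

```python
from collections import defaultdict


def resolve_duplicates(records, tolerance=0.01):
    """Split barcodes into those safe to collapse and those needing review.

    records: iterable of (sample_barcode, sample_id, col_sum) tuples.
    Returns (keep, conflicting):
      keep        {barcode: chosen sample_id} when all sums agree
      conflicting {barcode: [(sample_id, col_sum), ...]} otherwise
    """
    by_barcode = defaultdict(list)
    for barcode, sample_id, col_sum in records:
        by_barcode[barcode].append((sample_id, col_sum))

    keep, conflicting = {}, {}
    for barcode, entries in by_barcode.items():
        sums = [col_sum for _, col_sum in entries]
        if max(sums) - min(sums) <= tolerance:
            # Arbitrary but reproducible choice: first sample_id in sort order.
            keep[barcode] = sorted(entries)[0][0]
        else:
            conflicting[barcode] = sorted(entries)
    return keep, conflicting


if __name__ == "__main__":
    # Four rows copied from the table above.
    rows = [
        ("TCGA-37-4132-01A-01R-1100-07", "44d729b9-be8f-4ae3-ae4d-b71c9d8463f4", 59037096.86),
        ("TCGA-37-4132-01A-01R-1100-07", "f3f5bd65-72fc-4cee-befb-2b6446be005e", 59037096.86),
        ("TCGA-38-4625-01A-01R-1206-07", "619f167f-d0c0-468f-b6d1-801bf6c0b017", 82024039.49),
        ("TCGA-38-4625-01A-01R-1206-07", "d73deb65-5df1-4fee-bcc6-56b301d77595", 39274.99),
    ]
    keep, conflicting = resolve_duplicates(rows)
    print(keep)
    print(sorted(conflicting))
```

Under this rule, a barcode such as `TCGA-A7-A0DC-01A-11R-A00Z-07`, where two of three sums agree and one differs, lands in the review group instead of being collapsed automatically.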
@logstar would you mind adding this info to a new issue since this issue is closed? Thank you!
Sure. I will create a new issue with this info.
What data file(s) does this issue pertain to?
all files
What release are you using?
v3
Put your question or report your issue here.
For histologies files:
kfnbl-histologies.tsv
Kids_First_Participant_ID
the USI which will enable us to link common GMKF + TARGET samples)
`Blood` --> `Peripheral Whole Blood` and `Buccal Cells` --> `Saliva`
Remove the following files from release:
- gtex-gene-expression-rsem-tpm-collapsed.polya.rds
- gtex-histologies.tsv
- kfnbl-fusion-arriba.tsv.gz
- kfnbl-fusion-starfusion.tsv.gz
- kfnbl-gene-counts-rsem-expected_count-collapsed.stranded.rds
- kfnbl-gene-counts-rsem-expected_count.stranded.rds
- kfnbl-gene-expression-kallisto.stranded.rds
- kfnbl-gene-expression-rsem-fpkm-collapsed.stranded.rds
- kfnbl-gene-expression-rsem-fpkm.stranded.rds
- kfnbl-gene-expression-rsem-tpm-collapsed.stranded.rds
- kfnbl-gene-expression-rsem-tpm.stranded.rds
- kfnbl-histologies.tsv
- kfnbl-isoform-counts-rsem-expected_count.stranded.rds
- kfnbl-isoform-expression-rsem-tpm.stranded.rds
- kfnbl-snv-lancet.vep.maf.gz
- kfnbl-snv-mutect2.vep.maf.gz
- kfnbl-snv-strelka2.vep.maf.gz
- kfnbl-snv-vardict.vep.maf.gz
- target-histologies.tsv
- tcga-histologies.tsv
Add generally named files per #24