bbglab / intogen-plus

a framework for automatic and comprehensive knowledge extraction based on mutational data from sequenced tumor samples from patients.
https://www.intogen.org/search

IntOGen-plus | Combination fails if gene has no symbol #22

Closed FedericaBrando closed 5 months ago

FedericaBrando commented 5 months ago

While running the latest version of IntOGen with updated regions, the combination step behaved oddly. After digging into the problem, I found the reason:

I would say that there are two possible solutions here:

  1. We get rid of these genes. This is easy and fast, but we might lose potential driver genes.
  2. We change the identifier. We choose either the HGNC ID (a unique numeric ID linked to a list of HUGO symbols that are synonyms of each other) or the Ensembl gene ID (ENSG).
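
To make the trade-off between the two options concrete, here is a minimal sketch in Python. It is not the IntOGen code: the column names `SYMBOL` and `GENE_ID` and both helper functions are hypothetical, chosen only for illustration. Option 1 drops rows without a symbol; option 2 keys every row by its Ensembl gene ID, so genes without a symbol are kept.

```python
import csv
import io


def drop_unnamed(rows, symbol_col="SYMBOL"):
    """Option 1: discard rows whose symbol field is empty (hypothetical column name)."""
    return [row for row in rows if row[symbol_col].strip()]


def key_by_ensembl(rows, id_col="GENE_ID"):
    """Option 2: index rows by Ensembl gene ID instead of symbol (hypothetical column name)."""
    return {row[id_col]: row for row in rows}


# Tiny made-up table: one gene without a symbol, one with.
sample = "SYMBOL\tGENE_ID\n\tENSG00000237378\nTP53\tENSG00000141510\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

kept = drop_unnamed(rows)      # loses ENSG00000237378
by_id = key_by_ensembl(rows)   # keeps both genes
```

With option 1 the gene without a symbol disappears from the result, while with option 2 both genes remain addressable by their ENSG identifier.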

FedericaBrando commented 5 months ago

awk -F'\t' 'BEGIN{OFS="\t"} $1 == "" {print NR, $0}' ../../28/d9938ac0b2c2a81d8788680a0d947b/PEDCBIOP_WXS_HGG_PRY.elements_results.txt
40              ENSG00000237378 Non Available   13   +  390     3       3       1  717  145.84077361972538      0.219   0.34138235294117647     0.28093709830345903     0.3760730408604518      0.2253515547073422        0.32280087566186855
494             ENSG00000187186 Non Available   9    -  264     0
495             ENSG00000188897 Non Available   16   -  9963    0
496             ENSG00000226690 Non Available   7    +  771     0
497             ENSG00000236543 Non Available   9    +  546     0
498             ENSG00000250803 Non Available   5    +  327     0
499             ENSG00000269825 Non Available   19   -  1926    0
500             ENSG00000282936 Non Available   17   -  3537    0
501             ENSG00000283205 Non Available   9    +  234     0
502             ENSG00000283536 Non Available   12   +  519     0
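
For anyone who prefers to run the same check from Python, the awk one-liner above can be sketched roughly as follows. The function name is hypothetical; like the awk command, it reports the 1-based line number and content of every line whose first tab-separated field is empty.

```python
def rows_missing_first_field(lines, sep="\t"):
    """Return (line_number, line) pairs where the first field is empty.

    Line numbers are 1-based and count every line, like awk's NR.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.split(sep)[0] == "":
            hits.append((number, line))
    return hits


# Made-up input; with a real file, pass the open file handle as `lines`.
example = [
    "SYMBOL\tGENE_ID\n",
    "TP53\tENSG00000141510\n",
    "\tENSG00000237378\n",
]
missing = rows_missing_first_field(example)
```

In this example only the third line is reported, because it starts with a tab and therefore has an empty first field.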
FedericaBrando commented 5 months ago

dirty solution for this run: