CBIIT / nci-ctd2-dashboard

NCI CTD^2 Dashboard
http://ctd2-dashboard.nci.nih.gov/

Missing subject page for shRNA #396

Open: vdancik opened this issue 5 years ago

vdancik commented 5 years ago

Searching for the shRNA sequence GTGAAGAATGTGACAAAGTTT finds two observations, but from the search page it is not possible to go to the shRNA page https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/gtgaagaatgtgacaaagttt . However, the shRNA page sometimes works: searching for CAGTTGAGACCTTCTAATTGG finds another shRNA that does have its own page, https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/cagttgagaccttctaattgg .

vdancik commented 5 years ago

Here is a link to an observation about GTGAAGAATGTGACAAAGTTT: https://ctd2-dashboard.nci.nih.gov/dashboard/#observation/20130429-dfci-ataris-analysis-818 . Clicking on the shRNA link produces a JavaScript error: TypeError: transcript is null

kcs3 commented 5 years ago

Where does the subject data come from, when available?

zhouji2013 commented 4 years ago

The problem occurred when the target transcript of the shRNA was empty (null). The code has been fixed to handle that case.

kcs3 commented 4 years ago

If the transcript is known to the Dashboard, the shRNA subject page displays a link to the Target Transcript and to the subject page of the Target Gene. As above, a working example is https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/CAGTTGAGACCTTCTAATTGG .

With the code change made above, if the transcript is not known, the subject page now successfully loads and shows the associated observations. However, it does not have entries for the transcript or gene.
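The null-tolerant behaviour described above can be sketched as follows. This is a Python illustration only, not the Dashboard's actual front-end or model code: the `Transcript` and `ShRNA` classes and the `subject_page_fields` function are hypothetical. The point is that the "transcript is null" error goes away once the page treats the transcript as optional, always showing observations and adding Target Transcript and Target Gene entries only when a transcript was resolved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    # Hypothetical transcript record: RefSeq ID plus the gene it belongs to (if known).
    refseq_id: str
    gene_symbol: Optional[str] = None


@dataclass
class ShRNA:
    # Hypothetical shRNA subject: target sequence, optional transcript, observation IDs.
    target_sequence: str
    transcript: Optional[Transcript] = None
    observations: List[str] = field(default_factory=list)


def subject_page_fields(shrna: ShRNA) -> dict:
    """Build the fields of an shRNA subject page without assuming a transcript exists."""
    fields = {
        "sequence": shrna.target_sequence,
        # Observations are shown whether or not the transcript is known.
        "observations": list(shrna.observations),
    }
    transcript = shrna.transcript
    if transcript is not None:
        # Only link the Target Transcript (and Target Gene, if known) when one was resolved.
        fields["target_transcript"] = transcript.refseq_id
        if transcript.gene_symbol:
            fields["target_gene"] = transcript.gene_symbol
    return fields
```

For an shRNA like GTGAAGAATGTGACAAAGTTT, whose transcript could not be resolved, the result contains the sequence and observations but no transcript or gene keys, which matches the behaviour reported after the fix.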

kcs3 commented 4 years ago

We have now investigated the actual data file, ../subject_data/shrna/trc_public.05Apr11.txt. The shRNA mentioned above, which shows no transcript or gene symbol, is present in the data file. The example is #rna/gtgaagaatgtgacaaagttt, which on the local instance is http://156.145.29.93:9998/dashboard/#rna/gtgaagaatgtgacaaagttt

zhouji2013 commented 4 years ago


In that file, the shRNA is on row 16528, and its nmId column is "NM_024924".

zhouji2013 commented 4 years ago

Short explanation of the missing transcript: the transcript ID (NM_024924 in the example above) is used to look up a matching transcript record in the database. If there is no match, the transcript is missing. The transcript information comes from the protein background data file.

Details: there are 420 cases of missing transcripts in total. Some IDs, e.g. n/a and noHits, should probably be explicitly excluded when looking for a match; others, such as 'REPLACED BY ....', may need to be handled differently in the loading code.

Here is the list (the contents of the brackets are the two relevant fields: transcript ID and alternative transcript ID):

1: CCTCGATACAGCATTGGGTTA [NM_001203][NM_001203.2]
2: TCAGGAGGTATAGTGGAAGAA [NM_001203][NM_001203.2]
3: CAATCCAATGTCTACTGCTAT [NM_001204][NM_001204.6]
4: CCTCTGGCATATAATCAAGTT [NM_001260][NM_001260.1]
5: GCACAGTTTGGTCCGTTAGAA [NM_001261][NM_001261.3]
6: AGGGACATGAAGGCTGCTAAT [NM_001261][NM_001261.3]
7: CCGCTGCAAGGGTAGTATATA [NM_001261][NM_001261.3]
8: TGATTGAGATTTGTCGAACCA [NM_001261][NM_001261.3]
9: CCATGAGGCAAGAAACTATAT [NM_001315][NM_001315.2]
10: CAACCCACGAATCAAGCTCAT [NM_001348][NM_001348.1]
11: CATCGCACACTTTGACCTGAA [NM_001348][NM_001348.1]
12: GAAGGAGTACACCATCAAGTC [NM_001348][NM_001348.1]
13: GTGATGTGGATATAATGGATT [NM_005758][NR_002726.2]
14: TGACTTATTCTTGTGTTACAG [NM_005758][NR_002726.2]
15: GAGATGTGAAGATGGAGAATA [NM_144610][NM_001174103.1]
16: TAATTGCTGTGGATACTGTAA [NM_144610][NM_001174103.1]
17: AGTTTCCCATTAGGCCCATTA [NM_144610][NM_001174103.1]
18: GCAGAGATAACCCAACACAGT [NM_032833][NM_032833.3]
19: GCAACATATCCCACTCAGAAA [NM_032833][NM_032833.3]
20: GAGGGCCGAATAAGTGTAGTT [NM_032833][NM_032833.3]
21: ACCTCGTAGATGTGGAATTAA [NM_001093][NM_001093.3]
22: GCTGCGGCCAACATCTTCAAA [NM_001166][NM_001166.3]
23: CGCTACATCCTTACCAACCGT [NM_001320][NM_001320.5]
24: CCGTTCGGCATCTGGCTTGAT [NM_152452][noHits]
25: GATTACAGCATACACAGTGAT [NM_152452][noHits]
26: TGACTGCACAGCCAATGGTTT [NM_152452][noHits]
27: GCTGTAGTTCAGAAGAGGTTT [NM_001005][NM_001005.3]
28: GCATCTTCAAAGCTGAACTGA [NM_001005][NM_001005.3]
29: GTGGAACCCAAAGATGAGATA [NM_001005][NM_001005.3]
30: CCCTCTGAGTAGGCCTATAAT [NM_001293][NM_001293.2]
31: GCCTAGTGATAAATCAGCGTT [NM_001293][NM_001293.2]
32: GCCACACTGGAGAGATTAGAA [NM_001293][NM_001293.2]
33: CCAACAGTTGCTGGACAGTTT [NM_001293][NM_001293.2]
34: TGACTGATACTATGGTGCCTT [XM_292099][noHits]
35: GCACAGACACACGCATTGTAA [NM_001347][NM_001347.2]
36: CGGGAAGCAAACGCTGAAGAT [NM_001347][NM_001347.2]
37: CGGAAGCTACTGAACCCTCAT [NM_001347][NM_001347.2]
38: AGTCACATCTACTCCTCCCAA [NM_001347][NM_001347.2]
39: CCCAGACATTTGGATTTCCAT [NM_002760][NR_028062.1]
40: GCTGCTGTTCTAACCTCAGTA [NM_001109][noHits]
41: AGGGAGTCACACTGACCACCT [NM_001322][NM_001322.2]
42: GCCTTCCATGAACAGCCAGAA [NM_001322][NM_001322.2]
43: GCTCCTGATCTTCCTGAAGAT [NM_001017][NM_001017.2]
44: GATGCTAAATTCCGTCTGATT [NM_001017][NM_001017.2]
45: CCACTTGGTTGAAGTTGACAT [NM_001017][NM_001017.2]
46: CCTGAAGATCTCTACCATTTA [NM_001017][NM_001017.2]
47: CAGCACTATCAGCATTGTGAA [NM_020061][NM_020061.4]
48: GCGAACTCATACTGGAGAGAA [NM_003438][NR_023311.1]
49: CCTGGTATTGAAGAGGTGAAT [NM_001208][NR_026983.1]
50: CTTGAAGATAAATCTGCCTAA [NM_000957][NR_028292.1,NR_028294.1,NR_028293.1]
51: GCTGCTTATCATCCTCTCCTA [NM_033519][NR_002140.1]
52: CCCGGACATTGCTGGCTCAAT [NM_182611][NM_001161808.1]
53: CAGGTGAATCAAGGAACCCTT [NM_182611][NM_001161808.1]
54: TGGGCTATATCTGCGGTGAAA [NM_005305][noHits]
55: CCTGCTGCTGTTCCTGCCTTT [NM_005305][noHits]
56: GCTGGAATAGCCAAGCTCTTT [XM_116384][noHits]
57: GATGTCATGGACCTCACAGAA [XM_116384][noHits]
58: GAAAGGCAAGAAGGAGAGCAA [XM_116384][noHits]
59: GCCATTGCCATGGCTGGAATA [XM_116384][noHits]
60: GCTCAGGAGTGAGGATGTCAT [XM_116384][noHits]
61: GCCAACTGGATGAGAACCAAA [NM_052996][NM_052996.2]
62: GTCCAGATTTGGTTTCAGAAT [XM_208028][noHits]
63: GATTCAGATCTGGTTTCAGAA [XM_208028][noHits]
64: GCCCTGCTCCTCCGAGCCTTT [XM_208028][noHits]
65: CAAGCTCTTTGTTGGAGAGGT [XM_210613][noHits]
66: TGCGGGCTATCACTGGCAGTT [XM_210613][noHits]
67: AGGAGGCTCAGAGGATGACAA [XM_210613][noHits]
68: TGAGGACCTAGACGGGAACTT [XM_210613][noHits]
69: CCTGCTGTCTGCCATGTCTGA [XM_210613][noHits]
70: CTAGATGGGAACTTGGAAGAA [XM_210642][noHits]
71: GCAAAGCTAACCTATCATCAA [NM_024498][noHits]
72: CGACCCTTACTACACATAATA [NM_024498][noHits]
73: TGACCCTAAGAAGATATAGAA [NM_024498][noHits]
74: CGCTTACTAAACATAAGGTAA [NM_024498][noHits]
75: CCGCACAATGAGTCAGAAGAT [XM_290345][noHits]
76: CGCATGAGCATCAAAGCCTAT [XM_290345][noHits]
77: CCCTGAGTACAGTGTTGCAAT [XM_290345][noHits]
78: GACATGGAATTTGCTAAGAAT [XM_290345][noHits]
79: GCAATTCATACTGGAGAGAAA [NM_024924][NR_003578.1]
80: GTGAAGAATGTGACAAAGTTT [NM_024924][NR_003578.1]
81: CCAGGAGATCCCACAGGAGAT [XM_291857][noHits]
82: GCAAGTCCTGAGGACAGGCAA [XM_291857][noHits]
83: GCTCACCACTCTGCCCACGAA [XM_291857][noHits]
84: CCAGGAGATCCCACAGGTGAT [XM_291857][noHits]
85: GCTGGAATAGCCAAGGTCTTT [XM_373077][XM_936303.1,XM_001715032.1,XM_373077.2]
86: GAAACGCAAGAAGGAGAGGAA [XM_373077][XM_936303.1,XM_373077.2]
87: GCTTTCCCAAGAGCACGCATT [XM_373077][XM_936303.1,XM_001715032.1,XM_373077.2]
88: TGCGGGTCTGATGCGGTCTAT [XM_373077][XM_936303.1,XM_001715032.1,XM_373077.2]
89: AGCACTTGTTGCACGTCTGAT [XM_373078][XM_936313.1,XM_001715028.1,XM_373078.1]
90: CTGAAGGGCTGCAACGAGGAT [XM_373078][XM_936313.1,XM_001715028.1,XM_373078.1]
91: GTCCGACTCCAAGTCCGGGAA [XM_373009][XM_937659.5,XM_926341.4]
92: TGGAGAACAAGTTCAAGGCCA [XM_373009][XM_937659.5,XM_926341.4]
93: GAACCGCCGAACCAACCCGCT [XM_373009][noHits]
94: GCTGTCGCTCAGCCTCACCGA [XM_373009][XM_937659.5,XM_926341.4]
95: GCGCTACCTGTCGGTGTGCGA [XM_373009][XM_937659.5,XM_926341.4]
96: GTGAGAGTGATCGCGGTCTTA [XM_373255][noHits]
97: GCTCGCTTGAGAGCGCCCTTA [XM_373255][noHits]
98: GCAAGTCAGTTCTCATTTCTT [XM_376622][noHits]
99: CCTTGAGGTTACCAGGTAGAA [XM_376622][noHits]
100: GCTCCCAAGGAAGAAGTAGAT [XM_376622][noHits]
101: CCTACTGATAGGGACTCCATA [XM_376622][noHits]
102: CGAGACAGTCTTTCTCATCTT [XM_376622][noHits]
103: CCTCACAGAAGGTGACAGTGA [XM_377875][noHits]
104: CGGTCAGCGTTCCCGAGAGCA [XM_377875][noHits]
105: GTCTGCCATGTCTGAGGAGCA [XM_377877][noHits]
106: GCTACGAAGTGTGTCGCCGGT [XM_377877][noHits]
107: CTAGACGGGAACTTGGAAGCA [XM_377877][noHits]
108: TCAGAGGATGACAACCCTGCT [XM_377877][noHits]
109: CATGGCTGGAATAGCCAAGCT [XM_377878][noHits]
110: GATGGAATCCCTGAGGACCTA [XM_377878][noHits]
111: CTGTCCCGCTACGAAGTGTGT [XM_377878][noHits]
112: GTGAGGATGTCATGGACCTCA [XM_377878][noHits]
113: CCTCAGCTCCTCCTGCAGCCA [XM_377878][noHits]
114: GCTCATCGACTCGGTCACCAA [XM_376763][noHits]
115: CATGGACCTCACAGAAGGTGA [XM_377879][noHits]
116: CCATGTCTGAGGAGCAGCTGT [XM_377879][noHits]
117: CGGGTCTGATGCGGGCTATCA [XM_377880][noHits]
118: CGAAAGGCAAGAAGGAGAGCA [XM_377880][noHits]
119: CTGAGGACCTAGACGGGAACT [XM_377880][noHits]
120: GAGGAGGCTCAGAGGATGACA [XM_377880][noHits]
121: GAAGCACCCAGGGATCAGGAA [XM_377880][noHits]
122: TCACATTAACAGCCCACAGTT [XM_071173][noHits]
123: CCAGTAGAAATCACACTGGAA [XM_066752][noHits]
124: GCTTGAAGACATTCACAACTT [XM_066752][noHits]
125: GCTGCTCTTCAACACAAGATA [XM_377946][noHits]
126: GTAAGCTACAAGGAGGAGCTT [XM_377946][noHits]
127: GCCCTGATACATCGAATGATA [XM_031553][REPLACED BY TRCN0000221601]
128: GCAGTGGTAGACGAGTGAAAT [XM_031553][REPLACED BY TRCN0000221604]
129: CGTACAATTCAAGGCCATTTA [XM_031553][REPLACED BY TRCN0000221605]
130: CGCCCTGCACACTAGCACCAT [NM_003926][NM_003926.5]
131: CGGCCTGAACGCCTTCGACAT [NM_003926][NM_003926.5]
132: GCCCTGCAGAATACTAATAAT [XM_379792][noHits]
133: GCTGAACCAGACATGGATGAT [XM_379855][noHits]
134: GAGATGTATGAGGTTCGTATT [XM_379855][noHits]
135: CCTTGATTTCCTAGTTGACAT [XM_379855][noHits]
136: CGGATCCCAAACCGCCCTGCT [NM_012148][NM_012148.2]
137: GAAACCTTTCTTTGAGAAGTT [NM_173643][noHits]
138: CGAGTGGCTTTGCCCTCCCGA [NM_033178][noHits]
139: CGCGGTTCACAGACCGCACAT [NM_033178][noHits]
140: GCTCTCCTTGCCAGGTTCCAA [NM_033178][noHits]
141: CGTGGAAATGAACGAGAGCCA [NM_033178][noHits]
142: CAAAGATGAAGACTTGTGGAT [NM_145237][noHits]
143: TGAAGACTTGTGGATATGGAT [NM_145237][noHits]
144: GCGGTTCACTTCGTATCAGAA [XM_376537][REPLACED BY TRCN0000220214]
145: CCTTCTCAGAATAGTCCAATT [XM_376537][REPLACED BY TRCN0000220215]
146: CGTAGTAGAGATCGTATGTAT [XM_376537][REPLACED BY TRCN0000220216]
147: CCTGAGCAGGTAAAGTCTGAA [XM_376537][REPLACED BY TRCN0000220218]
148: CCTCTCTTTGAACCGTTACTT [XM_166527][noHits]
149: CCCGAATTGAGTCGTTTCTAT [XM_166527][noHits]
150: GCCCTGAAGAACAGTAATGAT [NM_032031][NR_002182.1]
151: GCTGTCTCAAACATTCAAGAA [NM_032031][NR_002182.1]
152: GACCTCACAGAAGGTGACAAT [XM_373056][noHits]
153: TCGCCGGTCAGCTTTCCCAAA [XM_373056][noHits]
154: ATGGCTGGAATAGCCAAGCTT [XM_373057][noHits]
155: GCGGCCATTGCCATGGCTGGA [XM_373057][noHits]
156: GAGCAGCTGTCCCGCTACGAA [XM_373057][noHits]
157: CCCTGAGGACCTAGACGGGAA [XM_373058][noHits]
158: GACCTCACAGAAGGTGACAGT [XM_373061][noHits]
159: GCAAGAAGGAGAGCAAGCCCA [XM_373061][noHits]
160: GAAAGACGTTAAATTACGGAT [XM_373076][noHits]
161: CACACCTGTAATCCCAGCATT [XM_370946][noHits]
162: CGTCATGTTGATAATCCAAAT [NM_022050][NR_004859.1]
163: CCAGCAACGAGAACGCCACAT [NM_017876][noHits]
164: CCCAAGGAACATTAGGGTGAA [NM_198083][NM_001193636.1,NM_001193637.1,NM_001193635.1,NM_198083.3]
165: ACCAGAGGTCTTGCTGAGGAT [XM_007651][REPLACED BY TRCN0000221400]
166: CCGACGAATCACATTCTTGAT [n/a][NM_001093.3]
167: CCCGAGAACCTCAAGAAATTA [n/a][NM_001093.3]
168: CGAAACTACCTTCAACTCCAT [NM_001101][NM_001101.3]
169: CAGAAGGTGACAGTGAGGCTT [XM_373059][noHits]
170: CAGGAAGGTGAGCTCAGGAGT [XM_373059][noHits]
171: ACGGGAACTTGGAAGCACCCA [XM_373059][noHits]
172: TGGCAGTTCGGTGTCGGAGAA [XM_373075][noHits]
173: GAGGATGTCATGGACCTCACA [XM_373075][noHits]
174: TGGCTGGAATAGCCAAGCTCT [XM_373075][noHits]
175: ACACATACGAAAGGCAAGAAG [XM_373075][noHits]
176: CCCGGGAGCATCTGGGACTTT [NM_023076][noHits]
177: CCTGTCAGCACCACATCCTCT [NM_023076][noHits]
178: CCTGCCGAAGCTGCACTCGCT [NM_023076][noHits]
179: CCTCATCCTCAACATCCTCAA [XM_290331][NR_003267.1]
180: GCCTGTCTTGTGTGAGGTGTT [XM_290331][NR_003267.1]
181: AGCACCATCAACCTCTACTTT [XM_290331][NR_003267.1]
182: GACTCAATAGATGTAGGGAAA [NM_001303][NM_001303.3]
183: GCCAAAGGAGATGATGCTTTA [XM_371677][noHits]
184: CGTCTGTGTGATAACAGGCAA [XM_291054][noHits]
185: AGCTGGTGAAACATGAAGAAA [XM_291054][noHits]
186: GCTAAACCAGTTCCGGAAGAA [XM_209597][noHits]
187: CCTTCCTTCTCTCGTCTGTAT [XM_376573][noHits]
188: ACTGACTCTTGATGGACACAA [XM_376573][noHits]
189: CCAGTAAATCATCTGCTATTA [NM_001076][NM_001076.2]
190: CGATAGATGGACATATAGTAT [NM_001077][NM_001077.3]
191: TGTTCGATAGATGGACATATA [NM_001077][NM_001077.3]
192: CCCAAGTTTGTGATGGACATA [XM_373373][noHits]
193: CCTCAACTACATGGTCTACAT [XM_373373][noHits]
194: CCCAAAGGAACTGGAAGACTT [XM_374855][noHits]
195: ACAAATAAGGTGGCCCTGGTA [XM_375067][noHits]
196: CAGCAGAATGTGGACCAGGCA [XM_375067][noHits]
197: GCAGGAAAGTGTCGCAAAGAT [XM_375958][noHits]
198: CCATGTACCTACCACCATCAT [XM_377597][noHits]
199: CCCTGTTGTTCTAAAGCTAAA [NM_032267][noHits]
200: CCGATTACCTTTCTTCTGTAA [NM_032267][noHits]
201: CCCTCTGAACATGAGCATCAA [NM_032267][noHits]
202: GCCAGAAATACCTTGTAACTT [NM_032267][noHits]
203: CCTGCACCATTTGGACATCAT [XM_376950][noHits]
204: CCTCGGATCGAATAACGATAA [XM_376950][noHits]
205: GCTCGCATAAATGTGAGTCTT [XM_293656][noHits]
206: CCACAGCAAATGTGATTGATA [XM_293656][noHits]
207: AGGCATGGAGATGAATGACTT [XM_373214][noHits]
208: GCCAAGCTTGCCCTGGCCTAT [XM_374694][noHits]
209: TGAATGACTTGGTGGTGAGCT [XM_374694][noHits]
210: CCACGGCATTTCAGACACTTT [NM_212553][NR_027279.1]
211: CCACCAAATATTTGGAGGCTA [NM_212553][NR_027279.1]
212: GAGTGGCAGTTCAACCACTTT [NM_212553][NR_027279.1]
213: CGAAGAACTCAATGGAGAGAA [NM_212553][NR_027279.1]
214: CAAACGCTCTAAGTTTAAGAA [XM_379892][noHits]
215: TGGTTTCAGAATGAGAGGTCA [XM_374852][noHits]
216: CCCTCCCGACACCTTCGGACA [XM_374852][noHits]
217: CCCACAACTTTCTAGCTGTTT [NM_001033][NM_001033.3]
218: GCTGAGCCTAACTATGGCAAA [NM_001033][NM_001033.3]
219: CCTGCTCAGATCACCATGAAA [NM_001033][NM_001033.3]
220: CCAATCCAGTTCACTCTAAAT [NM_001033][NM_001033.3]
221: CTGTGGTTGTATCTGTTCAAT [NM_001184][NM_001184.3]
222: GCCAAAGTATTTCTAGCCTAT [NM_001184][NM_001184.3]
223: GCCCTTAAATAAAGAAGGTAA [NM_001010][NM_001010.2]
224: CGCAAACTTCGTACTTTCTAT [NM_001010][NM_001010.2]
225: CCGCCAGTATGTTGTAAGAAA [NM_001010][NM_001010.2]
226: GCTGCAGAATATGCTAAACTT [NM_001010][NM_001010.2]
227: CGTGTCTGAGATCATGATGTA [NM_001037][NM_001037.4]
228: CCTGAAGAACTAAAGGACTTA [NM_001289][NM_001289.4]
229: GCGTCTGGATGACTACTTAAA [NM_001289][NM_001289.4]
230: CCCTGACGACAGAAGAATCAT [NM_001071][NM_001071.2]
231: GATAGCTGATGCCCTCCTTCA [NM_183003][noHits]
232: GAATCCACTTCCAACTGGCTA [XM_208356][noHits]
233: GCCTTTAATCAAGCCTGGCAT [NM_201252][NM_001145289.1,NM_201252.3]
234: CGGTGGATGTACCACCACTCA [NM_201252][NM_001145289.1,NM_201252.3]
235: GAGGGCAAGTTCGTGGAGCTT [NM_201252][NM_001145289.1,NM_201252.3]
236: CCTTGATGTCACAAAGAAGAA [XM_371837][noHits]
237: GCACCCACAGTTCTACATCAT [NM_003293][noHits]
238: CATCCAGACTGGAGCGGATAT [NM_003293][noHits]
239: CCTGCAGCAAGCGGGTATCGT [NM_003293][noHits]
240: CAGCCAGAGGGACTCCTGCAA [NM_003293][noHits]
241: GTGCTTGATGAGAATTACAAT [NM_001308][NM_001308.2]
242: CCAGGTATCTACACTGTTAGT [NM_001308][NM_001308.2]
243: CCTGAAGGAAGGTGTTGATTA [NM_001176][NM_001176.3]
244: CCTCAGTTCTGCACACAGCTA [XM_380013][noHits]
245: CTGTGCAATTTCAACATCATA [XM_208443][noHits]
246: CGCTTCCTGAATGCTGAGAAT [XM_372200][noHits]
247: CCTCAGTTTGAGCCAATAGTT [XM_372200][noHits]
248: CCTTGGAGATACCTCATCATA [XM_374801][noHits]
249: CCGTTCCAAATATGAGGAGAA [XM_495823][noHits]
250: GTGTATTGAATGCTCAGGTAT [XM_495823][noHits]
251: AGGAAATCACAAATTCAGCTA [XM_495830][noHits]
252: GTGTATTGAATGCTCAGGAAT [XM_495830][noHits]
253: GAACTCTCAAACAGATGCTTT [XM_495830][noHits]
254: CCAGCCAGCATTATCTTACAA [XM_495830][noHits]
255: GCCAAGGAGTCAAAGAACATA [XM_499367][noHits]
256: CCCTCACTGGATTCATGAGAT [XM_499367][noHits]
257: GCCCATGAAGCGCCACATCTT [XM_495884][noHits]
258: CCTGAAGGAAACGAAAGACAT [XM_496026][noHits]
259: CACAGAATTATTCCAGGGTTT [XM_292596][noHits]
260: CAACACAAATGGTTCCCAGTT [XM_292596][noHits]
261: ACCAGCAAGAAGATCACCATT [XM_371409][noHits]
262: GATGGCAAGCATGTGGTGTTT [XM_371409][noHits]
263: TGGTGACTTCACGCACCATAA [XM_379998][noHits]
264: CCCTTGGACCACGTCTCCTTT [XM_379998][noHits]
265: GCTCGCAGTATCCTAGAATCT [XM_495800][noHits]
266: GCAAAGTGAAAGAAGGCATGA [XM_495800][noHits]
267: AGTGAAAGAAGGCATGAATAT [XM_495800][noHits]
268: CAAATGCTGGACCCAACACAA [XM_495896][noHits]
269: GCCAAGACTGAGTGGTTGGAT [XM_495896][noHits]
270: GTGTGTCTCCTTTGAGCCTTT [XM_170597][XM_001717840.3,XM_001717979.3,XM_170597.8]
271: GCAAGATATGTATGTGGCTAT [XM_497732][noHits]
272: CGGAATTACCAGAATAGAGAA [XM_497732][noHits]
273: GCTTGAATTACTGTGGGCATA [XM_293886][noHits]
274: CGTGTGAATCCTCTGGGTCCT [NM_004142][noHits]
275: GCACCCTAGCCCATGCCTTCT [NM_004142][noHits]
276: GCAGGAGGAATTTGATGTATT [XM_293293][noHits]
277: CGCCATGTTCTCAGATAAGAA [NM_199345][NR_003700.1]
278: GCGGGAGTTTGATTTCTTTAA [NM_199345][NR_003700.1]
279: GCGTGAAGACATAAGCATCAT [NM_199345][NR_003700.1]
280: GCTAGCCCTGACCAGTCCTGT [NM_080789][noHits]
281: CCAGGGCTAGCCCTGACCAGT [NM_080789][noHits]
282: CAATTGTCTGAACAGCGCACT [XM_086287][NR_002930.2]
283: TCGTTGCAGGTTCGAGGCCGA [XM_372626][NR_033866.1]
284: GCCGTGGACCTGTACGAGTAT [XM_496155][noHits]
285: CGTCCCGTCCTGGGTGGGTTT [XM_496155][noHits]
286: CCAGTCAGAAACAGTTTGCTA [XM_496170][noHits]
287: CTGGTATTGAAGAGGTGAATA [XM_293984][noHits]
288: CCTGCTGTGTACCTGTGATAA [XM_070277][noHits]
289: CCCGGAGGAATTTGAGTCTTA [XM_070277][noHits]
290: GCCTTGAAGATGACATTCGCT [NM_001153][NM_001153.3]
291: CCCAACGAGTACATCCATTAT [NM_001098][NM_001098.2]
292: CCGGCTGACTACAACAAGATT [NM_001098][NM_001098.2]
293: CCTGCTAGAGAAGAACATTAA [NM_001098][NM_001098.2]
294: GCCCAAGGTCAACAGAACATT [NM_014513][NM_014513.2]
295: CTCCTCTTCTTTCTCCTTCAT [NM_014513][NM_014513.2]
296: GCGAACTTCATTGCTCCCAAA [NM_014733][NM_014733.3,NM_001105251.1]
297: CCTGAGAGAATACGTGGATAT [NM_014733][NM_014733.3,NM_001105251.1]
298: GCCAGCCATGTGGATTACTAA [NM_014733][NM_014733.3,NM_001105251.1]
299: CCGGAGATTCTTCTTTAATTT [NM_001200][NM_001200.2]
300: CCACTGGAACTGTTCCCAAAT [NM_001201][NM_001201.2]
301: CCCAAGTCCTTTGATGCCTAT [NM_001201][NM_001201.2]
302: CCAGAGCCTTATATCTTGGTA [NM_001201][NM_001201.2]
303: CTTGTTATAAAGAGGCACATA [NM_138726][NR_003088.1,NR_003087.1]
304: CCTGGTTTAGCAGAGTAATTA [NM_138726][NR_003088.1,NR_003087.1]
305: CCGGCCTTCATCGCAGTACAT [NM_031211][NR_002593.1]
306: CGCCTTCCTCAAGCTCTGGAT [NM_031211][NR_002593.1]
307: CCTTGGTGAGACATACTAGAA [NM_181429][NM_181429.1]
308: GCATGGAATTACACAAGCAAA [NM_001316][NM_001316.2]
309: CGCTGACAAGTATCTGTGAAA [NM_001316][NM_001316.2]
310: CCGTCATGAATTTAAGTCAAA [NM_001316][NM_001316.2]
311: CCGTCTTCCTATATGGCCTTA [NM_001316][NM_001316.2]
312: TCAAGTTGGAAGTGTGTCTTT [NM_006780][noHits]
313: GCTGGTATATTTGATGCCTAT [NM_032351][NM_032351.3]
314: CCTCTAATTCTGTAGGACTTT [NM_006021][noHits]
315: CAAGGGTGGATAATTACTGTA [NM_006021][noHits]
316: CCACGGGATTTCAGACACTTT [NM_001001324][noHits]
317: CCCAACGCACTTGTGATTCAT [NM_001001324][noHits]
318: CAGAAGAGAATATCGCTTCTA [NM_152302][noHits]
319: CCGGAAGAAGATGATGGAAAT [NM_001006][NM_001006.3]
320: GCCCAAGTTTGAATTGGGAAA [NM_001006][NM_001006.3]
321: GCCAAGTACAAGTTGTGCAAA [NM_001007][NM_001007.4]
322: CCACAAGTTGAGAGAGTGTCT [NM_001007][NM_001007.4]
323: CCACTCGACTTTCCAACATTT [NM_001007][NM_001007.4]
324: GCTCAGAGTGTTGTACTCGTA [NM_001012][NM_001012.1]
325: CCGTGCCCTGAGGTTGGACGT [NM_001012][NM_001012.1]
326: CTCCTGAGGAAGAAGAGATTT [NM_001012][NM_001012.1]
327: GCTGAAGCTGATCGGCGAGTA [NM_001013][NM_001013.3]
328: GCAAGATGAAGCTGGATTACA [NM_001013][NM_001013.3]
329: CCTTCATTGTCCGCCTGGATT [NM_001013][NM_001013.3]
330: GCCTGAAGATAGAGGATTTCT [NM_001013][NM_001013.3]
331: CCTGCGGGACATGATCATCCT [NM_001018][NM_001018.3]
332: CGAGCAGCTGATGCAGCTGTA [NM_001018][NM_001018.3]
333: ACATGATCATCCTACCCGAGA [NM_001018][NM_001018.3]
334: CGGTTTCATTAAGTTGGACTA [NM_001032][NM_001032.3]
335: GCACCTACATTGACAAGAAAT [NM_001015][NM_001015.3]
336: CCGAGACTATCTGCACTACAT [NM_001015][NM_001015.3]
337: GCACTACATCCGCAAGTACAA [NM_001015][NM_001015.3]
338: GATGCAGAGGACCATTGTCAT [NM_001015][NM_001015.3]
339: GCGGTAATGAAATATGGGAAA [NM_001253][NM_001253.2]
340: GCCAAGACCATCAGAAGTAAA [NM_001253][NM_001253.2]
341: GCGAGTGAAATTGCACGTCAA [NM_001253][NM_001253.2]
342: GAAGGTAACAAACCTCAACGT [NM_006249][NM_006249.4]
343: CCTCTGTTTGCACTGGACATA [NM_199283][noHits]
344: CGGCAACATTATGCTGGACAA [NM_199283][noHits]
345: CATCGACCTCTTCAAGAACAT [NM_199283][noHits]
346: GTTTCCTACTCAAGGAGAGAA [NM_199283][noHits]
347: CCCTGAAAGAATCCACAGTAA [XM_371497][noHits]
348: ACAAACCAACAAGCAGTCGAT [XM_371497][noHits]
349: CCATGGATATTCAGAGCCTCA [XM_496630][noHits]
350: GATATACCACTATGGCCACAT [XM_496630][noHits]
351: CCCTTCTGCTCATGCAGCATT [XM_496630][noHits]
352: GAGGATGAATTAAAGCCTTAT [XM_377958][noHits]
353: GCCCAATTCGAGGCTATCATT [XM_290923][noHits]
354: CGACACAATATCCCTGGACAT [XM_290923][noHits]
355: GAGAGACCTGAACCTGGAAAT [XM_497910][noHits]
356: CCTTGGTTCAAACCACAGATT [XM_372233][noHits]
357: GTGCAAGAATATGGCGACCAA [XM_372705][noHits]
358: CCCAGCGTCGTCATCGTGTTT [XM_372705][noHits]
359: CTCTCAGATGTGCATTGGAGA [XM_378155][noHits]
360: GTGCTTTGGAGACTCTGAGAT [XM_378155][noHits]
361: GTGCATTGGAGACTCTGAGAT [XM_378155][noHits]
362: GATGTGCTTTGGAGACTCTGA [XM_378155][noHits]
363: CACTGCCATCATCTAACCATT [XM_497414][noHits]
364: CTCTAGACTAACGCCACTGAT [XM_497414][noHits]
365: CTATGCTGTGAGGATGAATTA [XM_380022][noHits]
366: GCCGCCTTCTCACAACCACAA [XM_497433][noHits]
367: CGCCTCTTCAACGCGCACGCT [XM_372274][noHits]
368: CCATCGTTACAATGGCCTCTT [XM_499301][noHits]
369: CCCAACTCATATTTGGACTTT [XM_499301][noHits]
370: CCTGAAGTTCTTGTTTCTGTT [XM_375150][noHits]
371: TGCTGGAGTTTAGGAGTTATT [XM_375150][noHits]
372: GCATTGACTAATCAAAGGATT [XM_497790][noHits]
373: GAGACAATGAATTAAGGGAAA [XM_497790][noHits]
374: GCCAGAGGTTTGGCCTGCTTT [XM_497790][noHits]
375: CCTCAACCTTTACTACACATA [NM_001009883][noHits]
376: CCTCTAACCTTACTACACATA [NM_001009883][noHits]
377: CCCTGCAATATGAAGAGACAT [NM_033548][noHits]
378: CAATATGAAGAGACATGCGAT [NM_033548][noHits]
379: TGTCTCTAAGCCAGACCTGAT [NM_033548][noHits]
380: CGTGCCATCTTTAATGTTAAA [NM_199358][noHits]
381: TCTTTCAGCATTGAGAGTATT [NM_199358][noHits]
382: CCACAGATAAGATAACTCATA [NM_004876][noHits]
383: GCCTTCAGGTACATGAAGTAA [NM_001004314][NR_027049.1]
384: GCCTTCGAATACATGGACTAA [NM_001004314][NR_027049.1]
385: TCCGCACTTCTCAGAGACTTT [XM_499494][noHits]
386: GCAAGAAATACCTGAGCTTGA [XM_499494][noHits]
387: CCAACCTGCATGGACTGTGAA [NM_001306][NM_001306.3]
388: CGACCGCAAGGACTACGTCTA [NM_001306][NM_001306.3]
389: GCGCTGGAGAAATACAACAAA [XM_497418][noHits]
390: CGGCGTCAAGGTGAAGATAAT [NM_133431][NM_133430.2,NR_033256.1,NM_020411.2]
391: ACGGCCATAACTAGGGAGGAA [NM_133431][NR_033256.1]
392: CTTCGATGATATTGCCAAATA [NM_174962][noHits]
393: CCAGAGAATCATCCCGAAGAT [NM_174961][NR_027250.1]
394: CTTCAATGATATTGCCACATA [NM_174961][NR_027250.1]
395: CCAGGGATGATGATAAAGCAT [NM_174961][NR_027250.1]
396: CCTGTTCTGAGGATTCCTCTT [NM_198694][NM_198694.2]
397: CCTGCTCTAAGGATTCCTCTT [NM_198694][NM_198694.2]
398: AGAGAGAGGGAGAGAAGAGTT [XM_499454][noHits]
399: CGCCCTCGTCATCATCAGCAT [NM_001305][NM_001305.3]
400: CCAAGTATTCTGCTGCCCGCT [NM_001305][NM_001305.3]
401: GCAACATTGTCACCTCGCAGA [NM_001305][NM_001305.3]
402: TACTTTCTATGAGAAGCGTAT [NM_001010][NM_001010.2]
403: CGGCATGGACGAGCTGTACAA [n/a][noHits]
404: CTCTCGGCATGGACGAGCTGT [n/a][noHits]
405: GCGACGTAAACGGCCACAAGT [n/a][noHits]
406: GCGCGATCACATGGTCCTGCT [n/a][noHits]
407: GTCGAGCTGGACGGCGACGTA [n/a][noHits]
408: GCCACAACATCGAGGACGGCA [n/a][noHits]
409: AGAATCGTCGTATGCAGTGAA [n/a][noHits]
410: TGAGTACTTCGAAATGTCCGT [n/a][noHits]
411: GCTGCAGAATATGCTAAACTT [NM_001010][NM_001010.2]
412: GCCAAGTACAAGTTGTGCAAA [NM_001007][NM_001007.4]
413: AGAATCGTCGTATGCAGTGAA [n/a][noHits]
414: CCGCCAGTATGTTGTAAGAAA [NM_001010][NM_001010.2]
415: CCTTCATTGTCCGCCTGGATT [NM_001013][NM_001013.3]
416: GCAAGATGAAGCTGGATTACA [NM_001013][NM_001013.3]
417: GCCTGAAGATAGAGGATTTCT [NM_001013][NM_001013.3]
418: CAAGCAAGTAGCCTCCGAGAT [NM_001348][NM_001348.1]
419: AGATTGTGAACTATGAGCCGC [NM_001348][NM_001348.1]
420: CGTCTGAAGGAGTACACCATC [NM_001348][NM_001348.1]

kcs3 commented 4 years ago

In the above list of transcripts not found, the first entry is 1: CCTCGATACAGCATTGGGTTA [NM_001203][NM_001203.2]. I checked the protein source data, the file uniprot_sprot_human.dat (despite the .dat extension, it is a text file). It does contain an entry for NM_001203, on line 661480(!), in the record: DR RefSeq; NP_001194.1; NM_001203.2. [O00238-1] This record corresponds to the gene BMPR1B (GN Name=BMPR1B;), which is found in the Dashboard.
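To check lookups like this without searching the file by hand, a small parser can collect the DR RefSeq cross-references of each entry in uniprot_sprot_human.dat together with the entry's GN Name= value. This is a sketch assuming the standard UniProtKB flat-file layout (two-letter line codes, entries terminated by `//`); `refseq_to_genes` is a hypothetical helper, not the Dashboard's loader. It strips the version suffix so that NM_001203.2 is keyed as NM_001203, the form used in the shRNA data file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set

# "GN   Name=BMPR1B;" (newer releases may append evidence tags such as "{ECO:...}").
_GENE_NAME = re.compile(r"^GN\s+Name=([^;{\s]+)")
# "DR   RefSeq; NP_001194.1; NM_001203.2. [O00238-1]" -> captures "NM_001203".
_REFSEQ_NT = re.compile(r"^DR\s+RefSeq;\s*[^;]+;\s*([A-Z]{2}_\d+)\.\d+")


def refseq_to_genes(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map unversioned RefSeq nucleotide IDs to the gene names of the entries citing them."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    gene: Optional[str] = None
    accessions: List[str] = []
    for line in lines:
        if line.startswith("//"):
            # End of entry: link every collected accession to this entry's primary gene name.
            if gene is not None:
                for acc in accessions:
                    mapping[acc].add(gene)
            gene, accessions = None, []
            continue
        gene_match = _GENE_NAME.match(line)
        if gene_match and gene is None:
            gene = gene_match.group(1)
            continue
        refseq_match = _REFSEQ_NT.match(line)
        if refseq_match:
            accessions.append(refseq_match.group(1))
    return dict(mapping)
```

Passing an open file handle (for example `refseq_to_genes(open("uniprot_sprot_human.dat"))`) streams the file line by line, so the size of the file is not a concern.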

So it would be interesting to know why the connection breaks down, since searching on NM_001203 does find the shRNA results.

zhouji2013 commented 4 years ago

On further investigation, I found that the failed matches in the list I posted above are not all caused by there being no match. Many of them failed because there are multiple matches; NM_001203 is one such case. On the other hand, the original case that started this issue, NM_024924, is indeed a case of no match.

The reason for the multiple matches is surprising: a match is decided not by an exact match of refseqId but by its beginning (a prefix match). For example, NM_001203 matches NM_001203247, NM_001203249, NM_001203248, etc., 13 matches in total. I don't know why it was done this way, but it is clearly intentional in the implementation.
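The difference between the prefix matching described above and an exact comparison can be sketched as follows. This is an illustration under assumptions, not the Dashboard's implementation: the function names are hypothetical, the stored IDs may or may not carry a ".N" version suffix, and "exact" is taken to mean equality after stripping that suffix. With prefix matching, NM_001203 also picks up longer accessions such as NM_001203247; with exact matching it resolves to a single record. `resolve` treats both zero hits and multiple hits as a failure, mirroring the two failure modes identified above.

```python
from typing import Iterable, List, Optional


def strip_version(accession: str) -> str:
    """Drop a RefSeq version suffix: NM_001203.2 -> NM_001203."""
    return accession.split(".", 1)[0]


def prefix_matches(query: str, refseq_ids: Iterable[str]) -> List[str]:
    # Behaviour described above: any ID that merely starts with the query is a match.
    return [r for r in refseq_ids if r.startswith(query)]


def exact_matches(query: str, refseq_ids: Iterable[str]) -> List[str]:
    # Alternative: compare whole accessions, ignoring only the version suffix.
    base = strip_version(query)
    return [r for r in refseq_ids if strip_version(r) == base]


def resolve(query: str, refseq_ids: Iterable[str]) -> Optional[str]:
    """Return the single matching ID, or None when there is no match or more than one."""
    hits = exact_matches(query, refseq_ids)
    return hits[0] if len(hits) == 1 else None


# Illustrative IDs only; the real database reportedly has 13 prefix matches for NM_001203.
ids = ["NM_001203.2", "NM_001203247.1", "NM_001203248.1", "NM_001203249.1"]
print(len(prefix_matches("NM_001203", ids)))  # 4: every ID starts with "NM_001203"
print(exact_matches("NM_001203", ids))        # ['NM_001203.2']
print(resolve("NM_024924", ids))              # None: no match, as in the original case
```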

zhouji2013 commented 4 years ago

This issue has 'evolved' away from the original reported problem. The original title was accurate but now has little to do with the discussion in the comments. We should re-organize what we want to change here, preferably as new issues or a new proposal.

zhouji2013 commented 4 years ago

We should close this issue and create new ones that have more specific goals.