Closed enoriega closed 8 years ago
@hickst: can you please check this PR?
What's the problem with: # HDFs UA-CLine-100087 uaz CellLine?
The atcc.tsv file contains a new dictionary of cell lines. I feel it should be added as a standalone file instead of being appended to the NER Override file. All the entries in the file have external "first degree" context such as species, disease, cell type, etc. Once we are done extending the NER override file, let's analyze this one.
Identified the following issues in the NER file additions:
1) Should HDFs refer to one of these:
Cellosaurus.tsv.gz:HDF-FOP CVCL_W541 Homo sapiens
Cellosaurus.tsv.gz:HDF-FOP CVCL_W542 Homo sapiens
Cellosaurus.tsv.gz:HDF/TERT1 CVCL_9Q55 Homo sapiens
2) Should HUVECs refer to one of these:
Cellosaurus.tsv.gz:HUVEC-C CVCL_2959 Homo sapiens
Cellosaurus.tsv.gz:HUVEC-CS CVCL_0F27 Homo sapiens
Cellosaurus.tsv.gz:HUVEC/TERT2 CVCL_9Q53 Homo sapiens
3) Should MCF10As refer to one of these:
Cellosaurus.tsv.gz:MCF10A CVCL_0598 Homo sapiens
Cellosaurus.tsv.gz:MCF10A CVCL_5555 Homo sapiens
Cellosaurus.tsv.gz:MCF10A-Er-Src CVCL_N805 Homo sapiens
Cellosaurus.tsv.gz:MCF10A-Myc CVCL_0411 Homo sapiens
Cellosaurus.tsv.gz:MCF10A-neo CVCL_6C54 Homo sapiens
Cellosaurus.tsv.gz:MCF10AMy CVCL_0411 Homo sapiens
Cellosaurus.tsv.gz:MCF10Ane CVCL_6C54 Homo sapiens
Cellosaurus.tsv.gz:MCF10Aneo CVCL_6C55 Homo sapiens
Cellosaurus.tsv.gz:MCF10AneoT CVCL_5554 Homo sapiens
4) Should MECs refer to one of these:
Cellosaurus.tsv.gz:MEC CVCL_1870 Homo sapiens
Cellosaurus.tsv.gz:MEC CVCL_1871 Homo sapiens
Cellosaurus.tsv.gz:MEC CVCL_B270 Homo sapiens
Cellosaurus.tsv.gz:MEC- CVCL_F938 Mus musculus
Cellosaurus.tsv.gz:MEC- CVCL_F939 Mus musculus
Cellosaurus.tsv.gz:MEC- CVCL_F940 Mus musculus
Cellosaurus.tsv.gz:MEC- CVCL_F941 Mus musculus
5) Should MEFs refer to one of these:
Cellosaurus.tsv.gz:MEF (C57BL/6) CVCL_9115 Mus musculus
Cellosaurus.tsv.gz:MEF (C57BL/6) IRR CVCL_9117 Mus musculus
Cellosaurus.tsv.gz:MEF (C57BL/6) MITC CVCL_9118 Mus musculus
Cellosaurus.tsv.gz:MEF (CF-1 CVCL_5251 Mus musculus
Cellosaurus.tsv.gz:MEF (CF-1) IR CVCL_K232 Mus musculus
Cellosaurus.tsv.gz:MEF (CF-1) MIT CVCL_K233 Mus musculus
Cellosaurus.tsv.gz:MEF (DR4 CVCL_5277 Mus musculus
Cellosaurus.tsv.gz:MEF (DR4) MIT CVCL_Y468 Mus musculus
Cellosaurus.tsv.gz:MEF PKCe KO CVCL_AS81 Mus musculus
Cellosaurus.tsv.gz:MEF PKCe KO KI CVCL_AS82 Mus musculus
Cellosaurus.tsv.gz:MEF Ulk1 -/- Ulk2 -/- (DKO) (SIM CVCL_5A56 Mus musculus
Cellosaurus.tsv.gz:MEF Ulk1 -/- Ulk2 -/- (DKO) (SV40 CVCL_5A57 Mus musculus
Cellosaurus.tsv.gz:MEF-1 [Human myeloma] CVCL_M515 Homo sapiens
Cellosaurus.tsv.gz:MEF-1 [Mouse fibroblast] CVCL_4240 Mus musculus
Cellosaurus.tsv.gz:MEF-BL/6- CVCL_9115 Mus musculus
Cellosaurus.tsv.gz:MEF1 CVCL_4240 Mus musculus
6) The entry le UA-CT-30007 conflicts with a chemical in PubChem and ChEBI.
7) The entry tcp-1 UA-CLine-100117 conflicts with a Uniprot protein.
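The candidate lists in items 1 through 5 can be reproduced with a small script. This is only a minimal sketch: it assumes the KB file is tab-separated with the name in the first column and the identifier in the second (as the listing above suggests), and that the plural override names differ from the KB names only by a trailing "s". The function names `load_kb` and `candidates` are made up for illustration and are not part of Reach.

```python
import csv
import io


def load_kb(lines):
    """Parse tab-separated KB lines into (name, identifier) pairs.

    Assumes column 0 is the entity name and column 1 is its ID;
    lines starting with '#' are treated as comments and skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row[0], row[1])
        for row in reader
        if len(row) >= 2 and not row[0].startswith("#")
    ]


def candidates(override_name, kb_entries):
    """Return KB entries whose name starts with the singular form of override_name."""
    # Assumption: the plural is formed by appending a single "s" (HDFs -> HDF).
    stem = override_name[:-1] if override_name.endswith("s") else override_name
    stem = stem.lower()
    return [(name, ident) for name, ident in kb_entries if name.lower().startswith(stem)]


# A few rows copied from the listing above, held in memory for the example.
SAMPLE = (
    "HDF-FOP\tCVCL_W541\tHomo sapiens\n"
    "HDF/TERT1\tCVCL_9Q55\tHomo sapiens\n"
    "HUVEC-C\tCVCL_2959\tHomo sapiens\n"
)

entries = load_kb(io.StringIO(SAMPLE))
print(candidates("HDFs", entries))
```

To run it against the real file, the in-memory sample could be swapped for `gzip.open(path, "rt")`, where `path` points at the local copy of Cellosaurus.tsv.gz. A prefix match deliberately over-reports, so the output is a list to review by hand rather than a list of confirmed duplicates.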
Sorry, I forgot to push the previous comment before I left for the meeting location crisis. 1 through 6 are not necessarily problems -- I just want Xia to verify that we're not creating duplicate IDs.
Right, thanks!
On Aug 25, 2016, at 10:36 PM, Tom Hicks notifications@github.com wrote:
Sorry, I forgot to push the previous comment before I left for the meeting location crisis. 1 through 6 are not necessarily problems -- I just want Xia to verify that we're not creating duplicate IDs.
@hickst I updated the atcc.tsv file with ~120 entries to be used as a CellLine dictionary
@hickst: can you please double check and merge?
Not merging yet: we're testing Processors and need to integrate this new KB into Reach and test before merging, so we'll be adding files and changes to this PR.
Added two groups of items to the NER manual list:
Xia’s annotations: Some annotations that we need in order to use our context data set for training. These entries have been checked and have no duplicates in any existing context KB file.
ATCC cell lines: I added these to the same file, although I feel they should be in their own standalone file. The entries in this dictionary haven’t been diffed against Cellosaurus yet.
Let’s adjust this as necessary in the pull request.
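The pending diff of the ATCC entries against Cellosaurus could look roughly like the sketch below. It assumes each dictionary has already been loaded as a list of (name, identifier) pairs and that names should be compared case-insensitively with punctuation ignored. That normalization rule is an assumption, not something the KB loaders are known to do. The ATCC rows and their IDs in the example are hypothetical placeholders; only the Cellosaurus rows are taken from the listing earlier in this thread.

```python
def normalize(name):
    """Lower-case and keep only letters and digits, so 'MCF-10A' matches 'MCF10A'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def diff_dictionaries(new_entries, existing_entries):
    """Split new_entries into (overlapping, novel) relative to existing_entries.

    Each entry is a (name, identifier) pair. Overlapping entries are returned
    together with the list of existing identifiers that share the same
    normalized name.
    """
    existing = {}
    for name, ident in existing_entries:
        existing.setdefault(normalize(name), []).append(ident)

    overlapping, novel = [], []
    for name, ident in new_entries:
        key = normalize(name)
        if key in existing:
            overlapping.append((name, ident, existing[key]))
        else:
            novel.append((name, ident))
    return overlapping, novel


# Cellosaurus rows copied from the listing in this thread.
cellosaurus = [("MCF10A", "CVCL_0598"), ("MCF10A", "CVCL_5555"), ("MEC", "CVCL_1870")]

# Hypothetical ATCC rows, for illustration only (not taken from atcc.tsv).
atcc = [("MCF-10A", "ATCC-EXAMPLE-1"), ("Example-Line", "ATCC-EXAMPLE-2")]

overlapping, novel = diff_dictionaries(atcc, cellosaurus)
print("overlapping:", overlapping)
print("novel:", novel)
```

Entries reported as overlapping would need a manual decision on which identifier to keep; entries reported as novel are the ones a standalone ATCC file would actually add.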