floratos-lab / hipc-signature

HIPC Signature Project
BSD 3-Clause "New" or "Revised" License

subjects not found #34

Closed zhouji2013 closed 4 years ago

zhouji2013 commented 4 years ago

There are 143 cases of "subject not found" during the last data update.

This may be caused by out-of-date background data, e.g. the collection of gene symbols, or some discrepancy in the submission data. We need to find out the reasons.

hipciof_gene_31,response_agent,SLC49A4
hipciof_gene_29,response_agent,SHFL
hipciof_gene_13,response_agent,SEPTIN6
hipciof_gene_12,response_agent,MICOS13
hipciof_gene_56,tissue_type,whole blood
hipciof_gene_34,response_agent,NIBAN3
hipciof_gene_30,response_agent,SEPTIN7
hipciof_gene_34,response_agent,MMUT
hipciof_gene_30,response_agent,SEPTIN6
hipciof_gene_61,response_agent,SEPTIN14
hipciof_gene_31,response_agent,DIPK2A
hipciof_gene_30,response_agent,SEPTIN3
hipciof_gene_37,response_agent,MAP11
hipciof_gene_38,response_agent,SEPTIN3
hipciof_gene_38,response_agent,SEPTIN1
hipciof_gene_31,response_agent,ABITRAM
hipciof_gene_3,response_agent,SNHG29
hipciof_gene_31,response_agent,BMERB1
hipciof_gene_38,response_agent,SEPTIN5
hipciof_gene_34,response_agent,NIBAN1
hipciof_gene_31,response_agent,SNHG29
hipciof_gene_55,tissue_type,blood plasma
hipciof_gene_61,response_agent,PLAAT4
hipciof_gene_3,response_agent,NIBAN1
hipciof_gene_51,response_agent,GASK1B
hipciof_ctf_29,tissue_type,whole blood
hipciof_gene_34,response_agent,NUP42
hipciof_ctf_32,tissue_type,skin of body
hipciof_gene_31,response_agent,FCSK
hipciof_gene_34,response_agent,DIPK1A
hipciof_gene_34,response_agent,LRATD2
hipciof_gene_51,target_pathogen,character(0)
hipciof_gene_31,response_agent,SNHG32
hipciof_gene_37,response_agent,SEPTIN9
hipciof_gene_33,response_agent,DIPK1A
hipciof_gene_31,response_agent,SHFL
hipciof_gene_37,response_agent,SEPTIN7
hipciof_gene_49,response_agent,NIBAN3
hipciof_gene_30,response_agent,MAP11
hipciof_gene_31,response_agent,LRATD2
hipciof_gene_31,response_agent,MICOS10
hipciof_gene_7,response_agent,SHFL
hipciof_gene_31,response_agent,MICOS13
hipciof_gene_7,response_agent,DIPK1A
hipciof_gene_37,response_agent,RBIS
hipciof_gene_38,response_agent,NIBAN2
hipciof_gene_30,response_agent,PLAAT5
hipciof_gene_34,response_agent,ATPSCKMT
hipciof_gene_29,response_agent,GASK1B
hipciof_gene_7,response_agent,SNHG29
hipciof_gene_34,response_agent,POGLUT2
hipciof_gene_12,response_agent,ABITRAM
hipciof_gene_18,response_agent,SNHG32
hipciof_gene_37,response_agent,RSKR
hipciof_gene_34,response_agent,MICOS10
hipciof_gene_64,tissue_type,whole blood
hipciof_gene_7,response_agent,NIBAN3
hipciof_gene_60,tissue_type,whole blood
hipciof_gene_34,response_agent,SEPTIN11
hipciof_ctf_32,tissue_type,whole blood
hipciof_gene_24,response_agent,PLAAT4
hipciof_gene_44,tissue_type,whole blood
hipciof_gene_7,response_agent,NIBAN1
hipciof_gene_13,response_agent,NIBAN1
hipciof_gene_18,response_agent,SNHG29
hipciof_gene_16,response_agent,SHFL
hipciof_gene_34,response_agent,TASOR2
hipciof_gene_34,response_agent,CZIB
hipciof_gene_38,response_agent,DIPK1B
hipciof_gene_7,response_agent,SEPTIN1
hipciof_gene_37,response_agent,SEPTIN7P2
hipciof_gene_34,response_agent,SEPTIN9
hipciof_gene_7,response_agent,GASK1B
hipciof_gene_34,response_agent,PLAAT3
hipciof_gene_29,response_agent,ATPSCKMT
hipciof_gene_51,response_agent,SEPTIN4
hipciof_gene_34,response_agent,PLAAT4
hipciof_gene_34,response_agent,SEPTIN6
hipciof_gene_34,response_agent,SEPTIN2
hipciof_gene_18,response_agent,NIBAN1
hipciof_gene_16,response_agent,DIPK1A
hipciof_gene_42,tissue_type,whole blood
hipciof_gene_12,response_agent,ILRUN
hipciof_gene_16,response_agent,TASOR2
hipciof_gene_37,response_agent,GASK1B
hipciof_gene_38,response_agent,GASK1B
hipciof_gene_31,response_agent,ILRUN
hipciof_gene_31,response_agent,COA8
hipciof_gene_35,response_agent,SHFL
hipciof_gene_33,response_agent,PLAAT2
hipciof_gene_58,tissue_type,whole blood
hipciof_gene_37,response_agent,DIPK2A
hipciof_gene_25,response_agent,POGLUT2
hipciof_gene_7,response_agent,DIPK2A
hipciof_gene_20,response_agent,PLAAT4
hipciof_gene_48,target_pathogen,character(0)
hipciof_ctf_36,tissue_type,whole blood
hipciof_gene_34,response_agent,GASK1B
hipciof_gene_6,response_agent,PLAAT4
hipciof_gene_54,tissue_type,skin of body
hipciof_gene_18,response_agent,SEPTIN4
hipciof_gene_31,response_agent,DOCK8-AS1
hipciof_gene_53,tissue_type,whole blood
hipciof_gene_29,response_agent,MROCKI
hipciof_gene_61,response_agent,BMERB1
hipciof_gene_31,response_agent,RBIS
hipciof_gene_7,response_agent,PLAAT4
hipciof_gene_43,tissue_type,whole blood
hipciof_gene_38,response_agent,PLAAT2
hipciof_gene_57,response_agent,BABAM2-AS1
hipciof_gene_30,response_agent,NIBAN3
hipciof_gene_37,response_agent,DIPK1A
hipciof_gene_30,response_agent,NIBAN1
hipciof_gene_52,tissue_type,whole blood
hipciof_gene_31,response_agent,POGLUT3
hipciof_gene_7,response_agent,SEPTIN11
hipciof_gene_37,response_agent,SHFL
hipciof_gene_46,tissue_type,whole blood
hipciof_ctf_28,tissue_type,serum
hipciof_gene_7,response_agent,ATPSCKMT
hipciof_gene_39,response_agent,NIBAN3
hipciof_gene_29,response_agent,POGLUT3
hipciof_gene_24,response_agent,SEPTIN4
hipciof_gene_63,response_agent,OBI1
hipciof_gene_13,response_agent,MICOS13
hipciof_gene_20,response_agent,NIBAN2
hipciof_gene_3,response_agent,SHFL
hipciof_gene_29,response_agent,STIMATE-MUSTN1
hipciof_gene_61,response_agent,NIBAN2
hipciof_gene_16,response_agent,RBIS
hipciof_gene_61,target_pathogen,character(0)
hipciof_gene_34,response_agent,FAM153CP
hipciof_gene_30,response_agent,LRATD1
hipciof_gene_31,response_agent,GASK1B
hipciof_gene_61,response_agent,SEPTIN4
hipciof_ctf_30,tissue_type,whole blood
hipciof_gene_37,response_agent,NIBAN2
hipciof_gene_47,tissue_type,whole blood
hipciof_gene_3,response_agent,SEPTIN4
hipciof_gene_38,response_agent,LINC02631
hipciof_ctf_35,tissue_type,whole blood
hipciof_gene_31,response_agent,PLAAT4
hipciof_gene_62,tissue_type,whole blood

kcs3 commented 4 years ago

Updating the gene info file is described in #35. We need to further investigate the cell type failures.

zhouji2013 commented 4 years ago

Using the new gene data (see #35), there were only 26 not-found cases:

hipciof_gene_52,tissue_type,whole blood
hipciof_gene_54,tissue_type,blood plasma
hipciof_gene_46,tissue_type,whole blood
hipciof_gene_42,tissue_type,whole blood
hipciof_ctf_28,tissue_type,serum
hipciof_gene_55,tissue_type,whole blood
hipciof_gene_37,response_agent,CRIPAK
hipciof_gene_51,tissue_type,whole blood
hipciof_gene_53,tissue_type,skin of body
hipciof_gene_45,tissue_type,whole blood
hipciof_gene_60,target_pathogen,character(0)
hipciof_gene_47,target_pathogen,character(0)
hipciof_gene_59,tissue_type,whole blood
hipciof_gene_34,response_agent,CRIPAK
hipciof_gene_50,target_pathogen,character(0)
hipciof_ctf_29,tissue_type,whole blood
hipciof_ctf_32,tissue_type,whole blood
hipciof_ctf_32,tissue_type,skin of body
hipciof_ctf_36,tissue_type,whole blood
hipciof_gene_34,response_agent,LOC649305
hipciof_gene_63,tissue_type,whole blood
hipciof_gene_61,tissue_type,whole blood
hipciof_gene_57,tissue_type,whole blood
hipciof_ctf_35,tissue_type,whole blood
hipciof_gene_38,response_agent,SPHAR
hipciof_gene_43,tissue_type,whole blood
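To see which kind of subject accounts for most of the failures, the records can be tallied by their second field. The following is a minimal sketch, assuming each not-found record is a single `submission,field,value` line as in the list above; the function name `tally_by_field` is hypothetical and is not part of the Dashboard code.

```python
from collections import Counter


def tally_by_field(lines):
    """Count not-found records per field (tissue_type, response_agent, ...)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split at most twice so a value containing a comma stays intact.
        _submission, field, _value = line.split(",", 2)
        counts[field] += 1
    return counts


# A few records copied from the list above, one per line.
sample = [
    "hipciof_gene_52,tissue_type,whole blood",
    "hipciof_ctf_28,tissue_type,serum",
    "hipciof_gene_37,response_agent,CRIPAK",
    "hipciof_gene_60,target_pathogen,character(0)",
    "hipciof_gene_34,response_agent,CRIPAK",
]

for field, n in tally_by_field(sample).most_common():
    print(field, n)
```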

kcs3 commented 4 years ago

The missing tissue types above are not found directly in our tissue reference ontology, which is the Cell Ontology. The Cell Ontology may reference them in some way. We need to check how the term "blood", which also maps to UBERON_0000178, was handled.

The 4 unmatched tissue types are:
whole blood UBERON_0000178
blood plasma UBERON_0001969
skin of body UBERON_0002097
serum BTO_0001239

kcs3 commented 4 years ago

In the above list, three genes do not match our gene background data: CRIPAK, LOC649305, and SPHAR. All three are marked as discontinued by NCBI/HGNC. They will presumably be removed by our R script gene name correction pipeline when its own underlying data is updated to reflect these withdrawn symbols.

No action required on these genes.

Note - all three gene symbols are still in our latest run of the R script, upload version 18. So the underlying R packages are lagging behind the ontology we are using. On the other hand, since the Dashboard loader just omits these rows, there is no pressing problem.

kcs3 commented 4 years ago

The missing pathogens are a problem in the R processing script, not the Dashboard.
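For context, `character(0)` is how R displays a zero-length character vector, so these rows most likely carry an empty pathogen value rather than a name that failed to match. A minimal sketch for separating such rows from genuine lookup failures, assuming records have already been split into `(submission, field, value)` tuples; the names `R_EMPTY_VECTOR` and `split_empty_values` are hypothetical.

```python
# Text that appears when an empty R character vector is written out as a string.
R_EMPTY_VECTOR = "character(0)"


def split_empty_values(records):
    """Separate records with an empty R value from real lookup failures."""
    empty, real = [], []
    for submission, field, value in records:
        if value.strip() == R_EMPTY_VECTOR:
            empty.append((submission, field, value))
        else:
            real.append((submission, field, value))
    return empty, real


records = [
    ("hipciof_gene_60", "target_pathogen", "character(0)"),
    ("hipciof_gene_37", "response_agent", "CRIPAK"),
    ("hipciof_gene_47", "target_pathogen", "character(0)"),
]

empty, real = split_empty_values(records)
print(len(empty), "empty values;", len(real), "real lookup failures")
```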

zhouji2013 commented 4 years ago

For reference, see issue #10.

zhouji2013 commented 4 years ago

After adding the four cell subsets to the background data as Ken suggested (with serum added as UBERON_0001997, following http://www.ontobee.org), there are now 18 'not found' subjects:

hipciof_ctf_35,target_pathogen,character(0)
hipciof_gene_59,target_pathogen,character(0)
hipciof_gene_34,response_agent,CRIPAK
hipciof_gene_50,target_pathogen,character(0)
hipciof_gene_43,target_pathogen,Leishmania
hipciof_gene_34,response_agent,LOC649305
hipciof_gene_46,target_pathogen,character(0)
hipciof_gene_51,target_pathogen,character(0)
hipciof_gene_37,response_agent,CRIPAK
hipciof_gene_43,response_agent,BACH1-IT1
hipciof_ctf_29,target_pathogen,Leishmania
hipciof_gene_63,target_pathogen,character(0)
hipciof_ctf_36,target_pathogen,character(0)
hipciof_gene_42,target_pathogen,character(0)
hipciof_gene_43,response_agent,MT-TS1
hipciof_gene_60,target_pathogen,character(0)
hipciof_gene_47,target_pathogen,character(0)
hipciof_gene_38,response_agent,SPHAR
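The effect of adding the four tissue terms can be illustrated with a simple name-to-ID lookup. This is only a sketch: how the Dashboard actually stores its background data is not shown in this thread, and `TISSUE_TERMS` and `find_tissue` are hypothetical names. The IDs are the ones given in the comments above, with serum as UBERON_0001997.

```python
# Hypothetical background table: tissue name -> ontology ID (IDs from this thread).
TISSUE_TERMS = {
    "whole blood": "UBERON_0000178",
    "blood plasma": "UBERON_0001969",
    "skin of body": "UBERON_0002097",
    "serum": "UBERON_0001997",
}


def find_tissue(name):
    """Return the ontology ID for a tissue name, or None if it is not found."""
    # Normalizing case and whitespace is an assumption, not documented behavior.
    return TISSUE_TERMS.get(name.strip().lower())


for name in ["whole blood", "Serum ", "Leishmania"]:
    term_id = find_tissue(name)
    print(name.strip(), "->", term_id if term_id else "subject not found")
```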

zhouji2013 commented 4 years ago

After the latest data update, there are only 6 cases now:

hipciof_gene_34,response_agent,CRIPAK
hipciof_gene_37,response_agent,CRIPAK
hipciof_gene_43,response_agent,BACH1-IT1
hipciof_gene_43,response_agent,MT-TS1
hipciof_gene_34,response_agent,LOC649305
hipciof_gene_38,response_agent,SPHAR

zhouji2013 commented 4 years ago

Using the latest submission, another vaccine case showed up:

hipciof_ctf_29,exposure_material,VO_000396

kcs3 commented 4 years ago

It was copied incorrectly into the template; it should have been VO_0003961. It is now fixed in the original Google sheet and will appear the next time we do a new upload.
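A truncated ID like this one could be caught before upload with a simple format check. The sketch below assumes that ontology IDs in the template follow the pattern seen throughout this thread, a letter prefix, an underscore, and exactly seven digits (VO_0003961, UBERON_0000178, BTO_0001239). That pattern is an assumption drawn from these examples, not a rule confirmed by the project, and `looks_like_term_id` is a hypothetical helper.

```python
import re

# Assumed pattern: letters, underscore, exactly seven digits (e.g. VO_0003961).
TERM_ID_PATTERN = re.compile(r"[A-Za-z]+_\d{7}")


def looks_like_term_id(term_id):
    """Return True if the whole string matches the assumed ID pattern."""
    return TERM_ID_PATTERN.fullmatch(term_id.strip()) is not None


for candidate in ["VO_000396", "VO_0003961", "UBERON_0000178"]:
    status = "ok" if looks_like_term_id(candidate) else "malformed"
    print(candidate, status)
```

A check like this only validates the shape of the ID. It would not detect a well-formed ID that points at the wrong term.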

kcs3 commented 4 years ago

Of the genes above:
CRIPAK: entry withdrawn (HGNC). No NCBI entry.
BACH1-IT1: no NCBI entry, it is an intronic transcript. It is in HGNC and Ensembl.
MT-TS1: NCBI symbol is TRANS1.
LOC649305: withdrawn by NCBI.
SPHAR: withdrawn by NCBI, HGNC.

These are all issues that can be dealt with in the R data cleaning script; no issue for Dashboard code itself.

kcs3 commented 4 years ago

There are no further outstanding items for this issue. Closing issue.