bigomics / playbase

Core back-end functionality and logic for OmicsPlayground

Missing annotation for 1.5% of mouse UNIPROT #149

Closed ivokwee closed 2 months ago

ivokwee commented 2 months ago

From CG: "The issue with the missing gene symbols is that we would like to have them in the heat map in a publication (see screenshot). I was able to retrieve seventy-three of the 74 missing gene symbols from UniProt. Is there any way to get them onto the heat map, or any other Omics Playground illustration or chart?"

Confirmed on mouse UNIPROT data. Annotation is missing, e.g., for probes "Q60870", "P09541", "Q61410", "Q80YV3".

ivokwee commented 2 months ago

The cause appears to be that the UNIPROT annotation has lower coverage than the ACCNUM annotation: with UNIPROT, 98.52% of probes get annotated; with ACCNUM, 99.879%. The probe type of the dataset was detected as UNIPROT, where ACCNUM would have been the better choice.

A possible solution is to raise the match-ratio threshold used for probe-type detection (or to disable it?). Trying ACCNUM before UNIPROT in the detection order should also help.

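# Probes reported as missing annotation in the mouse dataset
missing.probes <- c("Q60870","P09541","Q61410","Q80YV3")
# Annotate the same probes under both probe types to compare coverage
getGeneAnnotation.ANNOTHUB("Mouse", missing.probes, probe_type = "ACCNUM")
getGeneAnnotation.ANNOTHUB("Mouse", missing.probes, probe_type = "UNIPROT")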
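
The detection change suggested in this comment could look roughly like the sketch below. It is only an illustration, not playbase's actual detection code: `detect_probe_type`, `keys_by_type`, and `min_ratio` are hypothetical names, and the identifier tables are toy placeholders. Instead of accepting the first probe type whose match ratio passes a threshold, it scores every candidate type and keeps the one with the highest ratio; on ties, the earlier entry wins, so listing ACCNUM before UNIPROT gives it precedence.

```r
# Hypothetical sketch: NOT playbase's actual probe-type detection.
# `keys_by_type` is a named list of known identifiers per probe type
# (toy placeholder data below, not real annotation tables).
detect_probe_type <- function(probes, keys_by_type, min_ratio = 0.5) {
  # Fraction of probes found among the known keys of each candidate type
  ratios <- vapply(keys_by_type,
                   function(keys) mean(probes %in% keys),
                   numeric(1))
  # which.max() returns the first maximum, so list order breaks ties
  best <- names(ratios)[which.max(ratios)]
  # Report no probe type if even the best candidate matches too few probes
  if (ratios[[best]] < min_ratio) return(NA_character_)
  best
}

# Toy example: the UNIPROT table covers fewer of the probes than ACCNUM
probes <- c("id1", "id2", "id3", "id4")
keys_by_type <- list(
  ACCNUM  = c("id1", "id2", "id3", "id4"),
  UNIPROT = c("id1", "id2")
)
detected <- detect_probe_type(probes, keys_by_type)
print(detected)
```

With real data, the key tables would come from the annotation source, and `min_ratio` would need tuning against datasets like the one in this issue.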