KechrisLab / multiMiR

Development repository for the multiMiR database's R API

'argument "code" is missing' error while using 2.2.0 database #21

Closed rjaksik closed 7 years ago

rjaksik commented 7 years ago

Analysis using DB 2.2.0 fails while processing the miranda results, with the following error:

Error in which(value == defs) : argument "code" is missing, with no default

2.1.0 runs fine

smahaffey commented 7 years ago

@rjaksik thank you. I have replicated the problem and am looking into it.

@mmulvahill the miranda DB is exactly the same as it was in 2.1. I just dumped it and reloaded because targetscan and then the related target/mirna tables were the only tables updated this time. I haven't looked at the code much since you cleaned it up. Any idea what that might be?

It occurs after searching miranda, along with an additional warning message: "XML content does not seem to be XML: ''". So I can't tell from that whether the client isn't handling an empty result correctly or the server isn't formatting the empty result correctly. Or maybe it's unrelated.

smahaffey commented 7 years ago

@mmulvahill There seem to be two issues. The first is that it's constructing invalid SQL, with the following clause in the miranda query:

AND (i.mirsvr_score <= )

This is from the full query generated for testing:

SELECT m.mature_mirna_acc, m.mature_mirna_id, t.target_symbol, t.target_entrez, t.target_ensembl, i.mirsvr_score AS score
FROM mirna AS m INNER JOIN miranda AS i INNER JOIN target AS t ON (m.mature_mirna_uid = i.mature_mirna_uid AND i.target_uid = t.target_uid)
WHERE (t.target_symbol IN ('ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419') OR t.target_entrez IN ('ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419') OR t.target_ensembl IN ('ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419'))
AND (i.conservation >= 0.57)
AND (i.mirsvr_score <= )
AND (m.org = 'hsa' AND t.org = 'hsa')
ORDER BY i.mirsvr_score DESC

I am having trouble following the code that generates the queries, so maybe you can tell why it isn't either filling in a default or leaving the clause out. I checked my generated cutoffs file and it has the corresponding values for miranda.hsa,... etc. I also tried with a specified cutoff and it still fails. However, it all seems to work with v2.1.0. The cutoff file is different for each DB version. Is that somehow not being used in the miranda query?

Secondly, maybe the server should catch SQL errors and return error messages to the client, instead of returning nothing as it does right now.
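The "fill in a default or leave the clause out" behaviour discussed above can be sketched as a small guard around clause construction. This is only an illustration: `build_cutoff_clause()` is a hypothetical helper, not multiMiR's actual `where_cutoff()`, and the cutoff value `-0.5` is made up for the example.

```r
# Hypothetical helper (an assumption for illustration, not the package's code).
# Returns an SQL fragment for a score cutoff, or "" when no usable cutoff exists,
# so that a clause like "AND (i.mirsvr_score <= )" can never be emitted.
build_cutoff_clause <- function(column, cutoff, operator = "<=") {
  # Treat NULL, zero-length, non-numeric, and NA values as "no cutoff".
  if (is.null(cutoff) || length(cutoff) != 1 || !is.numeric(cutoff) || is.na(cutoff)) {
    return("")
  }
  sprintf(" AND (%s %s %s)", column, operator, format(cutoff, scientific = FALSE))
}

build_cutoff_clause("i.mirsvr_score", -0.5)  # " AND (i.mirsvr_score <= -0.5)"
build_cutoff_clause("i.mirsvr_score", NULL)  # ""
```

Whether an unresolved cutoff should silently drop the clause or raise an error is a design choice; raising an error would have surfaced the missing cutoff entries sooner.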

mmulvahill commented 7 years ago

Can you give me the R function you're using to reproduce this error?

smahaffey commented 7 years ago

get.multimir(org='hsa', target=c("ENSG00000000003","ENSG00000000005","ENSG00000000419"), table='predicted', summary=TRUE, predicted.cutoff = 30, predicted.cutoff.type = 'p')

or

get.multimir(org='hsa', target=c("ENSG00000000003","ENSG00000000005","ENSG00000000419"), table='all', summary=TRUE)

It doesn't really matter what the genes are as long as the table includes miranda, so 'all' or 'predicted', but not the default, which is 'validated'. The error also wasn't affected by specifying a cutoff.

mmulvahill commented 7 years ago

@smahaffey

Since AND (i.mirsvr_score <= ) is part of a WHERE clause, specifically the cutoff where clauses -- I looked into the function where_cutoff(). The score_cutoff argument to where_cutoff() isn't being resolved correctly in DB 2.2. The cutoffs table returned by get.multimir.cutoffs() is different than in DB version 2.1.

Version 2.1 of the DB had separate conserved and non-conserved versions of the cutoffs for several tables -- i.e. miranda.hsa.c1 and miranda.hsa.c0, while version 2.2 doesn't. Should version 2.2 have these tables? Or should I update the package to only query the tables without the .c# suffix (i.e. miranda.hsa)?
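A quick way to spot this kind of mismatch is to check a cutoffs object for the keys the package expects before building any query. The sketch below assumes the cutoffs are a named list keyed like `miranda.hsa.c1`; the real object's structure is not shown in this thread, and `missing_cutoff_keys()` is a hypothetical helper.

```r
# Hypothetical check; the named-list structure of the cutoffs object is an assumption.
# Returns the expected "<table>.<org>.<suffix>" keys that are absent from `cutoffs`.
missing_cutoff_keys <- function(cutoffs, table, org, suffixes = c("c1", "c0")) {
  expected <- paste(table, org, suffixes, sep = ".")
  setdiff(expected, names(cutoffs))
}

# Toy objects that only mimic the key names described above.
cutoffs_v21 <- list(miranda.hsa.c1 = numeric(0), miranda.hsa.c0 = numeric(0))
cutoffs_v22 <- list(miranda.hsa = numeric(0))

missing_cutoff_keys(cutoffs_v21, "miranda", "hsa")  # character(0)
missing_cutoff_keys(cutoffs_v22, "miranda", "hsa")  # "miranda.hsa.c1" "miranda.hsa.c0"
```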

smahaffey commented 7 years ago

It should have them and in the exact same format. I will try generating them again. It's loaded from 3 different .rda files. I can't find where that's occurring, but maybe something happened when they were created the first time.

mmulvahill commented 7 years ago

This isn't working now (I'm guessing you took the 2.2.rda file down), but here is the code I used to identify the issue.


library(multiMiR)
library(tidyverse)  # loaded for the %>% pipe

# Cutoffs file for the current DB version (2.1)
options("multimir.cutoffs" = "multimir_cutoffs_2.1.rda")
multiMiR:::get.multimir.cutoffs() %>% str()

# Cutoffs file for DB version 2.2, for comparison
options("multimir.cutoffs" = "multimir_cutoffs_2.2.rda")
multiMiR:::get.multimir.cutoffs() %>% str()

smahaffey commented 7 years ago

Sorry. Yes, I removed them until I had generated the new ones. It takes a long time to generate them. Thanks. I will try it with the new version, and then compare it to the old version to see if I can figure out what went wrong. I did find two different versions of the code to generate them, so I may have used the wrong one. We should create either a public or private repository for the db/server side code.

smahaffey commented 7 years ago

New files worked. The issue seems to be fixed. Thank you @mmulvahill, and sorry for the trouble tracking down the problem.

mmulvahill commented 7 years ago

@smahaffey No worries -- it's easier when you were the one who rewrote the code ;)

I do think putting the db/server side code into a repo is a good idea -- I'd be interested in seeing it too since it's outside of my typical work/area of expertise, but so tightly a part of this pkg.

rjaksik commented 7 years ago

Thank you for resolving the issue and for your work on the package. It is extremely useful.

I also want you to know that in the microcosm database some genes appear under a UniProtKB ID, e.g. "PRY_HUMAN", instead of the HUGO symbol "PRY" (EntrezID 9081), and because of that they are not merged with other databases in the summary table.
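Until such entries are fixed in the database, affected rows can at least be flagged on the client side. The sketch below is a heuristic only: it marks symbols ending in `_HUMAN`, it does not map them to HUGO symbols, and `flag_uniprot_entry_names()` is a hypothetical helper rather than part of multiMiR.

```r
# Hypothetical screen; the "_HUMAN" suffix test is a heuristic, not a UniProtKB parser.
# Returns TRUE for symbols that look like UniProtKB entry names for the given species.
flag_uniprot_entry_names <- function(symbols, species_suffix = "_HUMAN") {
  endsWith(symbols, species_suffix)
}

targets <- c("PRY", "PRY_HUMAN", "TP53")
targets[flag_uniprot_entry_names(targets)]  # "PRY_HUMAN"
```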

smahaffey commented 7 years ago

Thank you @rjaksik. We are aware of several instances of this issue, where we couldn't match the IDs provided by the source database across all three main IDs (Ensembl, Entrez, and Gene Symbol). It is related to the varied sources and versions of IDs used by each of the source databases. We have tried to resolve as many as possible by using the biomaRt package to match IDs, without manually cleaning up the annotation. We hope to issue a database update in the future with as many of these resolved as possible. Thank you for the detailed explanation. We can resolve that specific ID issue automatically, and we will try to include the fix in the next database update.