aberHRML / classyfireR

R Interface to the ClassyFire REST API
https://aberhrml.github.io/classyfireR

InChIKeys give errors #28

Closed: meier-rene closed this 4 years ago

meier-rene commented 4 years ago

I tried to batch-process a larger number of InChIKeys and found some that give errors:

bad_key <- 
c('QEVGZEDELICMKH-UHFFFAOYSA-N',
'SYLAFCZSYRXBJF-UHFFFAOYSA-N',
'BOPPPUCSDSHZEZ-UHFFFAOYSA-N')

> get_classification(bad_key[1])
✔ QEVGZEDELICMKH-UHFFFAOYSA-N
Error: Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Call `rlang::last_error()` to see a backtrace. 
> get_classification(bad_key[2])
✔ SYLAFCZSYRXBJF-UHFFFAOYSA-N
Error: Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Call `rlang::last_error()` to see a backtrace. 
> get_classification(bad_key[3])
✔ BOPPPUCSDSHZEZ-UHFFFAOYSA-N
Error: Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Call `rlang::last_error()` to see a backtrace.

Viewing the same entries in a web browser works fine:
http://classyfire.wishartlab.com/entities/QEVGZEDELICMKH-UHFFFAOYSA-N
http://classyfire.wishartlab.com/entities/SYLAFCZSYRXBJF-UHFFFAOYSA-N
http://classyfire.wishartlab.com/entities/BOPPPUCSDSHZEZ-UHFFFAOYSA-N

Could you please have a look?
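Until the parser is fixed, one way to keep a batch run going when individual keys fail is to wrap each call in `tryCatch()`. This is a minimal sketch, not part of classyfireR: `classify_batch()` is a hypothetical helper, and the classifier is passed in as an argument (in practice `classyfireR::get_classification`) so the example can run offline with a stand-in function.

```r
# Hypothetical helper: classify many InChIKeys without letting one failure
# abort the whole batch. `classify` is any function taking a single key,
# e.g. classyfireR::get_classification (not called in this example).
classify_batch <- function(keys, classify) {
  results <- lapply(keys, function(key) {
    tryCatch(
      classify(key),
      error = function(e) {
        # Report the failure and return NULL so the remaining keys still run
        message("Failed on ", key, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- keys
  results
}

# Offline usage with a stand-in classifier that fails on one key
fake_classify <- function(key) {
  if (key == "QEVGZEDELICMKH-UHFFFAOYSA-N") stop("no external descriptors")
  paste("classified", key)
}
res <- classify_batch(c("JIVPVXMEBJLZRO-UHFFFAOYSA-N",
                        "QEVGZEDELICMKH-UHFFFAOYSA-N"), fake_classify)

# Keys whose classification failed (their entries are NULL)
failed <- names(Filter(is.null, res))
```

Collecting the failed keys this way makes it easy to retry them later, for example after installing a fixed version.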

meier-rene commented 4 years ago

And here is the stack trace, as suggested by @sneumann:

rlang::last_trace()
<error/rlang_error>
Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Backtrace:
     █
  1. └─base::sapply(inchikeys, get_classification)
  2.   └─base::lapply(X = X, FUN = FUN, ...)
  3.     └─classyfireR:::FUN(X[[i]], ...)
  4.       └─classyfireR:::parse_external_desc(json_res)
  5.         └─tibble::tibble(...)
  6.           ├─tibble::as_tibble(lst_quos(xs, expand = TRUE))
  7.           └─tibble:::as_tibble.list(lst_quos(xs, expand = TRUE))
  8.             └─tibble:::list_to_tibble(x, validate)
  9.               └─tibble:::check_tibble(x)
 10.                 └─tibble:::invalid_df(...)
 11.                   └─tibble:::stopc(...)

meier-rene commented 4 years ago

Examples of InChIKeys that work:

good_key <- 
c('JIVPVXMEBJLZRO-UHFFFAOYSA-N',
'ZZUFCTLCJUWOSV-UHFFFAOYSA-N',
'QZTKDVCDBIDYMD-UHFFFAOYSA-N')

sneumann commented 4 years ago

This is the raw JSON returned for one of them: QEVGZEDELICMKH-UHFFFAOYSA-N.json.txt. The error is thrown in https://github.com/aberHRML/classyfireR/blob/bd6a217ebdf54124158665cb482740bc5655a723/R/internals.R#L74

I am on a current snapshot of R-devel and get a slightly different error message, almost certainly from the same underlying issue. Checking the JSON, I see that there are no `source`, `source_id` or `annotations` fields.

> get_classification(bad_key[1])
✔ QEVGZEDELICMKH-UHFFFAOYSA-N
Error: All columns in a tibble must be 1d or 2d objects:
* Column `source` is NULL
* Column `source_id` is NULL
* Column `annotations` is NULL
Call `rlang::last_error()` to see a backtrace
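A minimal offline illustration of why these columns end up NULL, assuming the empty JSON array (`"external_descriptors":[]`) is parsed into an empty list, which is what jsonlite does by default for `[]`. The names below are only for illustration; the point is that `$` on a list lacking the element silently returns NULL, and those NULLs are then handed to `tibble::tibble()`.

```r
# Assumption: simulated parser output for a "bad" key, where the
# external_descriptors array was empty in the JSON.
json_res <- list(external_descriptors = list())

desc <- json_res$external_descriptors

# `$` on a list without the requested element returns NULL instead of
# failing, so all three would-be columns are NULL.
cols <- list(
  source      = desc$source,
  source_id   = desc$source_id,
  annotations = desc$annotations
)

sapply(cols, is.null)  # all TRUE
```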

Doing things manually:

response <- httr::GET("http://classyfire.wishartlab.com/entities/QEVGZEDELICMKH-UHFFFAOYSA-N.json")
text_content <- httr::content(response, 'text')
json_res <- jsonlite::fromJSON(text_content)
classification <- classyfireR:::parse_json_output(json_res)

I get:

> classification
# A tibble: 4 x 3
  Level      Classification                     CHEMONT          
  <chr>      <chr>                              <chr>            
1 kingdom    Organic compounds                  CHEMONTID:0000000
2 superclass Organic acids and derivatives      CHEMONTID:0000264
3 class      Carboxylic acids and derivatives   CHEMONTID:0000265
4 subclass   Dicarboxylic acids and derivatives CHEMONTID:0000346

Checking one of the working InChIKeys (http://classyfire.wishartlab.com/entities/JIVPVXMEBJLZRO-UHFFFAOYSA-N.json), they do have:

...
"external_descriptors":[
{"source":"CHEBI",
"source_id":"CHEBI:3654",
"annotations":["sulfonamide","monochlorobenzenes","isoindoles"]
}
]
...

while the bad keys have `"external_descriptors":[]`. So in https://github.com/aberHRML/classyfireR/blob/bd6a217ebdf54124158665cb482740bc5655a723/R/get_classification.R#L71 we need a check for `length(json_res$external_descriptors) > 0`.
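A sketch of what such a guarded parsing step could look like. `parse_external_desc_safe()` is a hypothetical name, not classyfireR's actual code; it uses a base `data.frame` instead of a tibble to stay dependency-free, and collapsing each annotation vector into one string is an illustrative choice. The two inputs simulate, as an assumption, what `jsonlite::fromJSON()` produces for a working key and a bad key.

```r
# Hypothetical guarded version of the external-descriptor parsing step.
parse_external_desc_safe <- function(json_res) {
  desc <- json_res$external_descriptors
  # Bad keys return "external_descriptors":[], i.e. an empty list:
  # return NULL instead of building a table from NULL columns.
  if (length(desc) == 0) {
    return(NULL)
  }
  data.frame(
    source      = desc$source,
    source_id   = desc$source_id,
    # Collapse each descriptor's annotation vector into a single string
    annotations = vapply(desc$annotations, paste, character(1),
                         collapse = ", "),
    stringsAsFactors = FALSE
  )
}

# Simulated parser output for a working key (one CHEBI descriptor) ...
good <- list(external_descriptors = data.frame(
  source = "CHEBI",
  source_id = "CHEBI:3654",
  annotations = I(list(c("sulfonamide", "monochlorobenzenes", "isoindoles"))),
  stringsAsFactors = FALSE
))
# ... and for a bad key with an empty external_descriptors array
bad <- list(external_descriptors = list())

parse_external_desc_safe(bad)   # NULL instead of an error
parse_external_desc_safe(good)  # one row for CHEBI:3654
```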

Yours, Steffen

wilsontom commented 4 years ago

Yes, this happens when no external descriptors are present. I will add a length check and push a new version to the devel branch.

Tom

wilsontom commented 4 years ago

This is now fixed on the devel branch; you can install it using:

remotes::install_github('aberHRML/classyfireR', ref = 'devel')

I will add length checks for all the other components, in case other InChIKeys are missing elements of the JSON output. I should be able to get the fixed version onto CRAN by Monday.
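A generalised version of the same idea is to check every expected component up front and only parse the ones that are present. This is a hypothetical sketch, not the package's implementation: `missing_components()` is an illustrative name, and the component names passed to it are assumptions chosen to match the levels seen earlier in this thread.

```r
# Hypothetical helper: report which expected components of a parsed
# ClassyFire entity are missing (NULL) or empty (e.g. an empty list).
missing_components <- function(json_res, components) {
  empty <- vapply(components, function(name) {
    length(json_res[[name]]) == 0
  }, logical(1))
  components[empty]
}

# Simulated parser output: kingdom present, superclass absent,
# external_descriptors empty
json_res <- list(
  kingdom = list(name = "Organic compounds"),
  external_descriptors = list()
)

missing_components(json_res,
                   c("kingdom", "superclass", "external_descriptors"))
# returns c("superclass", "external_descriptors")
```

Each parsing step can then be skipped, or filled with an empty result, for any component this reports.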

Thanks

Tom

meier-rene commented 4 years ago

Thanks for the fast fix. It's working now.