andreaspacher / openeditors

Webscraping data about editors of scientific journals.
https://openeditors.ooir.org/
Creative Commons Zero v1.0 Universal
54 stars, 11 forks

match editor affiliations to ROR IDs (first pass) #5

Closed by bmkramer 3 years ago

bmkramer commented 3 years ago

This PR matches editor affiliations to ROR IDs.

The ROR API provides affiliation matching, which returns both a matching confidence score ('score'), with values between 0 and 1, and a binary indicator ('chosen') of whether the score is high enough to consider the organization correctly matched.

This script keeps a ROR ID only where 'chosen' = 1, which yields a match for 82% of editors. Matching could be improved further by reviewing candidates with 'chosen' = 0 but a high 'score'.
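The selection rule described above can be sketched roughly as follows. This is not the code from Script/add-ror.R (which is not shown in this thread); it assumes a response from the ROR affiliation endpoint has already been parsed into a list of items, each with `score`, `chosen`, and `organization$id` fields. The function name `pick_ror_id`, the optional `min_score` fallback, and the placeholder IDs are hypothetical.

```r
# Minimal sketch: pick a ROR ID from already-parsed affiliation match items.
# Assumption: each item is a list with `score`, `chosen`, and `organization$id`.
pick_ror_id <- function(items, min_score = NULL) {
  # Prefer the item the API itself flagged as the match ('chosen').
  chosen <- Filter(function(it) isTRUE(as.logical(it$chosen)), items)
  if (length(chosen) > 0) {
    return(chosen[[1]]$organization$id)
  }
  # Hypothetical fallback: accept the best-scoring item above a threshold.
  if (!is.null(min_score) && length(items) > 0) {
    scores <- vapply(items, function(it) as.numeric(it$score), numeric(1))
    best <- which.max(scores)
    if (scores[best] >= min_score) {
      return(items[[best]]$organization$id)
    }
  }
  NA_character_
}

# Placeholder items (the IDs are not real ROR IDs).
items <- list(
  list(score = 0.90, chosen = FALSE, organization = list(id = "https://ror.org/example-a")),
  list(score = 1.00, chosen = TRUE,  organization = list(id = "https://ror.org/example-b"))
)
pick_ror_id(items)
```

With `min_score = NULL` the function reproduces the strict 'chosen'-only rule; passing a threshold is one way the unmatched remainder could be explored, though any cutoff would need manual checking.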

Included in this PR:
- Script/add-ror.R
- Data/affiliations_ror.csv (all unique affiliations with matching ROR ID, or NA)
- Output/editors1_ror.csv
- Output/editors2_ror.csv

NB: The script contains intermediate steps to fix encoding issues, using the same code as Script/clean-final-data.R. As an added cleaning step, HTML tags were also removed (for ROR matching only; I'm not suggesting this for the final data!).
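For illustration, tag removal before matching could look something like the sketch below. It uses base R only and is an assumption about the general approach, not the code from the script; the helper name `strip_html` is hypothetical, and a regex like this handles simple inline tags rather than arbitrary HTML.

```r
# Minimal sketch: remove simple HTML tags from affiliation strings.
# `strip_html` is a hypothetical helper, not a function from the PR.
strip_html <- function(x) {
  x <- gsub("<[^>]+>", " ", x)  # replace each tag with a space so words don't merge
  x <- gsub("\\s+", " ", x)     # collapse runs of whitespace
  trimws(x)                     # drop leading and trailing whitespace
}

strip_html(c("<b>Utrecht University</b>",
             "Dept. of Physics<br/>University of Vienna"))
```

Keeping the cleaned strings in a separate column used only for matching would leave the original affiliation text untouched in the final data.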

andreaspacher commented 3 years ago

This is amazing, thank you so much for this!

For some reason, I get an error at line 183: `ror <- map_dfr(aff, getROR_progress)`.

I tried it with a reduced `aff` (`aff <- head(aff, 15)`) beforehand, just to test the code.

This is the error I get:

> ror <- map_dfr(aff, getROR_progress)
|===========================================================|100% ~0 s remaining
Error: Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.
Run `rlang::last_error()` to see where the error occurred.

> rlang::last_error()
<error/rlang_error>
Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.
Backtrace:
 1. purrr::map_dfr(aff, getROR_progress)
 2. dplyr::bind_rows(res, .id = .id)
 3. vctrs::vec_rbind(!!!dots, .names_to = .id)
Run `rlang::last_trace()` to see the full context.

<error/rlang_error>
Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.
Backtrace:
    x
 1. \-purrr::map_dfr(aff, getROR_progress)
 2.   \-dplyr::bind_rows(res, .id = .id)
 3.     \-vctrs::vec_rbind(!!!dots, .names_to = .id)
 4.       \-(function () ...

This is then the consequence:

> ror
Error: object 'ror' not found

bmkramer commented 3 years ago

Ah, good catch, thanks! Turns out that in line 174 I had left in a select() call (which returns a one-column data frame) where it should have been pull() (which extracts the column as a vector).
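The difference matters for purrr's mapping functions, which treat a data frame as a list of columns. The toy example below illustrates this under assumed names (`editors`, `affiliation`, `aff_df`, `aff_vec` are hypothetical and not taken from the script); it shows the general behaviour rather than reproducing the exact error above.

```r
library(dplyr)
library(purrr)

# Toy data; the column name is an assumption for illustration.
editors <- tibble(
  affiliation = c("Utrecht University", "University of Vienna", "Utrecht University")
)

# select() keeps a one-column tibble; pull() extracts a character vector.
aff_df  <- editors %>% distinct(affiliation) %>% select(affiliation)
aff_vec <- editors %>% distinct(affiliation) %>% pull(affiliation)

is.data.frame(aff_df)  # TRUE
is.character(aff_vec)  # TRUE

# Mapping over the tibble calls the function once, with the whole column;
# mapping over the vector calls it once per affiliation.
length(map(aff_df, nchar))   # 1
length(map(aff_vec, nchar))  # 2
```

A function written to handle one affiliation string at a time therefore receives every affiliation at once when given the `select()` result, which is the kind of mismatch that can surface later when the results are row-bound.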

I've fixed this in 7906d42646971ac2999f0cca88a3d8507357259e, together with 2 other small edits.