KTH-Library / kthcorpus

R package to support workflows related to the corpus of publications from KTH
https://kth-library.github.io/kthcorpus
GNU Affero General Public License v3.0

Column `Position` doesn't exist when using function kth_diva_authors #126

Open mskyttner opened 1 year ago

mskyttner commented 1 year ago

This may be the consequence of a schema change in DiVA. Needs investigation.

This issue affects jobs making use of the function (quality control report upload, orcid dataset upload).

mskyttner commented 1 year ago

This doesn't seem to be a schema change, but rather a timeout from the feed. A support request has been sent to DiVA-portal.

To replicate:

curl -Z --globoff  -L \
-o ~/.config/kthcorpus/persons_2010.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csv&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":2010,"to":2010}},{"publicationTypeCode":["bookReview","review","article","artisticOutput","book","chapter","manuscript","collection","other","conferencePaper","patent","conferenceProceedings","report","dataset"]}]]&onlyFullText=false&noOfRows=5000000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc&csvType=person&fl=PID,AuthorityPid,DOI,FirstName,ISI,ISRN,LastName,LocalId,NBN,ORCID,OrganisationId,UncontrolledOrganisation,Position,PMID,ResearchGroup,Role,ScopusId'

mskyttner commented 1 year ago

This server timeout has been occurring on and off over the last week, and is currently occurring again. I have contacted DiVA at the email address recommended on the server error page.

mskyttner commented 11 months ago

This issue was temporarily resolved by DiVA support, and things worked smoothly for more than a month.

Now it has resurfaced.

Error messages like these occur in our logs:

 ▆
prod-kthcorpusapi-1  |   1. ├─kthcorpus::kth_diva_authors(refresh_cache = TRUE)
prod-kthcorpusapi-1  |   2. │ └─... %>% mutate(uses_etal = NA, is_extorg = !is.na(extorg))
prod-kthcorpusapi-1  |   3. ├─dplyr::mutate(., uses_etal = NA, is_extorg = !is.na(extorg))
prod-kthcorpusapi-1  |   4. ├─dplyr::select(...)
prod-kthcorpusapi-1  |   5. ├─dplyr::select(., -c(FirstName, LastName))
prod-kthcorpusapi-1  |   6. ├─dplyr::rename(...)
prod-kthcorpusapi-1  |   7. ├─dplyr:::rename.data.frame(...)
prod-kthcorpusapi-1  |   8. │ └─tidyselect::eval_rename(expr(c(...)), .data)
prod-kthcorpusapi-1  |   9. │   └─tidyselect:::rename_impl(...)
prod-kthcorpusapi-1  |  10. │     └─tidyselect:::eval_select_impl(...)
prod-kthcorpusapi-1  |  11. │       ├─tidyselect:::with_subscript_errors(...)
prod-kthcorpusapi-1  |  12. │       │ └─rlang::try_fetch(...)
prod-kthcorpusapi-1  |  13. │       │   └─base::withCallingHandlers(...)
prod-kthcorpusapi-1  |  14. │       └─tidyselect:::vars_select_eval(...)
prod-kthcorpusapi-1  |  15. │         └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
prod-kthcorpusapi-1  |  16. │           └─tidyselect:::eval_c(expr, data_mask, context_mask)
prod-kthcorpusapi-1  |  17. │             └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
prod-kthcorpusapi-1  |  18. │               └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
prod-kthcorpusapi-1  |  19. │                 └─tidyselect:::as_indices_sel_impl(...)
prod-kthcorpusapi-1  |  20. │                   └─tidyselect:::as_indices_impl(...)
prod-kthcorpusapi-1  |  21. │                     └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
prod-kthcorpusapi-1  |  22. │                       └─vctrs::vec_as_location(...)
prod-kthcorpusapi-1  |  23. └─vctrs (local) `<fn>`()
prod-kthcorpusapi-1  |  24.   └─vctrs:::stop_subscript_oob(...)
prod-kthcorpusapi-1  |  25.     └─vctrs:::stop_subscript(...)
prod-kthcorpusapi-1  |  26.       └─rlang::abort(...)
prod-kthcorpusapi-1  | Warning message:
prod-kthcorpusapi-1  | The following named parsers don't match the column names: PID, Position, PMID 

This time the requests don't fail with status 500 but instead return an empty response (no data).
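
Since an empty response leaves a file without the expected header, one way to catch this before `rename()` fails would be to check the downloaded file first. The following is only a minimal sketch: the helper name `diva_csv_has_columns` is hypothetical and not part of kthcorpus, and it assumes a plain comma-separated header without a byte order mark.

```shell
# Hypothetical helper, not part of kthcorpus: succeed only if FILE is
# non-empty and its header line contains every named column.
# Assumes a comma-separated header with no byte order mark.
diva_csv_has_columns() {
  file=$1
  shift
  # An empty response leaves an empty (or missing) file behind.
  if [ ! -s "$file" ]; then
    echo "empty or missing file: $file" >&2
    return 1
  fi
  # Strip quotes and carriage returns, then put one column name per line.
  header=$(head -n 1 "$file" | tr -d '"\r' | tr ',' '\n')
  for col in "$@"; do
    # -x matches the whole line, -F treats the name as a fixed string.
    if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
      echo "missing column: $col" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `diva_csv_has_columns ~/.config/kthcorpus/persons_2010.csv PID Position PMID` would return a non-zero status for the empty download described above, so a job could stop with a clear message instead of the long backtrace.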

mskyttner commented 8 months ago

A similar issue seems to be ongoing now. The following information is in the logs:

The following parsing issues are present in DiVA authors: # A tibble: 788 × 5
     row   col expected  actual    file                                  
   <int> <int> <chr>     <chr>     <chr>                                 
 1    10     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 2    25     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 3    40     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 4    55     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 5    70     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 6    85     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 7   100     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 8   115     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 9   130     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
10   145     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
# ℹ 778 more rows
Error in `rename()`:
! Can't rename columns that don't exist.
✖ Column `Position` doesn't exist.
Backtrace:
     ▆
  1. ├─kthcorpus::diva_refresh_trigger()
  2. │ └─kthcorpus::kth_diva_authors(refresh_cache = TRUE)
  3. │   └─... %>% mutate(uses_etal = NA, is_extorg = !is.na(extorg))
  4. ├─dplyr::mutate(., uses_etal = NA, is_extorg = !is.na(extorg))
  5. ├─dplyr::select(...)
  6. ├─dplyr::select(., -c(FirstName, LastName))
  7. ├─dplyr::rename(...)
  8. ├─dplyr:::rename.data.frame(...)
  9. │ └─tidyselect::eval_rename(expr(c(...)), .data)
 10. │   └─tidyselect:::rename_impl(...)
 11. │     └─tidyselect:::eval_select_impl(...)
 12. │       ├─tidyselect:::with_subscript_errors(...)
 13. │       │ └─rlang::try_fetch(...)
 14. │       │   └─base::withCallingHandlers(...)
 15. │       └─tidyselect:::vars_select_eval(...)
 16. │         └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
 17. │           └─tidyselect:::eval_c(expr, data_mask, context_mask)
 18. │             └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 19. │               └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
 20. │                 └─tidyselect:::as_indices_sel_impl(...)
 21. │                   └─tidyselect:::as_indices_impl(...)
 22. │                     └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
 23. │                       └─vctrs::vec_as_location(...)
 24. └─vctrs (local) `<fn>`()
 25.   └─vctrs:::stop_subscript_oob(...)
 26.     └─vctrs:::stop_subscript(...)
 27.       └─rlang::abort(...)
Warning messages:
1: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat) 
2: In diva_refresh() : Not all files refreshed...
3: The following named parsers don't match the column names: PID, Position, PMID 
4: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat) 
Execution halted
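
The "1 columns" expected versus "2 columns" actual pattern is what a parser reports when the header line has a single field, which would fit a response body that is not the expected CSV (the log itself does not confirm what the file contained). A possible mitigation, sketched below under that assumption, is to download to a temporary file and only move it over the cached file when its header looks like a multi-column CSV. The helper name `replace_if_csv` is hypothetical and not part of kthcorpus.

```shell
# Hypothetical helper, not part of kthcorpus: move NEW over TARGET only
# when NEW's header has at least two comma-separated fields; otherwise
# leave TARGET (the previously cached file) untouched.
replace_if_csv() {
  new=$1
  target=$2
  # Count the fields in the header line; empty output means no header.
  fields=$(head -n 1 "$new" 2>/dev/null | awk -F',' '{ print NF }')
  if [ "${fields:-0}" -lt 2 ]; then
    echo "rejecting $new: header has ${fields:-0} field(s)" >&2
    return 1
  fi
  mv -- "$new" "$target"
}
```

With this, a refresh that receives a malformed body would keep the last good `aut.csv` in place rather than overwriting it, at the cost of serving stale data until the feed recovers.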