Open mskyttner opened 1 year ago
This doesn't seem to be a schema change but rather a timeout from the feed. A support request has been sent to DiVA-portal.
To replicate:
curl -Z --globoff -L \
-o ~/.config/kthcorpus/persons_2010.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csv&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":2010,"to":2010}},{"publicationTypeCode":["bookReview","review","article","artisticOutput","book","chapter","manuscript","collection","other","conferencePaper","patent","conferenceProceedings","report","dataset"]}]]&onlyFullText=false&noOfRows=5000000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc&csvType=person&fl=PID,AuthorityPid,DOI,FirstName,ISI,ISRN,LastName,LocalId,NBN,ORCID,OrganisationId,UncontrolledOrganisation,Position,PMID,ResearchGroup,Role,ScopusId'
This server timeout issue has come and gone over the last week and is currently occurring. I have contacted DiVA at the email address recommended on the server error page.
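Since the timeouts come and go, a retry wrapper around the download might soften the impact on scheduled jobs. The following is only a sketch: `retry` is a hypothetical helper, not something in kthcorpus, and the attempt count and delay are arbitrary values.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 as soon as the command succeeds.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

It could wrap the replication command above, for example `retry 3 60 curl --fail --silent --show-error --globoff -L --max-time 600 -o "$out" "$url"`, where `$out` and `$url` stand for the output path and export URL. `--fail` makes curl exit non-zero on HTTP errors such as status 500, so the wrapper can see the failure.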
This issue was temporarily resolved by DiVA support, and things worked smoothly for more than a month.
Now it has resurfaced.
Error messages like these occur in our logs:
▆
prod-kthcorpusapi-1 | 1. ├─kthcorpus::kth_diva_authors(refresh_cache = TRUE)
prod-kthcorpusapi-1 | 2. │ └─... %>% mutate(uses_etal = NA, is_extorg = !is.na(extorg))
prod-kthcorpusapi-1 | 3. ├─dplyr::mutate(., uses_etal = NA, is_extorg = !is.na(extorg))
prod-kthcorpusapi-1 | 4. ├─dplyr::select(...)
prod-kthcorpusapi-1 | 5. ├─dplyr::select(., -c(FirstName, LastName))
prod-kthcorpusapi-1 | 6. ├─dplyr::rename(...)
prod-kthcorpusapi-1 | 7. ├─dplyr:::rename.data.frame(...)
prod-kthcorpusapi-1 | 8. │ └─tidyselect::eval_rename(expr(c(...)), .data)
prod-kthcorpusapi-1 | 9. │ └─tidyselect:::rename_impl(...)
prod-kthcorpusapi-1 | 10. │ └─tidyselect:::eval_select_impl(...)
prod-kthcorpusapi-1 | 11. │ ├─tidyselect:::with_subscript_errors(...)
prod-kthcorpusapi-1 | 12. │ │ └─rlang::try_fetch(...)
prod-kthcorpusapi-1 | 13. │ │ └─base::withCallingHandlers(...)
prod-kthcorpusapi-1 | 14. │ └─tidyselect:::vars_select_eval(...)
prod-kthcorpusapi-1 | 15. │ └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
prod-kthcorpusapi-1 | 16. │ └─tidyselect:::eval_c(expr, data_mask, context_mask)
prod-kthcorpusapi-1 | 17. │ └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
prod-kthcorpusapi-1 | 18. │ └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
prod-kthcorpusapi-1 | 19. │ └─tidyselect:::as_indices_sel_impl(...)
prod-kthcorpusapi-1 | 20. │ └─tidyselect:::as_indices_impl(...)
prod-kthcorpusapi-1 | 21. │ └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
prod-kthcorpusapi-1 | 22. │ └─vctrs::vec_as_location(...)
prod-kthcorpusapi-1 | 23. └─vctrs (local) `<fn>`()
prod-kthcorpusapi-1 | 24. └─vctrs:::stop_subscript_oob(...)
prod-kthcorpusapi-1 | 25. └─vctrs:::stop_subscript(...)
prod-kthcorpusapi-1 | 26. └─rlang::abort(...)
prod-kthcorpusapi-1 | Warning message:
prod-kthcorpusapi-1 | The following named parsers don't match the column names: PID, Position, PMID
This time the requests don't fail with status 500 but instead return an empty response (no data).
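Because an empty response still looks like a successful download, a check before parsing could catch it earlier than the `rename()` step does. A minimal sketch, assuming the export is comma-separated with a single header row; `check_export` is a hypothetical helper and not part of kthcorpus:

```shell
# Hypothetical helper: succeed only if the CSV at $1 is non-empty and its
# header row contains every column name given as the remaining arguments.
check_export() {
  local file="$1" header col
  shift
  # -s is true only if the file exists and has a size greater than zero
  if [ ! -s "$file" ]; then
    echo "empty or missing export: $file" >&2
    return 1
  fi
  # Take the header row; drop carriage returns, double quotes and the bytes
  # of a UTF-8 byte order mark (safe here because column names are ASCII)
  header="$(head -n 1 "$file" | LC_ALL=C tr -d '\r"\357\273\277')"
  for col in "$@"; do
    # Wrap in commas so that only whole column names match
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing column: $col" >&2; return 1 ;;
    esac
  done
  return 0
}
```

For the columns named in the warning above, `check_export ~/.config/kthcorpus/persons_2010.csv PID Position PMID` would return non-zero for an empty file or for a header lacking any of the three, so the refresh could stop before the cached data is replaced.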
A similar issue seems to be ongoing now; the following information is in the logs:
The following parsing issues are present in DiVA authors:
# A tibble: 788 × 5
     row   col expected  actual    file
   <int> <int> <chr>     <chr>     <chr>
 1    10     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 2    25     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 3    40     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 4    55     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 5    70     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 6    85     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 7   100     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 8   115     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
 9   130     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
10   145     2 1 columns 2 columns /github/home/.config/kthcorpus/aut.csv
# ℹ 778 more rows
Error in `rename()`:
! Can't rename columns that don't exist.
✖ Column `Position` doesn't exist.
Backtrace:
▆
1. ├─kthcorpus::diva_refresh_trigger()
2. │ └─kthcorpus::kth_diva_authors(refresh_cache = TRUE)
3. │ └─... %>% mutate(uses_etal = NA, is_extorg = !is.na(extorg))
4. ├─dplyr::mutate(., uses_etal = NA, is_extorg = !is.na(extorg))
5. ├─dplyr::select(...)
6. ├─dplyr::select(., -c(FirstName, LastName))
7. ├─dplyr::rename(...)
8. ├─dplyr:::rename.data.frame(...)
9. │ └─tidyselect::eval_rename(expr(c(...)), .data)
10. │ └─tidyselect:::rename_impl(...)
11. │ └─tidyselect:::eval_select_impl(...)
12. │ ├─tidyselect:::with_subscript_errors(...)
13. │ │ └─rlang::try_fetch(...)
14. │ │ └─base::withCallingHandlers(...)
15. │ └─tidyselect:::vars_select_eval(...)
16. │ └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
17. │ └─tidyselect:::eval_c(expr, data_mask, context_mask)
18. │ └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
19. │ └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
20. │ └─tidyselect:::as_indices_sel_impl(...)
21. │ └─tidyselect:::as_indices_impl(...)
22. │ └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
23. │ └─vctrs::vec_as_location(...)
24. └─vctrs (local) `<fn>`()
25. └─vctrs:::stop_subscript_oob(...)
26. └─vctrs:::stop_subscript(...)
27. └─rlang::abort(...)
Warning messages:
1: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
2: In diva_refresh() : Not all files refreshed...
3: The following named parsers don't match the column names: PID, Position, PMID
4: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Execution halted
This might be the consequence of a schema change in DiVA. Needs investigation.
This issue affects jobs that use this function (quality control report upload, ORCID dataset upload).