bschilder opened 1 year ago
Actually, it may be more related to this part:
```
Error: Error: processing vignette 'MungeSumstats.Rmd' failed with diagnostics:
No such file or directory: '/var/folders/3s/vfzpb5r51gs6y328rmm7c0000gn/T//RtmpK6ONEL/file2c69376a9483eduAttainOkbay_standardised.tsv'. Unable to create new file for writing (it does not exist already). Do you have permission to write here, is there space on the disk and does the path exist?
```
Which corresponds to this part of the MSS vignette:
```r
eduAttainOkbayPth <- system.file("extdata", "eduAttainOkbay.txt",
                                 package = "MungeSumstats")
formatted_path <- tempfile(fileext = "eduAttainOkbay_standardised.tsv.gz")

#### 1. Read in the data and standardise header names ####
dat <- MungeSumstats::read_sumstats(path = eduAttainOkbayPth,
                                    standardise_headers = TRUE)
knitr::kable(head(dat))

#### 2. Write to disk as a compressed, tab-delimited, tabix-indexed file ####
formatted_path <- MungeSumstats::write_sumstats(sumstats_dt = dat,
                                                save_path = formatted_path,
                                                tabix_index = TRUE,
                                                write_vcf = FALSE,
                                                return_path = TRUE)
```
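As a quick sanity check against the error message's own questions (does the path exist, is it writable, is there disk space), something like this could be run in the failing session. This is just an ad hoc diagnostic sketch, not part of the vignette:

```r
## Ad hoc diagnostics: inspect the directory that tempfile() resolved to.
tmp <- dirname(formatted_path)
file.exists(tmp)                  # does the temp directory exist on the runner?
file.access(tmp, mode = 2) == 0L  # TRUE if the directory is writable
Sys.getenv("TMPDIR")              # where the runner's temp space actually lives
```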
Running this step locally, I get some messages:
```
Sorting coordinates with 'data.table'.
Writing in tabular format ==> /var/folders/rd/rbc_wrdj4k3djf3brk6z0_dc0000gp/T//Rtmp2mTww8/filee685477cbcdeduAttainOkbay_standardised.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
```
I'm wondering if this has something to do with whether the file is compressed or not, due to some system deps (e.g. gzip, bgzip) being missing on the GHA runner.
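If that's the cause, a first check would be whether those CLI tools are on the runner's PATH at all. Worth noting that Rsamtools bundles htslib, so the bgzip/tabix steps can in principle be done without any system binaries. A hedged sketch (the .tsv path and column positions below are assumptions for illustration):

```r
## Are the suspected system deps on the PATH at all?
Sys.which(c("gzip", "bgzip", "tabix"))

## Rsamtools ships htslib, so bgzip + tabix can also be done purely in R.
## Hypothetical file; assumes chromosome in column 1, position in column 2,
## and a single header row to skip.
library(Rsamtools)
bgz <- bgzip("eduAttainOkbay_standardised.tsv", overwrite = TRUE)
indexTabix(bgz, seq = 1, start = 2, end = 2, skip = 1)
```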
Ah okay I didn't realise this was the source of the error! Are those system dependencies something you can add in to rworkflows GHA calls manually?
> Ah okay I didn't realise this was the source of the error! Are those system dependencies something you can add in to rworkflows GHA calls manually?
Still just a hypothesis at this point; just logging some ideas. But yeah, there are a couple of options for installing extra system deps. Will confirm that's the issue first.
I also noticed that the Ubuntu GHA runner is stopping entirely midway through. Not sure why this is, but I'm trying the dev version of the rworkflows action (which has more permissions and uses GHCR instead of DockerHub to pull the Bioc image).
I realized this is because of the following error. Basically, the combination of creating the Docker container and installing all the large MSS dependencies causes the Ubuntu runner to run out of disk space! Need to look into whether we can increase that limit somehow (perhaps by paying GitHub for some additional features).
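For reference, here's a throwaway way to see where the space is going from inside the container (my own ad hoc check, not something rworkflows does):

```r
## Remaining disk space on the runner (Unix-only shell call):
cat(system("df -h /", intern = TRUE), sep = "\n")

## Rough per-package footprint of the current library, largest first (MB):
libs <- list.dirs(.libPaths()[1], recursive = FALSE)
sz <- vapply(libs, function(d) {
  f <- list.files(d, recursive = TRUE, full.names = TRUE)
  sum(file.info(f)$size, na.rm = TRUE)
}, numeric(1))
names(sz) <- basename(libs)
head(round(sort(sz, decreasing = TRUE) / 1e6), 10)
```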
Some possible solutions. I'll test them out and implement one of them in rworkflows as an optional arg:
Using rworkflows dev (soon to be merged into master) seemed to get rid of the old MacOS error that occurs during vignette rendering, but it is replaced with a new one that occurs during unit testing:
```
* Installing package...
* Checking for deprecated package usage...
Error in `collect()`:
! Failed to collect lazy table.
Caused by error in `db_collect()`:
! Arguments in `...` must be used.
✖ Problematic argument:
• ..1 = Inf
ℹ Did you misspell an argument name?
Backtrace:
▆
1. ├─BiocCheck::BiocCheck(...)
2. │ └─BiocCheck:::BiocCheckRun(...)
3. │ └─BiocCheck:::checkDeprecatedPackages(package_dir)
4. │ └─BiocCheck:::getAllDeprecatedPkgs()
5. │ └─BiocCheck:::get_deprecated_status("release")
6. │ └─BiocCheck:::get_status_file_cache(status_file_url)
7. │ └─BiocFileCache::BiocFileCache(cache, ask = FALSE)
8. │ └─BiocFileCache:::.sql_create_db(bfc)
9. │ └─BiocFileCache:::.sql_validate_version(bfc)
10. │ └─BiocFileCache:::.sql_schema_version(bfc)
11. │ ├─base::tryCatch(...)
12. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
13. │ └─tbl(src, "metadata") %>% collect(Inf)
14. ├─dplyr::collect(., Inf)
15. └─dbplyr:::collect.tbl_sql(., Inf)
16. ├─base::tryCatch(...)
17. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
18. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
19. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
20. └─dbplyr::db_collect(x$src$con, sql, n = n, warn_incomplete = warn_incomplete, ...)
21. └─rlang (local) `<fn>`()
22. └─rlang:::check_dots(env, error, action, call)
23. └─rlang:::action_dots(...)
24. ├─base (local) try_dots(...)
25. └─rlang (local) action(...)
Execution halted
```
That said, I think this is the exact same error we encountered recently, so it should fix itself in the next day or so:
> There was a bug reported relating to BiocFileCache compatibility with the new version of dbplyr. This affected BiocFileCache, ExperimentHub, and AnnotationHub. This has already been corrected in BiocFileCache versions 2.10.1 (Release_3_18) and 2.11.1 (devel/3.19), respectively. These versions were pushed up this morning and should be available after tomorrow's daily build. You can install from GitHub as a temporary workaround for the next 24 hours. Sorry for the inconvenience.
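For the record, my reading of the backtrace (an interpretation, not the official patch notes): dbplyr's `collect()` method only accepts `n` as a named argument, so BiocFileCache's positional `collect(Inf)` falls into `...`, which newer dbplyr versions reject. A minimal reproduction sketch:

```r
library(dplyr)
library(dbplyr)

## Tiny in-memory stand-in for BiocFileCache's metadata table.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "metadata",
                  data.frame(key = "schema_version", value = "1.0"))

## Positional Inf lands in `...` and errors on recent dbplyr releases:
try(tbl(con, "metadata") %>% collect(Inf))

## Passing n by name works across versions:
tbl(con, "metadata") %>% collect(n = Inf)

DBI::dbDisconnect(con)
```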
Just upgraded rworkflows to make some extra disk space available and pushed to MSS, but the caveat at the moment is that it only works on the Ubuntu runner. I haven't seen a solution for Mac/Windows GHA runners yet.
@Al-Murphy also mentioned we could try only installing the essential Bioc database packages for tests. I would have to add this capability into rworkflows, as it currently installs all `Imports` and `Suggests`. This might be necessary for at least the Mac runner. Would installing only the `Imports` break any of the examples/tests currently?
Unfortunately it still seems to be failing on linux (and mac, but possibly for different reasons?).
Installing only imports would not be enough for the unit tests (even for mac/windows, which have fewer tests than linux). The only `Suggests` packages you can skip installing are SNPlocs.Hsapiens.dbSNP155.GRCh37 and SNPlocs.Hsapiens.dbSNP155.GRCh38. These are by far the largest packages used anyway, so excluding them should help with space.
> Unfortunately it still seems to be failing on linux (and mac, but possibly for different reasons?).
Yeah, I was monitoring it, and for some reason it seems to get stuck at this step:
Which is strange, because this appears to have less to do with installing large packages (at least on its face).
> Installing only imports would not be enough for the unit tests (even for mac/windows, which have fewer tests than linux). The only `Suggests` packages you can skip installing are SNPlocs.Hsapiens.dbSNP155.GRCh37 and SNPlocs.Hsapiens.dbSNP155.GRCh38. These are by far the largest packages used anyway, so excluding them should help with space.
Ok cool, I'll work on customising rworkflows so you can select specific packages to omit during installation.
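Something along these lines might work for the omission logic; a rough sketch under my own assumptions (not the actual rworkflows implementation):

```r
## Compute MungeSumstats' dependencies, drop the two huge SNPlocs packages,
## and install everything else. Repos and dependency fields are assumptions
## chosen for illustration.
db   <- available.packages(repos = BiocManager::repositories())
deps <- tools::package_dependencies("MungeSumstats", db = db,
                                    which = c("Depends", "Imports", "Suggests"))[[1]]
omit <- c("SNPlocs.Hsapiens.dbSNP155.GRCh37",
          "SNPlocs.Hsapiens.dbSNP155.GRCh38")
BiocManager::install(setdiff(deps, omit), update = FALSE, ask = FALSE)
```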
https://github.com/neurogenomics/MungeSumstats/actions/runs/6652926537/job/18077848264
It seems the rworkflows GHA is failing due to an inability to get some resource files. This could be due to one or more of the following: