Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/10.18129/B9.bioc.MungeSumstats

Fix GitHub Action #170

Open · bschilder opened this issue 1 year ago

bschilder commented 1 year ago

https://github.com/neurogenomics/MungeSumstats/actions/runs/6652926537/job/18077848264

It seems the rworkflows GHA is failing due to an inability to get some resource files:

The downloaded binary packages are in
    /var/folders/3s/vfzpb5r51gs6y328rmlgzm7c0000gn/T//RtmpokJdJw/downloaded_packages
── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file ‘.../DESCRIPTION’ ... OK
* preparing ‘MungeSumstats’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Error: --- re-building ‘docker.Rmd’ using rmarkdown
--- finished re-building ‘docker.Rmd’
--- re-building ‘MungeSumstats.Rmd’ using rmarkdown
trying URL 'ftp://ftp.ensembl.org/pub/assembly_mapping/homo_sapiens/GRCh37_to_GRCh38.chain.gz'
Content type 'unknown' length 285250 bytes (278 KB)
==================================================
Quitting from lines 660-675 [unnamed-chunk-13] (MungeSumstats.Rmd)
Error: Error: processing vignette 'MungeSumstats.Rmd' failed with diagnostics:
No such file or directory: '/var/folders/3s/vfzpb5r51gs6y328rmm7c0000gn/T//RtmpK6ONEL/file2c69376a9483eduAttainOkbay_standardised.tsv'. Unable to create new file for writing (it does not exist already). Do you have permission to write here, is there space on the disk and does the path exist?
--- failed re-building ‘MungeSumstats.Rmd’
--- re-building ‘OpenGWAS.Rmd’ using rmarkdown
--- finished re-building ‘OpenGWAS.Rmd’
SUMMARY: processing the following file failed:
  ‘MungeSumstats.Rmd’
Error: Error: Vignette re-building failed.
Execution halted
Error: Error in proc$get_built_file() : Build process failed
Calls: <Anonymous> ... build_package -> with_envvar -> force -> <Anonymous>
Execution halted
Error: Process completed with exit code 1.
Run actions/upload-artifact@v3
Warning: No files were found with the provided path: check. No artifacts will be uploaded.

This could be due to one or more of the following:

bschilder commented 1 year ago

Actually, it may be more related to this part:

Error: Error: processing vignette 'MungeSumstats.Rmd' failed with diagnostics:
No such file or directory: '/var/folders/3s/vfzpb5r51gs6y328rmm7c0000gn/T//RtmpK6ONEL/file2c69376a9483eduAttainOkbay_standardised.tsv'. Unable to create new file for writing (it does not exist already). Do you have permission to write here, is there space on the disk and does the path exist?

Which corresponds to this part of the MSS vignette:

eduAttainOkbayPth <- system.file("extdata", "eduAttainOkbay.txt",
                                  package = "MungeSumstats")
formatted_path <- tempfile(fileext = "eduAttainOkbay_standardised.tsv.gz")

#### 1. Read in the data and standardise header names ####
dat <- MungeSumstats::read_sumstats(path = eduAttainOkbayPth, 
                                    standardise_headers = TRUE)
knitr::kable(head(dat))
#### 2. Write to disk as a compressed, tab-delimited, tabix-indexed file ####
formatted_path <- MungeSumstats::write_sumstats(sumstats_dt = dat,
                                                save_path = formatted_path,
                                                tabix_index = TRUE,
                                                write_vcf = FALSE,
                                                return_path = TRUE)   
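
As an aside, tempfile() appends fileext verbatim (no separator), which is why the temp files in these errors have names like "file<id>eduAttainOkbay_standardised.tsv". A minimal illustration:

# fileext is pasted directly onto the random temp file name:
tempfile(fileext = "eduAttainOkbay_standardised.tsv.gz")
#> e.g. "/tmp/RtmpXXXXXX/file1a2b3c4deduAttainOkbay_standardised.tsv.gz"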

Running this step locally I get some messages:

Sorting coordinates with 'data.table'.
Writing in tabular format ==> /var/folders/rd/rbc_wrdj4k3djf3brk6z0_dc0000gp/T//Rtmp2mTww8/filee685477cbcdeduAttainOkbay_standardised.tsv
Writing uncompressed instead of gzipped to enable tabix indexing.
Converting full summary stats file to tabix format for fast querying...
Reading header.
Ensuring file is bgzipped.
Tabix-indexing file.
Removing temporary .tsv file.
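
For reference, those messages correspond roughly to a pipeline like the one below. This is a hypothetical sketch using Rsamtools, not MSS's actual internals; the file path and column positions are made up:

library(Rsamtools)

tsv_path <- "eduAttainOkbay_standardised.tsv"  # hypothetical sorted, uncompressed file
bgz_path <- bgzip(tsv_path, overwrite = TRUE)  # "Ensuring file is bgzipped."
indexTabix(bgz_path,                           # "Tabix-indexing file."
           seq = 1, start = 2, end = 2,        # assumed CHR/BP column positions
           skip = 1L)                          # skip the header row
file.remove(tsv_path)                          # "Removing temporary .tsv file."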

I'm wondering if this has something to do with whether the file is compressed or not, perhaps due to the lack of some system deps on the GHA runners (e.g. gzip, bgzip).
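
A quick way to test that from inside the runner's R session (just a diagnostic; note that Rsamtools bundles its own htslib via Rhtslib, so this only matters where MSS shells out to system tools):

# Empty strings mean the binary is not on the PATH:
Sys.which(c("gzip", "bgzip", "tabix"))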

Al-Murphy commented 1 year ago

Ah okay, I didn't realise this was the source of the error! Are those system dependencies something you can add to rworkflows GHA calls manually?

bschilder commented 1 year ago

> Ah okay, I didn't realise this was the source of the error! Are those system dependencies something you can add to rworkflows GHA calls manually?

Still just a hypothesis at this point; just logging some ideas. But yeah, there are a couple of options for installing extra system deps. I'll confirm that's the issue first.

bschilder commented 1 year ago

I also noticed that the Ubuntu GHA runner is totally stopping midway.

[Screenshot 2023-10-27 at 12:47:27]

Not sure why this is, but I'm trying the dev version of the rworkflows action (which has more permissions and uses GHCR instead of DockerHub to pull the Bioc image).

I realized this is because of the following error. Basically, the combination of creating the Docker container and installing all the large MSS deps causes the Ubuntu runner to run out of disk space! Need to look into whether we can increase that limit somehow (perhaps by paying GitHub for some additional features).

[Screenshot 2023-10-27 at 13:21:23]
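
In the meantime, a cheap diagnostic is to log disk usage from inside the CI R session (assumes a Unix-like runner with df on the PATH):

# Print free space for the filesystem holding R's temp dir:
cat(system2("df", c("-h", tempdir()), stdout = TRUE), sep = "\n")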

Some possible solutions. I'll test them out and implement one of them in rworkflows as an optional arg:

bschilder commented 1 year ago

Using rworkflows dev (soon to be merged into master) seemed to get rid of the old macOS error that occurred during vignette rendering, but it is replaced with a new one that occurs during unit testing:

* Installing package...
* Checking for deprecated package usage...
Error in `collect()`:
! Failed to collect lazy table.
Caused by error in `db_collect()`:
! Arguments in `...` must be used.
✖ Problematic argument:
• ..1 = Inf
ℹ Did you misspell an argument name?
Backtrace:
     ▆
  1. ├─BiocCheck::BiocCheck(...)
  2. │ └─BiocCheck:::BiocCheckRun(...)
  3. │   └─BiocCheck:::checkDeprecatedPackages(package_dir)
  4. │     └─BiocCheck:::getAllDeprecatedPkgs()
  5. │       └─BiocCheck:::get_deprecated_status("release")
  6. │         └─BiocCheck:::get_status_file_cache(status_file_url)
  7. │           └─BiocFileCache::BiocFileCache(cache, ask = FALSE)
  8. │             └─BiocFileCache:::.sql_create_db(bfc)
  9. │               └─BiocFileCache:::.sql_validate_version(bfc)
 10. │                 └─BiocFileCache:::.sql_schema_version(bfc)
 11. │                   ├─base::tryCatch(...)
 12. │                   │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 13. │                   └─tbl(src, "metadata") %>% collect(Inf)
 14. ├─dplyr::collect(., Inf)
 15. └─dbplyr:::collect.tbl_sql(., Inf)
 16.   ├─base::tryCatch(...)
 17.   │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 18.   │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 19.   │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
 20.   └─dbplyr::db_collect(x$src$con, sql, n = n, warn_incomplete = warn_incomplete, ...)
 21.     └─rlang (local) `<fn>`()
 22.       └─rlang:::check_dots(env, error, action, call)
 23.         └─rlang:::action_dots(...)
 24.           ├─base (local) try_dots(...)
 25.           └─rlang (local) action(...)
Execution halted

That said, I think this is the exact same error we encountered recently, so it should fix itself in the next day or so:

> There was a bug reported relating to BiocFileCache compatibility with the new version of dbplyr. This affected BiocFileCache, ExperimentHub, and AnnotationHub. It has already been corrected in BiocFileCache versions 2.10.1 (Release_3_18) and 2.11.1 (devel/3.19). These versions were pushed up this morning and should be available after tomorrow's daily build. You can install from GitHub as a temporary workaround for the next 24 hours. Sorry for the inconvenience.
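
For anyone hitting this in the meantime, the temporary workaround would look something like this (assuming the read-only Bioconductor mirror on GitHub; BiocManager delegates owner/repo specs to remotes):

# Install the patched devel version directly from GitHub:
BiocManager::install("Bioconductor/BiocFileCache", update = FALSE)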

bschilder commented 1 year ago

Just upgraded rworkflows to make some extra disk space available and pushed it to MSS, but the caveat at the moment is that it only works on the Ubuntu runner. I haven't seen a solution for the Mac/Windows GHA runners yet.

@Al-Murphy also mentioned we could try installing only the essential Bioc database packages for tests. I would have to add this capability to rworkflows, as it currently installs all Imports and Suggests. This might be necessary for at least the Mac runner. Would installing only the Imports break any of the examples/tests currently?
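
For reference, an Imports-only install would look roughly like this (a sketch; in remotes, dependencies = NA means Depends/Imports/LinkingTo but no Suggests):

# Install only the hard dependencies declared in the package's DESCRIPTION:
remotes::install_deps(pkgdir = ".", dependencies = NA)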

Al-Murphy commented 1 year ago

Unfortunately it still seems to be failing on Linux (and Mac, but possibly for different reasons?).

Installing only Imports would not be enough for the unit tests (even for Mac/Windows, which run fewer tests than Linux). The only Suggests packages you can skip installing are SNPlocs.Hsapiens.dbSNP155.GRCh37 and SNPlocs.Hsapiens.dbSNP155.GRCh38. These are by far the largest packages used anyway, so excluding them should help with space.

bschilder commented 1 year ago

> Unfortunately it still seems to be failing on Linux (and Mac, but possibly for different reasons?).

Yeah, I was monitoring it, and for some reason it seems to get stuck at this step:

[Screenshot 2023-11-01 at 22:17:01]

Which is strange, because this appears to have less to do with installing large packages (at least on its face).

> Installing only Imports would not be enough for the unit tests (even for Mac/Windows, which run fewer tests than Linux). The only Suggests packages you can skip installing are SNPlocs.Hsapiens.dbSNP155.GRCh37 and SNPlocs.Hsapiens.dbSNP155.GRCh38. These are by far the largest packages used anyway, so excluding them should help with space.

Ok cool, I'll work on customising rworkflows so you can select specific packages to omit during installation.
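
Roughly what I have in mind; a hypothetical sketch of the filtering, not the final rworkflows implementation:

# Read Suggests from the DESCRIPTION, drop the named packages, install the rest:
desc <- read.dcf("DESCRIPTION", fields = "Suggests")[1, 1]
suggests <- trimws(strsplit(gsub("\n", " ", desc), ",")[[1]])
suggests <- sub("\\s*\\(.*\\)$", "", suggests)  # strip version constraints
omit <- c("SNPlocs.Hsapiens.dbSNP155.GRCh37",
          "SNPlocs.Hsapiens.dbSNP155.GRCh38")
BiocManager::install(setdiff(suggests, omit), update = FALSE, ask = FALSE)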