coriell-research / ursaPGx

R Package for Star Allele Annotation

Questions about installation #1

Pharmacogenetecist closed this issue 1 year ago

Pharmacogenetecist commented 1 year ago

Thank you for sharing a very interesting package.

I tried to install it in RStudio running R 4.3.0 and got installation errors:

devtools::install_github("coriell-research/ursaPGx")
Downloading GitHub repo coriell-research/ursaPGx@HEAD
-- R CMD build ----------------------------------------------------------------------------------
'"C:\R\R-4.3.0/bin/x64/Rterm.exe"' is not recognized as an internal or external command, operable program or batch file.
Error: Failed to install 'ursaPGx' from GitHub:
! System command 'Rcmd.exe' failed

I then switched to R4.2 and installation worked without any problems.

However, when I run CYP2C19 <- readPGx(vcf, gene = "CYP2C19") after loading the VCF, RStudio crashes and I have to restart.

Do you have any suggestions I could try for troubleshooting? Thank you.

jcalendo commented 1 year ago

Hi,

Thanks for your interest in the package! It's hard to say what's happening without any specific error messages (which, I understand, are not generated because of the crash), but there are a few things to try:

Please let me know if this helps or if you need any more assistance.

Pharmacogenetecist commented 1 year ago

Thanks for your response. I got the package installed with R versions 4.2.3 and 4.2.1.

vcf <- "C:/Bioinformatic_Output/ursaPGx/0224WGS.vcf"

file.exists("C:/Bioinformatic_Output/ursaPGx/0224WGS.vcf")
[1] TRUE

Yes the .tbi file is there.

When I try to run callDiplotypes or readPGx, it just crashes RStudio.

I tried running it natively in Rgui and get the same crashing unfortunately.

I am not sure if it is specific to me or my PC. I will try to get a colleague to run it and report back.

Thank you.

jcalendo commented 1 year ago

I have added some additional checks to the readPGx() function which should emit more warning and error messages for unexpected inputs. Hopefully this will help me narrow down the problem that you are experiencing. When you get the chance, please update to the newest version of ursaPGx and try again.

I apologize for the inconvenience. Reading and subsetting user-supplied VCFs is the most unpredictable part of the pipeline. The chromosome names (UCSC style: "chr1", "chr2", ... or NCBI style: "1", "2", ...) must match between the input VCF and the PharmVar-defined ranges (UCSC style for GRCh38 and NCBI style for GRCh37) before the sample VCF data is read in, so that the subsetting works properly. There are several checks that try to detect the user-defined chromosome names, but these rely on the reference being defined in the VCF header; otherwise, the default values used by PharmVar are used. The recent changes should emit warning messages if this occurs. However, I'm not entirely sure if this is the problem you are experiencing.
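The chromosome-name requirement described above can be illustrated with a minimal base R sketch. The helper names `guess_chrom_style()` and `styles_match()` are hypothetical and are not part of ursaPGx; the style expected for each build is taken from the comment above. For real Bioconductor objects, `GenomeInfoDb::seqlevelsStyle()` is probably the more robust way to inspect naming style.

```r
# Hypothetical helpers for illustration only; not ursaPGx functions.

# Guess the naming style from a character vector of chromosome names.
guess_chrom_style <- function(chrom_names) {
  stopifnot(is.character(chrom_names), length(chrom_names) > 0)
  has_prefix <- grepl("^chr", chrom_names)
  if (all(has_prefix)) {
    "UCSC"   # "chr1", "chr2", ...
  } else if (!any(has_prefix)) {
    "NCBI"   # "1", "2", ...
  } else {
    "mixed"  # inconsistent naming within one file
  }
}

# Style expected for each reference build, as described in the comment above.
expected_style <- c(GRCh38 = "UCSC", GRCh37 = "NCBI")

# TRUE when the VCF chromosome names use the style expected for the build.
styles_match <- function(vcf_chroms, build = "GRCh38") {
  guess_chrom_style(vcf_chroms) == expected_style[[build]]
}

styles_match(c("chr1", "chr10"), build = "GRCh38")
styles_match(c("1", "10"), build = "GRCh38")
```

A mismatch here would suggest renaming the chromosomes in the VCF, or checking which reference build it was called against, before reading it in.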

Pharmacogenetecist commented 1 year ago

Thank you. To avoid the vcf input issue I downloaded the chr10 file you used.

I also re-installed R and Rstudio and then installed ursaPGx

See below.

R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

[Workspace loaded from C:/Bioinformatic_Output/ursaPGx/.RData]

getwd()
[1] "C:/Bioinformatic_Output/ursaPGx"
library(ursaPGx)
Loading required package: VariantAnnotation
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval,
evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste,
pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min

Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods,
colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs, colMeans2,
colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2,
colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs,
rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads,
rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans,
rowWeightedMedians, rowWeightedSds, rowWeightedVars

Loading required package: GenomeInfoDb
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:utils’:

findMatches

The following objects are masked from ‘package:base’:

expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

windows

Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

rowMedians

The following objects are masked from ‘package:matrixStats’:

anyMissing, rowMedians

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

strsplit

Attaching package: ‘VariantAnnotation’

The following object is masked from ‘package:base’:

tabulate

vcf <- "C:/Bioinformatic_Output/ursaPGx/1kGP_high_coverage_Illumina.chr10.filtered.SNV_INDEL_SV_phased_panel.vcf"

After running this command, my Rstudio crashes.

The good news is that this appears to be a "me" issue. I had a colleague run it in their RStudio (albeit on a Mac) and it worked fine with the chr10 VCF. It should not be a resources issue either, as I have 32 GB of RAM running on an i9 @ 2.4 GHz.

Thanks for your help, I will reach out again if I find a solution!

Pharmacogenetecist commented 1 year ago

Okay, resolved. Instead of

vcf <- "C:/Bioinformatic_Output/ursaPGx/1kGP_high_coverage_Illumina.chr10.filtered.SNV_INDEL_SV_phased_panel.vcf"

I used

vcf <- "1kGP_high_coverage_Illumina.chr10.filtered.SNV_INDEL_SV_phased_panel.vcf"

This does not crash Rstudio and I can run your pipeline!

Also, it appears that RStudio crashes when I use a .vcf file, but using a .vcf.gz file works fine.
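For anyone hitting the same crash, a pre-flight check along these lines might catch the problem before the file is read. This is a sketch in base R only: `check_vcf_input()` is a hypothetical helper, not an ursaPGx function, and it assumes the working input is a compressed VCF with a .tbi index next to it, as this thread suggests. The root cause of the crash was not diagnosed here, so passing these checks is no guarantee.

```r
# Hypothetical pre-flight check for illustration only; not an ursaPGx function.
check_vcf_input <- function(path) {
  c(
    exists     = file.exists(path),                      # file is where we think it is
    compressed = grepl("\\.vcf\\.(gz|bgz)$", path),      # compressed rather than plain .vcf
    indexed    = file.exists(paste0(path, ".tbi"))       # tabix index sits beside the file
  )
}

vcf <- "1kGP_high_coverage_Illumina.chr10.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
checks <- check_vcf_input(vcf)

# Report which checks failed instead of letting the session crash later.
if (!all(checks)) {
  warning("VCF input checks failed: ",
          paste(names(checks)[!checks], collapse = ", "))
}
```

If only a plain .vcf is available, `Rsamtools::bgzip()` and `Rsamtools::indexTabix()` can produce a compressed, indexed copy; whether that alone avoids the crash on Windows is untested here.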

jcalendo commented 1 year ago

Awesome, that's great to hear! I'll close this issue then.

Thanks for testing out the package. If you come across anything else, please let me know. The package is still in active development, and I am trying to ensure it's also useful for data from outside of the 1000 Genomes Project (the data the package was primarily built to annotate). Additionally, Windows-related issues are more difficult for me to diagnose since my dev machines are Linux and macOS.