MRCIEU / ieugwasr

R interface to the IEU GWAS database API
https://mrcieu.github.io/ieugwasr/

local ld_clump: Warning: cannot open file 'C:\Users\Zheng\AppData\Local\Temp\RtmpqWMUyA\file65d825722a42.clumped': No such file or directory. Error in file(file, "rt") : cannot open the connection #34

Open kouji175 opened 1 year ago

kouji175 commented 1 year ago

I can confirm that this file exists and that R has permission to modify it.

However, the file does not have ".clumped" at the end of its name, which I think may be the reason for the warning.

My information:

sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] gwasvcf_0.1.1 ieugwasr_0.1.5 plinkbinr_0.0.0.9000
[4] data.table_1.14.8 dplyr_1.1.2 TwoSampleMR_0.5.6
[7] readr_2.1.4

loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 blob_1.2.4
[3] R.utils_2.12.2 filelock_1.0.2
[5] Biostrings_2.68.1 bitops_1.0-7
[7] fastmap_1.1.1 RCurl_1.98-1.12
[9] BiocFileCache_2.8.0 VariantAnnotation_1.46.0
[11] GenomicAlignments_1.36.0 XML_3.99-0.14
[13] digest_0.6.31 lifecycle_1.0.3
[15] KEGGREST_1.40.0 RSQLite_2.3.1
[17] magrittr_2.0.3 compiler_4.3.0
[19] genetics.binaRies_0.1.0 rlang_1.1.1
[21] progress_1.2.2 tools_4.3.0
[23] utf8_1.2.3 yaml_2.3.7
[25] rtracklayer_1.60.0 knitr_1.43
[27] prettyunits_1.1.1 S4Arrays_1.0.4
[29] bit_4.0.5 curl_5.0.0
[31] DelayedArray_0.26.3 plyr_1.8.8
[33] xml2_1.3.4 BiocParallel_1.34.2
[35] R.oo_1.25.0 BiocGenerics_0.46.0
[37] grid_4.3.0 stats4_4.3.0
[39] fansi_1.0.4 biomaRt_2.56.0
[41] SummarizedExperiment_1.30.1 cli_3.6.1
[43] rmarkdown_2.21 crayon_1.5.2
[45] generics_0.1.3 rstudioapi_0.14
[47] httr_1.4.6 tzdb_0.4.0
[49] rjson_0.2.21 DBI_1.1.3
[51] cachem_1.0.8 stringr_1.5.0
[53] zlibbioc_1.46.0 parallel_4.3.0
[55] AnnotationDbi_1.62.1 XVector_0.40.0
[57] restfulr_0.0.15 matrixStats_0.63.0
[59] vctrs_0.6.2 Matrix_1.5-4
[61] jsonlite_1.8.4 IRanges_2.34.0
[63] hms_1.1.3 S4Vectors_0.38.1
[65] bit64_4.0.5 GenomicFeatures_1.52.0
[67] glue_1.6.2 codetools_0.2-19
[69] stringi_1.7.12 GenomeInfoDb_1.36.0
[71] GenomicRanges_1.52.0 BiocIO_1.10.0
[73] tibble_3.2.1 pillar_1.9.0
[75] rappdirs_0.3.3 htmltools_0.5.5
[77] GenomeInfoDbData_1.2.10 BSgenome_1.68.0
[79] R6_2.5.1 dbplyr_2.3.2
[81] evaluate_0.21 lattice_0.21-8
[83] Biobase_2.60.0 R.methodsS3_1.8.2
[85] png_0.1-8 Rsamtools_2.16.0
[87] memoise_2.0.1 Rcpp_1.0.10
[89] xfun_0.39 MatrixGenerics_1.12.0
[91] pkgconfig_2.0.3

Here is the code I want to run

  exp_dat <- ieugwasr::ld_clump(
    dplyr::tibble(rsid = exp_dat$SNP, pval = exp_dat$pval.exposure),
    plink_bin = genetics.binaRies::get_plink_binary(),
    bfile = "D:/Cornell/bioinformatics/MR/Breast-cancer/1kg_ref",
    clump_kb = 500,
    clump_r2 = 5*10^-8
  )

Clumping 1XlgyO, 14375928 variants, using EUR population reference
Warning: cannot open file 'C:\Users\Zheng\AppData\Local\Temp\RtmpqWMUyA\file65d825722a42.clumped': No such file or directory
Error in file(file, "rt") : cannot open the connection

Yaolab-fantastic commented 1 year ago

I have the same problem. The error message is:

Clumping ASw1YU, 38 variants, using EUR population reference
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'C:\Users\DELL\AppData\Local\Temp\RtmpABwtLw\file13f34200527b6.clumped': No such file or directory

I found there is the file C:\Users\DELL\AppData\Local\Temp\RtmpABwtLw\file13f34200527b6. But this temp file has no ".clumped" extension.

Below is my code:

  exp_dat_clumped <- ld_clump(
    dat = exp_dat,
    clump_kb = 10000, 
    clump_r2 = 0.001, 
    clump_p = 5e-8,
    plink_bin = genetics.binaRies::get_plink_binary(),
    bfile = 'D:/Data/1kg/ieugwasr/EUR' #path to LD reference dataset
  ) 
Neil-D123 commented 1 year ago

Hi!

I'm having the same problem! Did you guys find a solution to this?

kouji175 commented 1 year ago

You can take the path returned by get_plink_exe() and pass it as plink_bin, for example:

  exp_dat_clump <- ieugwasr::ld_clump(
    dplyr::tibble(rsid = exp_dat$SNP, pval = exp_dat$pval.exposure, id = exp_dat$id.exposure),
    plink_bin = "D:/R-4.3.0/library/plinkbinr/bin/plink_Windows.exe",
    bfile = "D:/Cornell/bioinformatics/MR/Breast-cancer/1kg_ref/EUR",
    clump_kb = 1000,
    clump_r2 = 0.1,
    clump_p = 5E-08
  )

Then check whether the output contains "Warning: No significant --clump results. Skipping". That is normal: it just means that no SNPs could be selected as instruments.

Neil-D123 commented 1 year ago

Hi,

Thanks for getting back to me. I tried that and I'm still getting the same error message. But I downloaded the plink.exe binary from the site, put it into the working directory, and then the following code worked:

  exp_dat_clump <- ieugwasr::ld_clump(
    dplyr::tibble(rsid = instruments$rsid, pval = instruments$pval, id = instruments$id.exposure),
    plink_bin = "C:/HCC Study/plink.exe",
    bfile = "C:/HCC Study/EUR/EUR",
    clump_kb = 1000,
    clump_r2 = 0.01,
    clump_p = 5E-08
  )

Thanks!

kouji175 commented 1 year ago

Hi, can you show the head of your rsid column? You must make sure the rsids are consistent with the rsids in your bfile.
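A rough sketch of one way to check this consistency: a PLINK .bim file lists one variant per row with the variant ID in the second column, so the rsids in dat can be compared against it. The helper name check_rsids_in_bim and the path in the usage comment are made up for illustration and are not part of ieugwasr.

```r
# Hypothetical helper (not part of ieugwasr): report which rsids are
# present in the .bim file of the LD reference panel.
check_rsids_in_bim <- function(rsids, bfile) {
  # A PLINK .bim file has six whitespace-separated columns and no header:
  # chromosome, variant ID, cM position, bp position, allele 1, allele 2.
  bim <- read.table(paste0(bfile, ".bim"), header = FALSE,
                    stringsAsFactors = FALSE)
  found <- rsids %in% bim[[2]]
  message(sum(found), " of ", length(rsids), " rsids found in ", bfile, ".bim")
  found
}

# Usage (the path is an assumption; use your own bfile prefix):
# found <- check_rsids_in_bim(dat$rsid, "D:/Data/1kg/ieugwasr/EUR")
# head(dat$rsid[!found])
```

For a full 1000 Genomes panel read.table may be slow; a faster reader could be swapped in. If none of the rsids are found, PLINK has nothing to clump and no .clumped file will appear.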

shuang-pi-ji commented 1 year ago

Hi, I have the same problem. Here is a subset of my data for local clumping:

T1.csv

shuang-pi-ji commented 1 year ago

OK... My problem turned out to be different, and I have found the solution for my data. My column names were not consistent with what "dat" expects: the columns should be named rsid, pval and id. After I use select to rename the columns, it works.

  T1 %>% dplyr::select(rsid=SNP,pval=pval.exposure,id=id.exposure) %>% filter(rsid!=".") -> DD

  D <- ld_clump_local(dat = DD,clump_p = 1e-05,clump_kb = 10000,clump_r2 = 0.001, bfile = "/input/LD_reference_dataset/EUR",plink_bin = genetics.binaRies::get_plink_binary())

XjtuZhangKun-lab commented 1 year ago

Hello, have you solved this problem yet?

shuang-pi-ji commented 1 year ago

I think this problem may also be caused by memory or disk-space problems in Windows when you work with a large R object and R crashes. After such a crash, R cannot create temporary files in the Windows Temp directory. I found this because, when I ran the same code on a large TwoSampleMR object from read_exposure_data, R crashed and could not clump after a restart.

My solution: close R (RStudio), delete the whole temp folder that R (RStudio) created (it may have a name like RtmpG8APgv in the AppData\Local\Temp directory), and rerun the code. It works.

I think filtering the data before clumping (for example, keeping only SNPs with a P value < 1E-5) may be a way to prevent this problem.
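The pre-filtering idea can be sketched as follows, assuming the intent is to keep only SNPs whose p-value is below the threshold and that the data uses the TwoSampleMR exposure column names seen earlier in this thread. The function name prefilter_snps and the toy data are made up for illustration.

```r
# Hypothetical helper: keep only rows whose exposure p-value is below
# p_threshold, so that a much smaller table is handed to ld_clump().
prefilter_snps <- function(exp_dat, p_threshold = 1e-5) {
  keep <- !is.na(exp_dat$pval.exposure) & exp_dat$pval.exposure < p_threshold
  exp_dat[keep, , drop = FALSE]
}

# Toy data, invented for this example
exp_dat <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  pval.exposure = c(1e-9, 0.2, 3e-6),
  id.exposure = "trait"
)
small <- prefilter_snps(exp_dat)
print(small)
```

The filtered table can then be renamed to rsid, pval and id and passed to ld_clump() as usual.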

Steven-Shixq commented 1 year ago

Sharing my experience in resolving the same error: In my particular scenario (Ubuntu), the program functions correctly after I've removed the X chromosome data from the input variable dat.
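A minimal sketch of dropping X chromosome variants before clumping. It assumes dat carries a chromosome column, here called chr, which is not one of the columns ld_clump() itself requires; the helper name drop_chr_x is made up.

```r
# Hypothetical helper: remove rows on the X chromosome.
# PLINK codes the X chromosome as 23, so that label is matched as well.
drop_chr_x <- function(dat, chr_col = "chr") {
  chr <- toupper(as.character(dat[[chr_col]]))
  dat[!chr %in% c("X", "23", "CHRX"), , drop = FALSE]
}

# Toy data, invented for this example
dat <- data.frame(rsid = c("rs1", "rs2", "rs3"),
                  chr = c("1", "X", "23"),
                  pval = c(1e-9, 2e-9, 3e-9))
autosomal <- drop_chr_x(dat)
print(autosomal)
```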

loftyddd commented 12 months ago

Have you solved it? I have the same problem!

pjordab commented 11 months ago

I have exactly the same issue. The file exists but without the suffix .clumped

loftyddd commented 11 months ago

I have another way to work around the clumping error. Filter the SNPs down to those with a sufficiently small p-value before doing online clumping; then you may be able to run the online clumping. If you still cannot run the code, I suggest splitting the file into several smaller files and clumping each one. That may work.

pjordab commented 11 months ago

Thank you for your answer. I already pre-filtered the SNPs and my input file contains only 28 SNPs to clump... but it is still not working.

loftyddd commented 11 months ago

Have you tried doing online clumping while switching to a different VPN?

pjordab commented 11 months ago

I tried from the Compute Canada server and from the R studio on my computer... I'll try tomorrow from my office but if the server is busy... not sure my IP will have an influence, but I'll try :)

I need to find out how to make this work:

  ld_clump(
    dplyr::tibble(rsid=dat$rsid, pval=dat$pval, id=dat$trait_id),
    plink_bin = genetics.binaRies::get_plink_binary(),
    bfile = "/path/to/reference/EUR"
  )

https://mrcieu.github.io/ieugwasr/reference/ld_clump.html

My "dat" file head lines:

         rsid         pval trait_id
1 rs113394178 1.177877e-09    trait
2  rs13006682 6.483358e-09    trait
3  rs13094827 3.117454e-09    trait
4    rs139831 3.868121e-08    trait
5  rs17608766 1.111732e-11    trait
6   rs1892027 4.841724e-08    trait

My code:

  clumped <- ld_clump(
    dplyr::tibble(rsid=dat$rsid, pval=dat$pval, id=dat$trait_id),
    plink_bin = genetics.binaRies::get_plink_binary(),
    bfile = "EUR",
    clump_kb = 10000,
    clump_r2 = 0.001,
    clump_p = 5e-8
  )

Output:

Clumping trait, 22 variants, using EUR population reference
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'C:\Users\asus\AppData\Local\Temp\RtmpaqodFr\file235060db7c1e.clumped': No such file or directory

Thank you!

pjordab commented 11 months ago

Got the solution:

I downloaded plink.exe from here:

https://www.cog-genomics.org/plink/

and replaced plink_bin = genetics.binaRies::get_plink_binary() --> plink_bin = "plink.exe"

Now it works!
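Before swapping binaries, it can help to confirm that the path given to plink_bin points to a PLINK executable that actually runs. A minimal sketch, assuming a PLINK 1.9 binary that prints a banner containing "PLINK" in response to --version; the helper name plink_works is made up.

```r
# Hypothetical helper: TRUE if plink_bin exists and answers --version.
plink_works <- function(plink_bin) {
  if (!file.exists(plink_bin)) {
    message("No file at: ", plink_bin)
    return(FALSE)
  }
  # shQuote() protects paths that contain spaces, e.g. "C:/HCC Study/plink.exe"
  out <- tryCatch(
    system2(shQuote(plink_bin), "--version", stdout = TRUE, stderr = TRUE),
    error = function(e) character(0),
    warning = function(w) character(0)
  )
  length(out) > 0 && any(grepl("PLINK", out, fixed = TRUE))
}

# Usage (paths are examples only):
# plink_works("plink.exe")
# plink_works(genetics.binaRies::get_plink_binary())
```

If this returns FALSE for the bundled binary but TRUE for a downloaded plink.exe, that would be consistent with the fix described above.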

loftyddd commented 11 months ago

Congratulations!

mentors501 commented 8 months ago

It works!! But don't use the development build of plink. The stable build works fine.

greengarden0925 commented 7 months ago

The error happens when there are no SNPs to be clumped: in that case no "[tempfile prefix].clumped" file is created. That is why the error says it cannot open the file 'C:\Users\asus\AppData\Local\Temp\RtmpaqodFr\file235060db7c1e.clumped'.

The fundamental solution is to revise the functions "ld_clump_local" and "ld_clump". I revised the original functions into "ld_clump_local_YT" and "ld_clump_YT". The code follows.

ld_clump_local_YT <- function(dat, clump_kb, clump_r2, clump_p, bfile, plink_bin) {
  # For debugging, dat can be built like this:
  # dat <- data.frame(rsid = exposure$SNP,
  #                   pval = exposure$pval.exposure,
  #                   id = exposure$id.exposure)

  # Quote paths for the shell that system() uses on this platform
  shell <- ifelse(Sys.info()["sysname"] == "Windows", "cmd", "sh")

  # Write the SNP / P table that PLINK reads through --clump
  fn <- tempfile()
  write.table(data.frame(SNP = dat[["rsid"]], P = dat[["pval"]]),
              file = fn, row.names = FALSE, col.names = TRUE, quote = FALSE)
  # Remove the input file and all PLINK output files when the function exits
  on.exit(unlink(paste0(fn, "*")), add = TRUE)

  fun2 <- paste0(shQuote(plink_bin, type = shell), " --bfile ",
                 shQuote(bfile, type = shell), " --clump ",
                 shQuote(fn, type = shell), " --clump-p1 ", clump_p, " --clump-r2 ",
                 clump_r2, " --clump-kb ", clump_kb, " --out ",
                 shQuote(fn, type = shell))
  system(fun2)

  # No .clumped file is written when PLINK finds nothing to clump;
  # in that case return dat unchanged instead of failing in read.table()
  clumped_file <- paste0(fn, ".clumped")
  if (!file.exists(clumped_file)) {
    return(dat)
  }

  res <- read.table(clumped_file, header = TRUE)
  y <- subset(dat, !dat[["rsid"]] %in% res[["SNP"]])
  if (nrow(y) > 0) {
    message("Removing ", length(y[["rsid"]]), " of ", nrow(dat),
            " variants due to LD with other variants or absence from LD reference panel")
  }
  subset(dat, dat[["rsid"]] %in% res[["SNP"]])
}

# Note: random_string(), check_access_token() and ld_clump_api() come from
# ieugwasr. If they are not visible in your session, qualify them,
# e.g. ieugwasr:::random_string.
ld_clump_YT <- function(dat = NULL, clump_kb = 10000, clump_r2 = 0.001, clump_p = 0.99,
                        pop = "EUR", access_token = NULL, bfile = NULL, plink_bin = NULL) {
  stopifnot("rsid" %in% names(dat))
  stopifnot(is.data.frame(dat))
  if (is.null(bfile)) {
    message("Please look at vignettes for options on running this locally if you need to run many instances of this command.")
  }
  if (!"pval" %in% names(dat)) {
    if ("p" %in% names(dat)) {
      warning("No 'pval' column found in dat object. Using 'p' column.")
      dat[["pval"]] <- dat[["p"]]
    } else {
      warning("No 'pval' column found in dat object. Setting p-values for all SNPs to clump_p parameter.")
      dat[["pval"]] <- clump_p
    }
  }
  if (!"id" %in% names(dat)) {
    dat$id <- random_string(1)
  }
  if (is.null(bfile)) {
    access_token <- check_access_token()
  }
  ids <- unique(dat[["id"]])
  res <- list()
  for (i in seq_along(ids)) {
    x <- subset(dat, dat[["id"]] == ids[i])
    if (nrow(x) == 1) {
      message("Only one SNP for ", ids[i])
      res[[i]] <- x
    } else {
      message("Clumping ", ids[i], ", ", nrow(x), " variants, using ",
              pop, " population reference")
      if (is.null(bfile)) {
        # No local reference given: clump through the API
        res[[i]] <- ld_clump_api(x, clump_kb = clump_kb,
                                 clump_r2 = clump_r2, clump_p = clump_p, pop = pop,
                                 access_token = access_token)
      } else {
        # Local reference given: use the revised local function
        res[[i]] <- ld_clump_local_YT(x, clump_kb = clump_kb,
                                      clump_r2 = clump_r2, clump_p = clump_p, bfile = bfile,
                                      plink_bin = plink_bin)
      }
    }
  }
  res <- dplyr::bind_rows(res)
  return(res)
}

Run the clumping using the following code:

    exposure_clumped <- ld_clump_YT(
      dat = data.frame(rsid = exposure$SNP,
                       pval = exposure$pval.exposure,
                       id = exposure$id.exposure),
      clump_kb = 10000,
      clump_r2 = 0.001,
      clump_p = 0.99,
      # path to plink.exe, e.g.
      plink_bin = "./plink_win64_20231211/plink.exe",
      # file path prefix of the 1000 Genomes LD reference, e.g.
      bfile = "./Data/1000genomeLDreference/EUR"
    )

Leweibo commented 7 months ago

This problem is data dependent. When I switched to another data set, it worked well.

Maybe this warning message is the key to the problem:

"Warning: No significant --clump results. Skipping."