km-2021 closed this issue 2 months ago
Hi!
Yep, our institute told me about a year ago that I was essentially the only user on their GitLab with any regular external traffic, and that they wanted to shut it down, so they helped me move the data onto an external site instead (I think it ended up on Dropbox?).
So if superFreq is trying to access GitLab, then either you're using an old version of superFreq, or I forgot to change some of the GitLab links. Or some other bug I didn't think about. :D
So could you send through the log file? In particular I want to know which superFreq version you're running (if it's old, updating should help). If you're already on the latest version, then I want to know the genome (mm10/hg19/hg38) and the mode (RNA, exome, genome), which together determine which resources are downloaded.
Hi,
The superFreq version is 1.4.5, the genome is hg19, and the mode is exome. The same environment worked fine one week ago. The log file is:
2023-09-01 17:02:58
######################################################################
Running superFreq version 1.4.5
SessionInfo():
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS/LAPACK: /home/km/anaconda3/lib/libmkl_rt.so.1
locale:
[1] LC_CTYPE=ja_JP.UTF-8 LC_NUMERIC=C
[3] LC_TIME=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8
[5] LC_MONETARY=ja_JP.UTF-8 LC_MESSAGES=ja_JP.UTF-8
[7] LC_PAPER=ja_JP.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] superFreq_1.4.5 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[3] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome.Mmusculus.UCSC.mm10_1.4.3
[5] BSgenome_1.66.3 rtracklayer_1.58.0
[7] MutationalPatterns_3.8.1 NMF_0.26
[9] Biobase_2.58.0 cluster_2.1.4
[11] rngtools_1.5.2 registry_0.5-1
[13] limma_3.54.2 Rsubread_2.12.3
[15] R.oo_1.25.0 R.methodsS3_1.8.2
[17] Rsamtools_2.14.0 Biostrings_2.66.0
[19] XVector_0.38.0 biomaRt_2.54.1
[21] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[23] IRanges_2.32.0 S4Vectors_0.36.2
[25] BiocGenerics_0.44.0 WriteXLS_6.4.0
loaded via a namespace (and not attached):
[1] matrixStats_1.0.0 bitops_1.0-7
[3] bit64_4.0.5 filelock_1.0.2
[5] doParallel_1.0.17 RColorBrewer_1.1-3
[7] progress_1.2.2 httr_1.4.6
[9] tools_4.2.3 utf8_1.2.3
[11] R6_2.5.1 DBI_1.1.3
[13] colorspace_2.1-0 tidyselect_1.2.0
[15] prettyunits_1.1.1 bit_4.0.5
[17] curl_4.3.3 compiler_4.2.3
[19] cli_3.6.1 xml2_1.3.5
[21] DelayedArray_0.24.0 scales_1.2.1
[23] rappdirs_0.3.3 stringr_1.5.0
[25] digest_0.6.33 pkgconfig_2.0.3
[27] MatrixGenerics_1.10.0 dbplyr_2.3.3
[29] fastmap_1.1.1 rlang_1.1.1
[31] RSQLite_2.3.1 BiocIO_1.8.0
[33] generics_0.1.3 BiocParallel_1.32.6
[35] dplyr_1.1.2 VariantAnnotation_1.44.1
[37] RCurl_1.98-1.12 magrittr_2.0.3
[39] GenomeInfoDbData_1.2.9 Matrix_1.6-0
[41] Rcpp_1.0.11 munsell_0.5.0
[43] fansi_1.0.4 lifecycle_1.0.3
[45] stringi_1.7.12 yaml_2.3.7
[47] ggalluvial_0.12.5 SummarizedExperiment_1.28.0
[49] zlibbioc_1.44.0 plyr_1.8.8
[51] BiocFileCache_2.6.1 grid_4.2.3
[53] blob_1.2.4 crayon_1.5.2
[55] lattice_0.21-8 GenomicFeatures_1.50.4
[57] hms_1.1.3 KEGGREST_1.38.0
[59] pillar_1.9.0 rjson_0.2.21
[61] reshape2_1.4.4 codetools_0.2-19
[63] XML_3.99-0.14 glue_1.6.2
[65] BiocManager_1.30.21.1 png_0.1-8
[67] vctrs_0.6.3 foreach_1.5.2
[69] gtable_0.3.3 cachem_1.0.8
[71] ggplot2_3.4.2 gridBase_0.4-7
[73] restfulr_0.0.15 pracma_2.4.2
[75] tibble_3.2.1 iterators_1.0.14
[77] GenomicAlignments_1.34.1 AnnotationDbi_1.60.2
[79] memoise_2.0.1
Testing samtools...
Found samtools 1.6 . Seems ok.
Starting run with input files:
sampleMetaDataFile: /home/km/superFreq/splitMetaData/P190.tsv
vcfFiles:
Normal directory: /home/km/superFreq/myReferenceNormals/bam
Normal coverage directory: /home/km/superFreq/myReferenceNormals/bam
dbSNP directory: superFreqResources/dbSNP
capture regions: will be downloaded from superFreq server.
Plotting to /home/km/superFreq/MyAnalysis/plots/P190
Saving R files to /home/km/superFreq/MyAnalysis/R/P190
Genome is hg19
Running in exome mode.
exacPopulation is all
Running on at most 8 cpus.
Rare germline variants are shown in output.
Parameters for this run are:
maxCov: 150
systematicVariance: 0.03
cloneDistanceCut: 2.326348
cosmicSalvageRate: 0.001
Normal bamfiles are:
/home/km/superFreq/myReferenceNormals/bam/LK168N.bam
/home/km/superFreq/myReferenceNormals/bam/LK179N.bam
/home/km/superFreq/myReferenceNormals/bam/LK216N.bam
/home/km/superFreq/myReferenceNormals/bam/LK219N.bam
/home/km/superFreq/myReferenceNormals/bam/LK252N.bam
/home/km/superFreq/myReferenceNormals/bam/LK258N.bam
/home/km/superFreq/myReferenceNormals/bam/LK261N.bam
/home/km/superFreq/myReferenceNormals/bam/LK267N.bam
/home/km/superFreq/myReferenceNormals/bam/LK268N.bam
Normal bamfiles are:
/home/km/superFreq/myReferenceNormals/bam/LK168N.bam
/home/km/superFreq/myReferenceNormals/bam/LK179N.bam
/home/km/superFreq/myReferenceNormals/bam/LK216N.bam
/home/km/superFreq/myReferenceNormals/bam/LK219N.bam
/home/km/superFreq/myReferenceNormals/bam/LK252N.bam
/home/km/superFreq/myReferenceNormals/bam/LK258N.bam
/home/km/superFreq/myReferenceNormals/bam/LK261N.bam
/home/km/superFreq/myReferenceNormals/bam/LK267N.bam
/home/km/superFreq/myReferenceNormals/bam/LK268N.bam
Loading capture regions...
Please let me know if you need any additional information.
Thanks,
Thanks.
So it seems I left a lot of URLs still pointing to the WEHI GitLab. I guess it stayed online until a certificate expired at some point after your last successful run.
I'll update the links, run it through the tests and push to live. Hopefully this week.
In the meantime, if you've run it before (with exome hg19), then you can set superFreq(..., resourceDirectory='path/to/previous/run/superFreqResources') to have it point to the resources from the last working run. That's probably good practice anyway, to avoid re-downloading for every batch.
Thank you for the update.
I understand that you have left a number of URLs pointing to the WEHI gitlab. I will keep that in mind and use the resources from the last working run if I need to use superFreq again before the updated links are pushed to live.
However, I would like to caution you that there is a possibility that running superFreq with the resources from the last working run may still result in an error. This is because the resources may have been updated or changed since the last run.
If you experience any errors, please let me know and I will do my best to help you troubleshoot the issue.
I appreciate your help.
The resource directory is meant to be re-used like this, which is why the option is there. It holds resources like the mm10 genome sequence, gene regions, COSMIC mutation frequencies and that kind of thing, not things that change between batches of data. If I do update them, say I choose to use RefSeq regions instead of Ensembl, then I don't replace the old file but make a new one, so that superFreq recognises that the new file isn't in place. So while open source software comes without guarantees, I think it should be relatively safe to re-use the resources, and any updates should be recognised and acted on.
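The re-use logic described above can be sketched roughly as follows. This is a generic illustration in Python, not superFreq's actual R code: `ensure_resource`, the `fetch` callback and the file names in `demo` are all made up for the example.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_resource(resource_dir: Path, filename: str,
                    fetch: Callable[[Path], None]) -> bool:
    """Make sure `filename` exists in `resource_dir`.

    Returns True if the file had to be fetched and False if an existing
    copy was re-used. `fetch` is a hypothetical callback that writes the
    file to the path it is given.
    """
    target = resource_dir / filename
    if target.exists():
        return False  # cached by an earlier run, nothing to download
    resource_dir.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return True


def demo() -> List[bool]:
    """Run the check three times in a throwaway directory."""
    def fake_fetch(path: Path) -> None:
        path.write_text("placeholder\n")  # stands in for a real download

    with tempfile.TemporaryDirectory() as tmp:
        resources = Path(tmp) / "superFreqResources"
        return [
            ensure_resource(resources, "regions_ensembl.bed", fake_fetch),  # first run
            ensure_resource(resources, "regions_ensembl.bed", fake_fetch),  # re-used
            ensure_resource(resources, "regions_refseq.bed", fake_fetch),   # new name
        ]
```

Because an updated resource gets a new file name rather than overwriting the old one, a check keyed on the file name re-uses whatever is already there and only fetches what is missing.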
Thank you for the update.
I ran superFreq before, and it ran with the default capture region.bed file without specifying the capture region. I passed the path to the resource directory as you instructed, but I don't know how to specify the default capture region. Could you please tell me how to specify the capture region for the exome RNA mode?
I appreciate your help.
Ahh, yes I forgot the capture regions, sorry.
The .bed file with the exons (the padded regions where it's looking for variants in an exome) is not in the resource directory, it's in the R directory. That is for "historical" reasons, because it used to be a user-supplied input, but I've since moved it to be included in the package instead. So now it's just a resource that is placed in a silly place... :/
It's a bit messy, but you can fix it by copying the bed files over to the R directories of the new runs, one copy for each INDIVIDUAL...
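A minimal sketch of that copy step, in Python rather than R. It assumes the .bed files sit directly in the per-individual R directory of a finished run (for example the `MyAnalysis/R/P190` directory from the log above) and that each new individual gets its own subdirectory under the same R root. The function name is hypothetical, and it matches on `*.bed` because the actual file names are not given in this thread.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_capture_regions(previous_r_dir: Path, new_r_root: Path,
                         individuals: Iterable[str]) -> List[Path]:
    """Copy every .bed file from a finished run's R directory into the
    R directory of each new individual. Returns the paths of the copies."""
    bed_files = sorted(previous_r_dir.glob("*.bed"))
    if not bed_files:
        raise FileNotFoundError(f"no .bed files found in {previous_r_dir}")

    copied = []
    for individual in individuals:
        target_dir = new_r_root / individual
        target_dir.mkdir(parents=True, exist_ok=True)  # create the R directory if needed
        for bed in bed_files:
            # copy2 keeps timestamps and returns the destination path
            copied.append(Path(shutil.copy2(bed, target_dir)))
    return copied
```

It could be called as `copy_capture_regions(Path("MyAnalysis/R/P190"), Path("MyAnalysis/R"), ["P191", "P192"])`, where `P191` and `P192` are placeholder names for the new individuals.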
I will fix that in the patch as well, put the .bed files in the resource directory so that it gets properly re-used rather than re-downloaded for every individual as it is now.
Sorry for the mess here. It's really useful for me and for the package though: these are long-needed changes that I now have to make, so thanks for that, and thanks for your patience. :)
I am still working on this. A bit slower than expected, but not forgotten.
Hey @ChristofferFlensburg, did you have any luck with sorting this?
I'm prepping for some cohort analysis that will be run as data becomes available, and I wanted to set it up so that I use a single cached set of resources, so this sounds perfect.
If there are any tasks that don't require too deep a knowledge of the codebase, I'd be happy to have a go at them if that's helpful.
Thanks!
Hi!
Yep, there were some delays, but it's done and up live now. I found some half-done fixes for (unrelated) annotation issues that I wanted to finish before pushing live. But I think both the resource download issue and the annotation issue are fixed now. It went through all the test runs and I just pushed it live, so hopefully it should be all good now! 🤞
Let me know if there are further issues.
I am writing to let you know about an issue connecting to your GitLab server. I am using the superFreq R pipeline, which connects to your server, but I receive this error message:
Warning: SSL certificate verification failed: self-signed certificate
I have tried downloading the certificate from your website but still receive the same error. I would appreciate it if you could investigate the issue and provide a solution.
Thank you for your time and consideration.