WHops / NAHRwhals

R package and wrapper functions for identifying serial structural variations from genome assemblies
MIT License

Biostrings #3

Closed pdoris closed 1 year ago

pdoris commented 1 year ago

Biostrings is not available when installing the package:

NAHRwhals % Rscript install_package.R
Loading required package: devtools
Loading required package: usethis
Loading required package: argparse

Skipping 1 packages not available: Biostrings

WHops commented 1 year ago

Thanks for raising the issue! Can you check whether installing Biostrings manually solves it? This should be possible from within R with:

if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("Biostrings")

pdoris commented 1 year ago

Wolfram

Sorry, this is rather verbose, but it ends with:

> library(nahrwhals)
Error in library(nahrwhals) : there is no package called ‘nahrwhals’

Peter


+++++++++++++++++++++++++++

> if (!require("BiocManager", quietly = TRUE))
+   install.packages("BiocManager")
Bioconductor version 3.14 (BiocManager 1.30.19), R 4.1.2 (2021-11-01)
Bioconductor version '3.14' is out-of-date; the current release version '3.16' is available with R version '4.2'; see https://bioconductor.org/install

> BiocManager::install("Biostrings")
Bioconductor version 3.14 (BiocManager 1.30.19), R 4.1.2 (2021-11-01)
Installing package(s) 'Biostrings'
also installing the dependencies ‘bitops’, ‘zlibbioc’, ‘RCurl’, ‘GenomeInfoDbData’, ‘BiocGenerics’, ‘S4Vectors’, ‘IRanges’, ‘XVector’, ‘GenomeInfoDb’

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/bitops_1.0-7.tgz' Content type 'application/x-gzip' length 29174 bytes (28 KB)

downloaded 28 KB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/zlibbioc_1.40.0.tgz' Content type 'application/octet-stream' length 125768 bytes (122 KB)

downloaded 122 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/RCurl_1.98-1.10.tgz' Content type 'application/x-gzip' length 1089622 bytes (1.0 MB)

downloaded 1.0 MB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/BiocGenerics_0.40.0.tgz' Content type 'application/octet-stream' length 589888 bytes (576 KB)

downloaded 576 KB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/S4Vectors_0.32.4.tgz' Content type 'application/octet-stream' length 2102831 bytes (2.0 MB)

downloaded 2.0 MB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/IRanges_2.28.0.tgz' Content type 'application/octet-stream' length 2254035 bytes (2.1 MB)

downloaded 2.1 MB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/XVector_0.34.0.tgz' Content type 'application/octet-stream' length 625655 bytes (610 KB)

downloaded 610 KB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/GenomeInfoDb_1.30.1.tgz' Content type 'application/octet-stream' length 4033747 bytes (3.8 MB)

downloaded 3.8 MB

trying URL 'https://bioconductor.org/packages/3.14/bioc/bin/macosx/contrib/4.1/Biostrings_2.62.0.tgz' Content type 'application/octet-stream' length 14241415 bytes (13.6 MB)

downloaded 13.6 MB

The downloaded binary packages are in
/var/folders/lq/x0bth18d2736jxk8k6ywr5cfzm1g/T//RtmpucFjF6/downloaded_packages
installing the source package ‘GenomeInfoDbData’

trying URL 'https://bioconductor.org/packages/3.14/data/annotation/src/contrib/GenomeInfoDbData_1.2.7.tar.gz' Content type 'application/octet-stream' length 11193674 bytes (10.7 MB)

downloaded 10.7 MB

The downloaded source packages are in
‘/private/var/folders/lq/x0bth18d2736jxk8k6ywr5cfzm1g/T/RtmpucFjF6/downloaded_packages’
Old packages: 'ade4', 'ape', 'BiocManager', 'broom', 'bslib', 'cachem', 'class', 'classInt', 'codetools', 'conquer', 'curl', 'data.table', 'dbplyr', 'digest', 'dtplyr', 'e1071', 'evaluate', 'fastmap', 'findpython', 'forcats', 'Formula', 'fs', 'gargle', 'ggpubr', 'ggsci', 'haven', 'highr', 'Hmisc', 'htmlwidgets', 'httr', 'igraph', 'knitr', 'lme4', 'lubridate', 'mapproj', 'maptools', 'markdown', 'MASS', 'Matrix', 'mgcv', 'multcomp', 'nlme', 'openssl', 'openxlsx', 'pbkrtest', 'purrr', 'qtl', 'ragg', 'RcppArmadillo', 'readr', 'readxl', 'rgeos', 'rmarkdown', 'rstatix', 's2', 'sass', 'sf', 'sp', 'spatial', 'survival', 'svglite', 'testthat', 'tidyr', 'tidyverse', 'timechange', 'tinytex', 'vdiffr', 'vroom', 'xfun', 'yaml'
Update all/some/none? [a/s/n]: a
also installing the dependency ‘conflicted’

There are binary versions available but the source versions are later:
       binary source needs_compilation
Matrix  1.5-1  1.5-3              TRUE
sf     1.0-10 1.0-11              TRUE

Do you want to install from sources the packages which need compilation? (Yes/no/cancel) no
trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/conflicted_1.2.0.tgz' Content type 'application/x-gzip' length 55002 bytes (53 KB)

downloaded 53 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/ade4_1.7-22.tgz' Content type 'application/x-gzip' length 6098183 bytes (5.8 MB)

downloaded 5.8 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/ape_5.7-1.tgz' Content type 'application/x-gzip' length 3480070 bytes (3.3 MB)

downloaded 3.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/BiocManager_1.30.20.tgz' Content type 'application/x-gzip' length 332264 bytes (324 KB)

downloaded 324 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/broom_1.0.4.tgz' Content type 'application/x-gzip' length 1855067 bytes (1.8 MB)

downloaded 1.8 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/bslib_0.4.2.tgz' Content type 'application/x-gzip' length 4805707 bytes (4.6 MB)

downloaded 4.6 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/cachem_1.0.7.tgz' Content type 'application/x-gzip' length 67244 bytes (65 KB)

downloaded 65 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/class_7.3-21.tgz' Content type 'application/x-gzip' length 96360 bytes (94 KB)

downloaded 94 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/classInt_0.4-9.tgz' Content type 'application/x-gzip' length 497586 bytes (485 KB)

downloaded 485 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/codetools_0.2-19.tgz' Content type 'application/x-gzip' length 89175 bytes (87 KB)

downloaded 87 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/conquer_1.3.3.tgz' Content type 'application/x-gzip' length 4481578 bytes (4.3 MB)

downloaded 4.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/curl_5.0.0.tgz' Content type 'application/x-gzip' length 759005 bytes (741 KB)

downloaded 741 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/data.table_1.14.8.tgz' Content type 'application/x-gzip' length 2376122 bytes (2.3 MB)

downloaded 2.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/dbplyr_2.3.1.tgz' Content type 'application/x-gzip' length 1118770 bytes (1.1 MB)

downloaded 1.1 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/digest_0.6.31.tgz' Content type 'application/x-gzip' length 298704 bytes (291 KB)

downloaded 291 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/dtplyr_1.3.0.tgz' Content type 'application/x-gzip' length 352286 bytes (344 KB)

downloaded 344 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/e1071_1.7-13.tgz' Content type 'application/x-gzip' length 660498 bytes (645 KB)

downloaded 645 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/evaluate_0.20.tgz' Content type 'application/x-gzip' length 78402 bytes (76 KB)

downloaded 76 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/fastmap_1.1.1.tgz' Content type 'application/x-gzip' length 201256 bytes (196 KB)

downloaded 196 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/findpython_1.0.8.tgz' Content type 'application/x-gzip' length 20805 bytes (20 KB)

downloaded 20 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/forcats_1.0.0.tgz' Content type 'application/x-gzip' length 421915 bytes (412 KB)

downloaded 412 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/Formula_1.2-5.tgz' Content type 'application/x-gzip' length 158355 bytes (154 KB)

downloaded 154 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/fs_1.6.1.tgz' Content type 'application/x-gzip' length 569688 bytes (556 KB)

downloaded 556 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/gargle_1.3.0.tgz' Content type 'application/x-gzip' length 575093 bytes (561 KB)

downloaded 561 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/ggpubr_0.6.0.tgz' Content type 'application/x-gzip' length 2087714 bytes (2.0 MB)

downloaded 2.0 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/ggsci_3.0.0.tgz' Content type 'application/x-gzip' length 2425732 bytes (2.3 MB)

downloaded 2.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/haven_2.5.2.tgz' Content type 'application/x-gzip' length 1053989 bytes (1.0 MB)

downloaded 1.0 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/highr_0.10.tgz' Content type 'application/x-gzip' length 38968 bytes (38 KB)

downloaded 38 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/Hmisc_5.0-1.tgz' Content type 'application/x-gzip' length 3431280 bytes (3.3 MB)

downloaded 3.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/htmlwidgets_1.6.1.tgz' Content type 'application/x-gzip' length 799960 bytes (781 KB)

downloaded 781 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/httr_1.4.5.tgz' Content type 'application/x-gzip' length 503902 bytes (492 KB)

downloaded 492 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/igraph_1.4.1.tgz' Content type 'application/x-gzip' length 8105698 bytes (7.7 MB)

downloaded 7.7 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/knitr_1.42.tgz' Content type 'application/x-gzip' length 1446737 bytes (1.4 MB)

downloaded 1.4 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/lme4_1.1-32.tgz' Content type 'application/x-gzip' length 7075919 bytes (6.7 MB)

downloaded 6.7 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/lubridate_1.9.2.tgz' Content type 'application/x-gzip' length 976929 bytes (954 KB)

downloaded 954 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/mapproj_1.2.11.tgz' Content type 'application/x-gzip' length 83299 bytes (81 KB)

downloaded 81 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/maptools_1.1-6.tgz' Content type 'application/x-gzip' length 2142168 bytes (2.0 MB)

downloaded 2.0 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/markdown_1.5.tgz' Content type 'application/x-gzip' length 116795 bytes (114 KB)

downloaded 114 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/MASS_7.3-58.3.tgz' Content type 'application/x-gzip' length 1169097 bytes (1.1 MB)

downloaded 1.1 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/Matrix_1.5-1.tgz' Content type 'application/x-gzip' length 5247282 bytes (5.0 MB)

downloaded 5.0 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/mgcv_1.8-42.tgz' Content type 'application/x-gzip' length 3737078 bytes (3.6 MB)

downloaded 3.6 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/multcomp_1.4-23.tgz' Content type 'application/x-gzip' length 737000 bytes (719 KB)

downloaded 719 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/nlme_3.1-162.tgz' Content type 'application/x-gzip' length 2400717 bytes (2.3 MB)

downloaded 2.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/openssl_2.0.6.tgz' Content type 'application/x-gzip' length 2886168 bytes (2.8 MB)

downloaded 2.8 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/openxlsx_4.2.5.2.tgz' Content type 'application/x-gzip' length 3183659 bytes (3.0 MB)

downloaded 3.0 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/pbkrtest_0.5.2.tgz' Content type 'application/x-gzip' length 186926 bytes (182 KB)

downloaded 182 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/purrr_1.0.1.tgz' Content type 'application/x-gzip' length 509931 bytes (497 KB)

downloaded 497 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/qtl_1.58.tgz' Content type 'application/x-gzip' length 6422978 bytes (6.1 MB)

downloaded 6.1 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/ragg_1.2.5.tgz' Content type 'application/x-gzip' length 9223518 bytes (8.8 MB)

downloaded 8.8 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/RcppArmadillo_0.12.0.1.0.tgz' Content type 'application/x-gzip' length 1568674 bytes (1.5 MB)

downloaded 1.5 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/readr_2.1.4.tgz' Content type 'application/x-gzip' length 1852068 bytes (1.8 MB)

downloaded 1.8 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/readxl_1.4.2.tgz' Content type 'application/x-gzip' length 1531295 bytes (1.5 MB)

downloaded 1.5 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/rgeos_0.6-2.tgz' Content type 'application/x-gzip' length 1593878 bytes (1.5 MB)

downloaded 1.5 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/rmarkdown_2.20.tgz' Content type 'application/x-gzip' length 3651543 bytes (3.5 MB)

downloaded 3.5 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/rstatix_0.7.2.tgz' Content type 'application/x-gzip' length 607290 bytes (593 KB)

downloaded 593 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/s2_1.1.2.tgz' Content type 'application/x-gzip' length 12291852 bytes (11.7 MB)

downloaded 11.7 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/sass_0.4.5.tgz' Content type 'application/x-gzip' length 2393486 bytes (2.3 MB)

downloaded 2.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/sf_1.0-10.tgz' Content type 'application/x-gzip' length 89695231 bytes (85.5 MB)

downloaded 85.5 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/sp_1.6-0.tgz' Content type 'application/x-gzip' length 1835510 bytes (1.8 MB)

downloaded 1.8 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/spatial_7.3-16.tgz' Content type 'application/x-gzip' length 157695 bytes (153 KB)

downloaded 153 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/survival_3.5-5.tgz' Content type 'application/x-gzip' length 6768208 bytes (6.5 MB)

downloaded 6.5 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/svglite_2.1.1.tgz' Content type 'application/x-gzip' length 916660 bytes (895 KB)

downloaded 895 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/testthat_3.1.7.tgz' Content type 'application/x-gzip' length 2994422 bytes (2.9 MB)

downloaded 2.9 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/tidyr_1.3.0.tgz' Content type 'application/x-gzip' length 1324625 bytes (1.3 MB)

downloaded 1.3 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/tidyverse_2.0.0.tgz' Content type 'application/x-gzip' length 423020 bytes (413 KB)

downloaded 413 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/timechange_0.2.0.tgz' Content type 'application/x-gzip' length 842408 bytes (822 KB)

downloaded 822 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/tinytex_0.44.tgz' Content type 'application/x-gzip' length 133328 bytes (130 KB)

downloaded 130 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/vdiffr_1.0.5.tgz' Content type 'application/x-gzip' length 1070466 bytes (1.0 MB)

downloaded 1.0 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/vroom_1.6.1.tgz' Content type 'application/x-gzip' length 2741426 bytes (2.6 MB)

downloaded 2.6 MB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/xfun_0.37.tgz' Content type 'application/x-gzip' length 407795 bytes (398 KB)

downloaded 398 KB

trying URL 'https://cloud.r-project.org/bin/macosx/contrib/4.1/yaml_2.3.7.tgz' Content type 'application/x-gzip' length 204951 bytes (200 KB)

downloaded 200 KB

The downloaded binary packages are in /var/folders/lq/x0bth18d2736jxk8k6ywr5cfzm1g/T//RtmpucFjF6/downloaded_packages

> library(nahrwhals)
Error in library(nahrwhals) : there is no package called ‘nahrwhals’
> if (!require("BiocManager", quietly = TRUE))
+   install.packages("BiocManager")

> BiocManager::install("Biostrings")
Bioconductor version 3.14 (BiocManager 1.30.20), R 4.1.2 (2021-11-01)
Old packages: 'Matrix', 'sf'
Update all/some/none? [a/s/n]: a

There are binary versions available but the source versions are later:
       binary source needs_compilation
Matrix  1.5-1  1.5-3              TRUE
sf     1.0-10 1.0-11              TRUE

Do you want to install from sources the packages which need compilation? (Yes/no/cancel) Yes
installing the source packages ‘Matrix’, ‘sf’

trying URL 'https://cloud.r-project.org/src/contrib/Matrix_1.5-3.tar.gz' Content type 'application/x-gzip' length 2163568 bytes (2.1 MB)

downloaded 2.1 MB

trying URL 'https://cloud.r-project.org/src/contrib/sf_1.0-11.tar.gz' Content type 'application/x-gzip' length 3483796 bytes (3.3 MB)

downloaded 3.3 MB

WHops commented 1 year ago

Hi Peter,

This looks like a package dependency problem. I'll have to look into it more closely before I can suggest a solution. Meanwhile, thanks for raising this; I'm keeping the issue open and hope to resolve it quickly.

best Wolfram

pdoris commented 1 year ago

Wolfram

Thanks.

I hope you are able to help me use this tool.

I am interested in looking at structural variation between mammalian genomes we have assembled. The genome assemblies are haploid (inbred rats) and of high quality. A tool that resolves structural variation fully across the genome, or even just in regions we know are problematic, would be very helpful.

Peter

Peter A Doris, Ph.D. Mary Elizabeth Holdsworth Distinguished University Chair in Metabolic and Inflammatory Disease Research Director, Center for Human Genetics Professor of Molecular Medicine Adjunct Professor of Integrative Biology and Pharmacology McGovern Medical School of UTHealth|The University of Texas Health Science Center at Houston

The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases | Center for Human Genetics 1825 Pressler St | Suite 530E | Houston, TX 77030-3725 713 500 2414 tel | 713 500 2447 fax

From: Wolfram Höps @.> Reply-To: WHops/NAHRwhals @.> Date: Thursday, March 16, 2023 at 9:49 AM To: WHops/NAHRwhals @.> Cc: "Doris, Peter A" @.>, Author @.***> Subject: Re: [WHops/NAHRwhals] Biostrings (Issue #3)


WHops commented 1 year ago

Hi Peter,

I have now streamlined the installation to rely more on conda, which should remove the errors you saw. Could you please try again with the updated files and instructions?

You'll have to git pull again. It's best to remove your old nahrwhals environment first ("conda remove --name nahrwhals --all").

It would be great if you could keep reporting results so I know whether the issue can be closed. Good luck with your application, and I'm happy to assist if you need more help!

best Wolfram

pdoris commented 1 year ago

Thanks Wolfram

I played around a bit until I got the library loaded in R

library(nahrwhals)

quit()

Then I tried the test data. I just returned from a meeting, so I haven't had a chance to dig into the output yet, or even read it; see below.

It looks like this ends with an error:

NAHRwhals % Rscript nahrwhals.R --config conf/conf_default.txt

There were 26 warnings (use warnings() to see them)

minimap2_bin = minimap2

bedtools_bin = bedtools

genome_x_fa = testdata/assemblies/hg38_partial.fa

genome_y_fa = testdata/assemblies/assembly_partial.fa

genome_y_fa_mmi = testdata/assemblies/assembly_partial.fa.mmi

anntrack = FALSE

logfile = res/unittest.tsv

samplename_y = Fasta_y

compare_full_fastas = FALSE

seqname_x = chr1_partial

start_x = 1700000

end_x = 3300000

plot_only = FALSE

self_plots = TRUE

plot_xy_segmented = TRUE

eval_th = 98

depth = 3

chunklen = 10000

minlen = 10000

compression = 10000

max_size_col_plus_rows = 250

max_n_alns = 150

use_paf_library = FALSE

conversionpaf_link = FALSE

xpad = 1

plot_minlen = 350

maxlen_refine = 1e+10

n_tests = 10

n_max_testchunks = 5

baseline_log_minsize_min = 8

baseline_log_minsize_max = 17.28771

discovery_exact = FALSE

hltrack = FALSE

hllink = FALSE

aln_pad_factor = 1

debug = FALSE

clean_after_yourself = FALSE

awkscript_fasta = scripts/awk_on_fasta.sh

awkscript_paf = scripts/awk_on_paf.sh

[1] "Found existing minimap2 index \".mmi\" file. Skipping re-calculation."

[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_35736522984.2253.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "Attempting to locate input sequence homolog in y assembly... "

[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.

[M::main::0.013*0.73] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.013*0.74] mid_occ = 63

[M::mm_idx_stat] kmer size: 28; skip: 255; is_hpc: 1; #seq: 1

[M::mm_idx_stat::0.013*0.75] distinct minimizers: 12641 (97.66% are singletons); average occurrences: 1.085; average spacing: 182.349; total length: 2500000

[M::worker_pipeline::0.042*0.92] mapped 1 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 testdata/assemblies/assembly_partial.fa.mmi res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa

[M::main] Real time: 0.044 sec; CPU: 0.040 sec; Peak RSS: 0.007 GB

[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_20361913596.2799.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_42879686353.2625.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

rm: res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai: No such file or directory

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...

[M::mm_idx_gen::0.048*1.06] collected minimizers

[M::mm_idx_gen::0.058*1.53] sorted minimizers

[M::main::0.058*1.52] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.063*1.49] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.066*1.46] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000

[M::worker_pipeline::0.943*3.32] mapped 133 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa

[M::main] Real time: 0.953 sec; CPU: 3.138 sec; Peak RSS: 0.695 GB

[1] "scripts/awk_on_paf.sh"

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

[1] "Merging pair 0 out of 166"

[1] "Merging pair 100 out of 166"

[1] "PAF compressed to 6212 alignments."

$lift_contig

[1] "h1tg000011l_partial"

$lift_start

[1] 726220

$lift_end

[1] 2050174

$lift_contig

[1] "h1tg000011l_partial"

$lift_start

[1] 770239

$lift_end

[1] 2005740

[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_13232291904.2781.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

rm: res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.fai: No such file or directory

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.fai not found, generating...

Feature (chr1_partial:1700000-3300000:1600000-1600000) has length = 0, Skipping.

[M::mm_idx_gen::0.047*1.07] collected minimizers

[M::mm_idx_gen::0.055*1.50] sorted minimizers

[M::main::0.055*1.50] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.060*1.46] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.063*1.44] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000

[M::worker_pipeline::1.141*3.56] mapped 160 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.chunk.fa

[M::main] Real time: 1.151 sec; CPU: 4.068 sec; Peak RSS: 0.864 GB

[1] "scripts/awk_on_paf.sh"

[1] "Merging pair 0 out of 232"

[1] "Merging pair 100 out of 232"

[1] "Merging pair 200 out of 232"

[1] "PAF compressed to 7851 alignments."

[1] "4"

Number of alignments: 7851

Number of query sequences: 176

After filtering... Number of alignments: 7851

After filtering... Number of query sequences: 176

[1] 3300000

[1] 1700000

[1] "Chromsosome name unknown. Not attemptying to translate name."

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Coordinate system already present. Adding new coordinate system, which will replace the existing one.

[1] "5"

[1] "returning your plot"

[1] "plot saved."

[1] "plot saved."

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...

[M::mm_idx_gen::0.037*1.09] collected minimizers

[M::mm_idx_gen::0.044*1.50] sorted minimizers

[M::main::0.044*1.50] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.047*1.47] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.049*1.44] distinct minimizers: 213931 (97.91% are singletons); average occurrences: 1.052; average spacing: 5.488; total length: 1235501

[M::worker_pipeline::0.905*3.24] mapped 124 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa

[M::main] Real time: 0.915 sec; CPU: 2.946 sec; Peak RSS: 0.753 GB

[1] "scripts/awk_on_paf.sh"

[1] "Merging pair 0 out of 125"

[1] "Merging pair 100 out of 125"

[1] "PAF compressed to 6168 alignments."

[1] "4"

Number of alignments: 6168

Number of query sequences: 118

After filtering... Number of alignments: 6168

After filtering... Number of query sequences: 118

NULL

NULL

[1] "Chromsosome name unknown. Not attemptying to translate name."

[1] "5"

[1] "returning your plot"

[1] "plot saved."

[1] "plot saved."

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...

[M::mm_idx_gen::0.050*1.07] collected minimizers

[M::mm_idx_gen::0.059*1.52] sorted minimizers

[M::main::0.059*1.52] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.064*1.48] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.068*1.46] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000

[M::worker_pipeline::0.845*3.44] mapped 124 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa

[M::main] Real time: 0.855 sec; CPU: 2.917 sec; Peak RSS: 0.557 GB

[1] "scripts/awk_on_paf.sh"

[1] "Merging pair 0 out of 155"

[1] "Merging pair 100 out of 155"

[1] "PAF compressed to 5701 alignments."

[1] "4"

Number of alignments: 5701

Number of query sequences: 129

After filtering... Number of alignments: 5701

After filtering... Number of query sequences: 129

[1] 3300000

[1] 1700000

[1] "Chromsosome name unknown. Not attemptying to translate name."

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Coordinate system already present. Adding new coordinate system, which will replace the existing one.

[1] "5"

[1] "returning your plot"

[1] "plot saved."

[1] "plot saved."

[1] "4"

[1] "Minlen/Compression manually chosen. Testing viability"

[1] "Making the final grid with:"

[1] "Minlen: 10000"

[1] "Compression: 10000"

[1] "Merging pair 0 out of 1"

[1] "PAF compressed to 5700 alignments."

[1] "PAF compressed to 6 alignments."

[1] "Leading to a paf of dimensions: 6"

[1] "Additional bounce: 1 out of 50"

[1] "Additional bounce: 2 out of 50"

[1] "Grid has converged. All fine."

[1] "Gridline dimensions: 20 and 12"

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Scale for y is already present.

Adding another scale for y, which will replace the existing scale.

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Scale for y is already present.

Adding another scale for y, which will replace the existing scale.

Error in loadNamespace(x) : there is no package called ‘reshape2’

Calls: wrapper_aln_and_analyse ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart

In addition: There were 50 or more warnings (use warnings() to see the first 50)

Execution halted

(nahrwhals) pdoris@PeterDorisOfficeMac NAHRwhals %

From: Wolfram Höps @.> Reply-To: WHops/NAHRwhals @.> Date: Thursday, March 16, 2023 at 3:24 PM To: WHops/NAHRwhals @.> Cc: "Doris, Peter A" @.>, Author @.***> Subject: Re: [WHops/NAHRwhals] Biostrings (Issue #3)

External: Increase caution when handling links and attachments.

Hi Peter,

I now streamlined the installation more towards conda, which should remove the errors you saw. Could you please try again with the updated files and instructions?

You'll have to git pull again. It's best if you remove your old nahrwhals environment ("conda remove --name nahrwhals --all").

It would be great if you could keep reporting results so I know whether the issue can be closed. Good luck with your application, and I'm happy to assist if you need more help!

best Wolfram


WHops commented 1 year ago

Hi Peter,

Unfortunate about this last error, but we're almost there! It should be easy to solve by running, in R:

install.packages('reshape2')

I'm pretty confident it will work then (you should get a 'res' folder with plots and files).

b W

pdoris commented 1 year ago

Actually, I already got a res folder with nice plots!

But I'm installing the reshape package anyway.

Next question: it looks like the test dataset covers 1.6 Mb. Is there a reasonable upper limit on the size of the target region?

My biggest chromosome is ~270 Mb. I have a feeling the output might not be so readily viewable?

Peter

From: Wolfram Höps; Date: Thursday, March 16, 2023 at 4:16 PM; Subject: Re: [WHops/NAHRwhals] Biostrings (Issue #3)


pdoris commented 1 year ago

Latest output, looking at the res folder now….

NAHRwhals % Rscript nahrwhals.R --config conf/conf_default.txt

There were 26 warnings (use warnings() to see them)

minimap2_bin = minimap2

bedtools_bin = bedtools

genome_x_fa = testdata/assemblies/hg38_partial.fa

genome_y_fa = testdata/assemblies/assembly_partial.fa

genome_y_fa_mmi = testdata/assemblies/assembly_partial.fa.mmi

anntrack = FALSE

logfile = res/unittest.tsv

samplename_y = Fasta_y

compare_full_fastas = FALSE

seqname_x = chr1_partial

start_x = 1700000

end_x = 3300000

plot_only = FALSE

self_plots = TRUE

plot_xy_segmented = TRUE

eval_th = 98

depth = 3

chunklen = 10000

minlen = 10000

compression = 10000

max_size_col_plus_rows = 250

max_n_alns = 150

use_paf_library = FALSE

conversionpaf_link = FALSE

xpad = 1

plot_minlen = 350

maxlen_refine = 1e+10

n_tests = 10

n_max_testchunks = 5

baseline_log_minsize_min = 8

baseline_log_minsize_max = 17.28771

discovery_exact = FALSE

hltrack = FALSE

hllink = FALSE

aln_pad_factor = 1

debug = FALSE

clean_after_yourself = FALSE

awkscript_fasta = scripts/awk_on_fasta.sh

awkscript_paf = scripts/awk_on_paf.sh

[1] "Found existing minimap2 index \".mmi\" file. Skipping re-calculation."

[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_14929584255.442.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "Attempting to locate input sequence homolog in y assembly... "

[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.

[M::main::0.010*0.89] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.010*0.90] mid_occ = 63

[M::mm_idx_stat] kmer size: 28; skip: 255; is_hpc: 1; #seq: 1

[M::mm_idx_stat::0.011*0.90] distinct minimizers: 12641 (97.66% are singletons); average occurrences: 1.085; average spacing: 182.349; total length: 2500000

[M::worker_pipeline::0.040*0.97] mapped 1 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 testdata/assemblies/assembly_partial.fa.mmi res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa

[M::main] Real time: 0.042 sec; CPU: 0.041 sec; Peak RSS: 0.007 GB

[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_84758845933.2474.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"

[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_67039957179.2223.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...

[M::mm_idx_gen::0.048*1.07] collected minimizers

[M::mm_idx_gen::0.057*1.51] sorted minimizers

[M::main::0.057*1.51] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.062*1.47] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.065*1.44] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000

[M::worker_pipeline::0.932*3.37] mapped 133 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa

[M::main] Real time: 0.942 sec; CPU: 3.151 sec; Peak RSS: 0.758 GB

[1] "scripts/awk_on_paf.sh"

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

[1] "Merging pair 0 out of 166"

[1] "Merging pair 100 out of 166"

[1] "PAF compressed to 6212 alignments."

$lift_contig

[1] "h1tg000011l_partial"

$lift_start

[1] 726220

$lift_end

[1] 2050174

$lift_contig

[1] "h1tg000011l_partial"

$lift_start

[1] 770239

$lift_end

[1] 2005740

[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_67487699440.6804.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.fai not found, generating...

Feature (chr1_partial:1700000-3300000:1600000-1600000) has length = 0, Skipping.

[M::mm_idx_gen::0.048*1.07] collected minimizers

[M::mm_idx_gen::0.057*1.52] sorted minimizers

[M::main::0.057*1.52] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.061*1.48] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.065*1.46] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000

[M::worker_pipeline::1.182*3.49] mapped 160 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.chunk.fa

[M::main] Real time: 1.192 sec; CPU: 4.135 sec; Peak RSS: 0.892 GB

[1] "scripts/awk_on_paf.sh"

[1] "Merging pair 0 out of 232"

[1] "Merging pair 100 out of 232"

[1] "Merging pair 200 out of 232"

[1] "PAF compressed to 7851 alignments."

[1] "4"

Number of alignments: 7851

Number of query sequences: 176

After filtering... Number of alignments: 7851

After filtering... Number of query sequences: 176

[1] 3300000

[1] 1700000

[1] "Chromsosome name unknown. Not attemptying to translate name."

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Coordinate system already present. Adding new coordinate system, which will replace the existing one.

[1] "5"

[1] "returning your plot"

[1] "plot saved."

[1] "plot saved."

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...

[M::mm_idx_gen::0.038*1.08] collected minimizers

[M::mm_idx_gen::0.044*1.50] sorted minimizers

[M::main::0.044*1.50] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.047*1.46] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.050*1.44] distinct minimizers: 213931 (97.91% are singletons); average occurrences: 1.052; average spacing: 5.488; total length: 1235501

[M::worker_pipeline::0.855*3.38] mapped 124 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa

[M::main] Real time: 0.865 sec; CPU: 2.900 sec; Peak RSS: 0.747 GB

[1] "scripts/awk_on_paf.sh"

[1] "Merging pair 0 out of 125"

[1] "Merging pair 100 out of 125"

[1] "PAF compressed to 6168 alignments."

[1] "4"

Number of alignments: 6168

Number of query sequences: 118

After filtering... Number of alignments: 6168

After filtering... Number of query sequences: 118

NULL

NULL

[1] "Chromsosome name unknown. Not attemptying to translate name."

[1] "5"

[1] "returning your plot"

[1] "plot saved."

[1] "plot saved."

index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...

[M::mm_idx_gen::0.048*1.08] collected minimizers

[M::mm_idx_gen::0.057*1.54] sorted minimizers

[M::main::0.057*1.54] loaded/built the index for 1 target sequence(s)

[M::mm_mapopt_update::0.062*1.50] mid_occ = 50

[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1

[M::mm_idx_stat::0.065*1.47] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000

[M::worker_pipeline::0.910*3.45] mapped 124 sequences

[M::main] Version: 2.24-r1122

[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa

[M::main] Real time: 0.919 sec; CPU: 3.145 sec; Peak RSS: 0.729 GB

[1] "scripts/awk_on_paf.sh"

[1] "Merging pair 0 out of 155"

[1] "Merging pair 100 out of 155"

[1] "PAF compressed to 5701 alignments."

[1] "4"

Number of alignments: 5701

Number of query sequences: 129

After filtering... Number of alignments: 5701

After filtering... Number of query sequences: 129

[1] 3300000

[1] 1700000

[1] "Chromsosome name unknown. Not attemptying to translate name."

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Coordinate system already present. Adding new coordinate system, which will replace the existing one.

[1] "5"

[1] "returning your plot"

[1] "plot saved."

[1] "plot saved."

[1] "4"

[1] "Minlen/Compression manually chosen. Testing viability"

[1] "Making the final grid with:"

[1] "Minlen: 10000"

[1] "Compression: 10000"

[1] "Merging pair 0 out of 1"

[1] "PAF compressed to 5700 alignments."

[1] "PAF compressed to 6 alignments."

[1] "Leading to a paf of dimensions: 6"

[1] "Additional bounce: 1 out of 50"

[1] "Additional bounce: 2 out of 50"

[1] "Grid has converged. All fine."

[1] "Gridline dimensions: 20 and 12"

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Scale for y is already present.

Adding another scale for y, which will replace the existing scale.

Scale for x is already present.

Adding another scale for x, which will replace the existing scale.

Scale for y is already present.

Adding another scale for y, which will replace the existing scale.

Using z as value column: use value.var to override.

[1] 3

[1] "Running depth layer: 1"

[1] "Processing branch 1 of 17"

[1] "Processing branch 2 of 17"

[1] "Processing branch 3 of 17"

[1] "Processing branch 4 of 17"

[1] "Processing branch 5 of 17"

[1] "Processing branch 6 of 17"

[1] "Processing branch 7 of 17"

[1] "Processing branch 8 of 17"

[1] "Processing branch 9 of 17"

[1] "Processing branch 10 of 17"

[1] "Processing branch 11 of 17"

[1] "Processing branch 12 of 17"

[1] "Processing branch 13 of 17"

[1] "Processing branch 14 of 17"

[1] "Processing branch 15 of 17"

[1] "Processing branch 16 of 17"

[1] "Processing branch 17 of 17"

[1] "Running depth layer: 2"

[1] "Processing branch 1 of 17"

[1] "Processing branch 2 of 17"

[1] "Processing branch 3 of 17"

[1] "Processing branch 4 of 17"

[1] "Processing branch 5 of 17"

[1] "Processing branch 6 of 17"

[1] "Processing branch 7 of 17"

[1] "Processing branch 8 of 17"

[1] "Processing branch 9 of 17"

[1] "Processing branch 10 of 17"

[1] "Processing branch 11 of 17"

[1] "Processing branch 12 of 17"

[1] "Processing branch 13 of 17"

[1] "Processing branch 14 of 17"

[1] "Processing branch 15 of 17"

[1] "Processing branch 16 of 17"

[1] "Processing branch 17 of 17"

[1] "Conclusion found!!"

[1] "Sorting results"

[1] "Nodes considered: 161"

[1] "Eval attempted: 72"

[1] "Eval calced: 18"

[1] "Hash excluded: 89"

[1] "Logfile written."

There were 50 or more warnings (use warnings() to see the first 50)

[1] "done!"


WHops commented 1 year ago

Great to hear that! 360 Mb is indeed too large to process reasonably. Typically, the region should not exceed 5 Mbp, and you should avoid centromeres, as these are computationally very heavy. SV calling is meant to focus on one specific (complex) region at a time. Also, a word of caution: if the input region on Ref is split into two or more contigs on Alt, only the longest matching contig is used for visualization.
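To make the last caveat concrete: given the PAF produced by mapping the Ref region against the Alt assembly, one way to pick a single "longest matching" contig is to sum the aligned target span per contig and keep the largest. The bash/awk sketch below is a hypothetical illustration, not NAHRwhals code; the ranking criterion (summed target span from PAF columns 8 and 9) is an assumption, and the tool may rank contigs differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NAHRwhals.
# Sums the aligned target span (PAF column 9 minus column 8) for every target
# contig (column 6) and prints the contig with the largest total, followed by
# that total, tab-separated. Expects tab-separated PAF with >= 12 columns.
longest_matching_contig() {
  local paf="$1"
  awk -F'\t' '
    NF >= 12 { span[$6] += $9 - $8 }   # accumulate aligned span per target contig
    END {
      best = ""; best_len = -1
      for (c in span) {
        if (span[c] > best_len) { best = c; best_len = span[c] }
      }
      if (best != "") print best "\t" best_len
    }
  ' "$paf"
}
```

If a region's alignments land on two contigs, only the one with more aligned sequence is reported; ties are broken arbitrarily because awk does not guarantee an iteration order over array keys.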

WHops commented 1 year ago

As the installation questions seem to be resolved, I am now closing this issue.

pdoris commented 1 year ago

Thanks, Wolfram. I will probably start with the rat immunoglobulin heavy chain locus. It is 6 Mb in the reference, but we have one assembly in which it is >10 Mb due to duplication. Indeed, although these are really good assemblies made from 35X PacBio, Hi-C and Bionano data, the structural variation in this locus is so great that I even doubt that the assembly is correct, though I don’t doubt that the extent of duplication is correct.

Rat (and mouse) also have some super-expanded regions of the genome that contain genes involved in sex chromosome competition. We have one such locus of 15 Mb that has at least 700 copies of a single gene family, expressed only in testis.

We should have an interesting test of the sSV detection capability.

I will share what we find (I am certain I will need you to help me decipher it!)

OK, on to the config file!
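For the config step, one approach is to copy conf/conf_default.txt and override only the region keys that appear in the run output above (seqname_x, start_x, end_x), then point Rscript nahrwhals.R --config at the copy. This is a hypothetical bash sketch: it assumes each setting is stored as a "key = value" line, which is how the values are echoed at the start of a run, but the real file layout may differ, so check conf_default.txt before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NAHRwhals.
# ASSUMPTION: config lines look like "key = value" (as echoed by nahrwhals.R at startup).
# Rewrites the line whose key matches, leaves every other line untouched, and
# returns non-zero, leaving the file unchanged, if the key is not found.
set_conf_value() {
  local file="$1" key="$2" value="$3"
  if ! awk -v k="$key" -v v="$value" '
        BEGIN { FS = "[ \t]*=[ \t]*" }          # split "key = value" around "="
        $1 == k { print k " = " v; found = 1; next }
        { print }
        END { if (!found) exit 1 }              # signal a missing key
      ' "$file" > "$file.tmp"; then
    rm -f "$file.tmp"
    return 1
  fi
  mv "$file.tmp" "$file"
}

# Example (file name, contig name and coordinates are placeholders):
#   cp conf/conf_default.txt conf/conf_mylocus.txt
#   set_conf_value conf/conf_mylocus.txt seqname_x chrN
#   set_conf_value conf/conf_mylocus.txt start_x 10000000
#   set_conf_value conf/conf_mylocus.txt end_x 14000000
#   Rscript nahrwhals.R --config conf/conf_mylocus.txt
```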

Peter

From: Wolfram Höps; Date: Thursday, March 16, 2023 at 4:33 PM; Subject: Re: [WHops/NAHRwhals] Biostrings (Issue #3)


WHops commented 1 year ago

Hi Peter,

Best of luck with this analysis, and, as indicated, feel free to approach me when you have technical questions regarding the tool! With 6 or 10 Mbp you are operating at the outer edge of what the tool was designed for, but I believe that after examining some plots you will get a good idea of what is going on and can focus on sub-areas if needed.
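Focusing on sub-areas can be scripted. A minimal bash sketch, assuming you want to cut a large locus (such as the 6 to 10 Mb regions discussed above) into windows no longer than the ~5 Mbp guideline from earlier in the thread, each of which could then be analysed separately. tile_region is a hypothetical helper, not part of NAHRwhals. The optional overlap guarantees that any event shorter than the overlap lies entirely inside at least one window.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NAHRwhals.
# Splits the interval [start, end) on one sequence into windows of at most
# max_len bp (default 5000000); consecutive windows overlap by `overlap` bp
# (default 0). Prints one "seqname<TAB>start<TAB>end" line per window.
tile_region() {
  local seqname="$1" start="$2" end="$3"
  local max_len="${4:-5000000}" overlap="${5:-0}"
  local step=$(( max_len - overlap ))
  if (( end <= start || overlap < 0 || step <= 0 )); then
    echo "tile_region: need end > start and 0 <= overlap < max_len" >&2
    return 1
  fi
  local s e
  for (( s = start; ; s += step )); do
    # Clip the last window at the end of the region.
    e=$(( s + max_len < end ? s + max_len : end ))
    printf '%s\t%d\t%d\n' "$seqname" "$s" "$e"
    (( e >= end )) && break
  done
}
```

For example, tile_region chr1 1000000 11000000 prints two 5 Mb windows (1000000 to 6000000 and 6000000 to 11000000); adding a 500000 bp overlap as the fifth argument yields three windows, the last one clipped to 10000000 to 11000000.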

best Wolfram