Closed pdoris closed 1 year ago
Thanks for raising the issue! Can you check whether installing Biostrings manually solves it? This should be possible from within R with:
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biostrings")
Wolfram
Sorry, this is rather verbose, but it ends with:
> library(nahrwhals)
Error in library(nahrwhals) : there is no package called ‘nahrwhals’
Peter
Error in library(nahrwhals) : there is no package called ‘nahrwhals’
+++++++++++++++++++++++++++
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
Bioconductor version 3.14 (BiocManager 1.30.19), R 4.1.2 (2021-11-01)
Bioconductor version '3.14' is out-of-date; the current release version '3.16' is available with R version '4.2'; see https://bioconductor.org/install
BiocManager::install("Biostrings")
Bioconductor version 3.14 (BiocManager 1.30.19), R 4.1.2 (2021-11-01)
Installing package(s) 'Biostrings'
also installing the dependencies ‘bitops’, ‘zlibbioc’, ‘RCurl’, ‘GenomeInfoDbData’, ‘BiocGenerics’, ‘S4Vectors’, ‘IRanges’, ‘XVector’, ‘GenomeInfoDb’
downloaded 28 KB
downloaded 122 KB
downloaded 1.0 MB
downloaded 576 KB
downloaded 2.0 MB
downloaded 2.1 MB
downloaded 610 KB
downloaded 3.8 MB
downloaded 13.6 MB
The downloaded binary packages are in /var/folders/lq/x0bth18d2736jxk8k6ywr5cfzm1g/T//RtmpucFjF6/downloaded_packages
installing the source package ‘GenomeInfoDbData’
downloaded 10.7 MB
The downloaded source packages are in ‘/private/var/folders/lq/x0bth18d2736jxk8k6ywr5cfzm1g/T/RtmpucFjF6/downloaded_packages’
Old packages: 'ade4', 'ape', 'BiocManager', 'broom', 'bslib', 'cachem', 'class', 'classInt', 'codetools', 'conquer', 'curl', 'data.table', 'dbplyr', 'digest', 'dtplyr', 'e1071', 'evaluate', 'fastmap', 'findpython', 'forcats', 'Formula', 'fs', 'gargle', 'ggpubr', 'ggsci', 'haven', 'highr', 'Hmisc', 'htmlwidgets', 'httr', 'igraph', 'knitr', 'lme4', 'lubridate', 'mapproj', 'maptools', 'markdown', 'MASS', 'Matrix', 'mgcv', 'multcomp', 'nlme', 'openssl', 'openxlsx', 'pbkrtest', 'purrr', 'qtl', 'ragg', 'RcppArmadillo', 'readr', 'readxl', 'rgeos', 'rmarkdown', 'rstatix', 's2', 'sass', 'sf', 'sp', 'spatial', 'survival', 'svglite', 'testthat', 'tidyr', 'tidyverse', 'timechange', 'tinytex', 'vdiffr', 'vroom', 'xfun', 'yaml'
Update all/some/none? [a/s/n]: a
also installing the dependency ‘conflicted’
There are binary versions available but the source versions are later:
       binary source needs_compilation
Matrix  1.5-1  1.5-3              TRUE
sf     1.0-10 1.0-11              TRUE
downloaded 53 KB
downloaded 5.8 MB
downloaded 3.3 MB
downloaded 324 KB
downloaded 1.8 MB
downloaded 4.6 MB
downloaded 65 KB
downloaded 94 KB
downloaded 485 KB
downloaded 87 KB
downloaded 4.3 MB
downloaded 741 KB
downloaded 2.3 MB
downloaded 1.1 MB
downloaded 291 KB
downloaded 344 KB
downloaded 645 KB
downloaded 76 KB
downloaded 196 KB
downloaded 20 KB
downloaded 412 KB
downloaded 154 KB
downloaded 556 KB
downloaded 561 KB
downloaded 2.0 MB
downloaded 2.3 MB
downloaded 1.0 MB
downloaded 38 KB
downloaded 3.3 MB
downloaded 781 KB
downloaded 492 KB
downloaded 7.7 MB
downloaded 1.4 MB
downloaded 6.7 MB
downloaded 954 KB
downloaded 81 KB
downloaded 2.0 MB
downloaded 114 KB
downloaded 1.1 MB
downloaded 5.0 MB
downloaded 3.6 MB
downloaded 719 KB
downloaded 2.3 MB
downloaded 2.8 MB
downloaded 3.0 MB
downloaded 182 KB
downloaded 497 KB
downloaded 6.1 MB
downloaded 8.8 MB
downloaded 1.5 MB
downloaded 1.8 MB
downloaded 1.5 MB
downloaded 1.5 MB
downloaded 3.5 MB
downloaded 593 KB
downloaded 11.7 MB
downloaded 2.3 MB
downloaded 85.5 MB
downloaded 1.8 MB
downloaded 153 KB
downloaded 6.5 MB
downloaded 895 KB
downloaded 2.9 MB
downloaded 1.3 MB
downloaded 413 KB
downloaded 822 KB
downloaded 130 KB
downloaded 1.0 MB
downloaded 2.6 MB
downloaded 398 KB
downloaded 200 KB
The downloaded binary packages are in /var/folders/lq/x0bth18d2736jxk8k6ywr5cfzm1g/T//RtmpucFjF6/downloaded_packages
library(nahrwhals)
Error in library(nahrwhals) : there is no package called ‘nahrwhals’
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")
Bioconductor version 3.14 (BiocManager 1.30.20), R 4.1.2 (2021-11-01)
Old packages: 'Matrix', 'sf'
Update all/some/none? [a/s/n]: a
There are binary versions available but the source versions are later:
       binary source needs_compilation
Matrix  1.5-1  1.5-3              TRUE
sf     1.0-10 1.0-11              TRUE
Do you want to install from sources the packages which need compilation? (Yes/no/cancel) Yes
installing the source packages ‘Matrix’, ‘sf’
downloaded 2.1 MB
downloaded 3.3 MB
Hi Peter,
This looks like a package dependency problem. I'll have to look into it more closely before I can suggest a solution. Meanwhile, thanks for raising this; I'm keeping the issue open and hope to solve it quickly.
best Wolfram
Wolfram
Thanks.
I hope you are able to help me use this tool.
I am interested in looking at structural variation between mammalian genomes we have assembled. The genome assemblies are haploid (inbred rats) and are of high quality. A tool to resolve structural variation fully across the genome or even in regions we know are problematic, would be very helpful.
Peter
Hi Peter,
I have now streamlined the installation around conda, which should remove the errors you saw. Could you please try again with the updated files and instructions?
You'll have to git pull again. It's best to remove your old nahrwhals environment first ("conda remove --name nahrwhals --all").
It would be great if you kept reporting results so I know whether the issue can be closed. Good luck with your application, and I'm happy to assist if you need more help!
best Wolfram
Thanks Wolfram
I played around a bit until I got the library loaded in R
library(nahrwhals)
quit()
Then I tried the test data…..I just returned from a meeting, so I haven’t had a chance to dig into the output yet or even read the output, see below.
Looks like this ends with an error….
NAHRwhals % Rscript nahrwhals.R --config conf/conf_default.txt
There were 26 warnings (use warnings() to see them)
minimap2_bin = minimap2
bedtools_bin = bedtools
genome_x_fa = testdata/assemblies/hg38_partial.fa
genome_y_fa = testdata/assemblies/assembly_partial.fa
genome_y_fa_mmi = testdata/assemblies/assembly_partial.fa.mmi
anntrack = FALSE
logfile = res/unittest.tsv
samplename_y = Fasta_y
compare_full_fastas = FALSE
seqname_x = chr1_partial
start_x = 1700000
end_x = 3300000
plot_only = FALSE
self_plots = TRUE
plot_xy_segmented = TRUE
eval_th = 98
depth = 3
chunklen = 10000
minlen = 10000
compression = 10000
max_size_col_plus_rows = 250
max_n_alns = 150
use_paf_library = FALSE
conversionpaf_link = FALSE
xpad = 1
plot_minlen = 350
maxlen_refine = 1e+10
n_tests = 10
n_max_testchunks = 5
baseline_log_minsize_min = 8
baseline_log_minsize_max = 17.28771
discovery_exact = FALSE
hltrack = FALSE
hllink = FALSE
aln_pad_factor = 1
debug = FALSE
clean_after_yourself = FALSE
awkscript_fasta = scripts/awk_on_fasta.sh
awkscript_paf = scripts/awk_on_paf.sh
[1] "Found existing minimap2 index \".mmi\" file. Skipping re-calculation."
[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_35736522984.2253.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "Attempting to locate input sequence homolog in y assembly... "
[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
[M::main::0.013*0.73] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.013*0.74] mid_occ = 63
[M::mm_idx_stat] kmer size: 28; skip: 255; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.013*0.75] distinct minimizers: 12641 (97.66% are singletons); average occurrences: 1.085; average spacing: 182.349; total length: 2500000
[M::worker_pipeline::0.042*0.92] mapped 1 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 testdata/assemblies/assembly_partial.fa.mmi res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa
[M::main] Real time: 0.044 sec; CPU: 0.040 sec; Peak RSS: 0.007 GB
[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_20361913596.2799.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_42879686353.2625.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
rm: res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai: No such file or directory
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...
[M::mm_idx_gen::0.048*1.06] collected minimizers
[M::mm_idx_gen::0.058*1.53] sorted minimizers
[M::main::0.058*1.52] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.063*1.49] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.066*1.46] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000
[M::worker_pipeline::0.943*3.32] mapped 133 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa
[M::main] Real time: 0.953 sec; CPU: 3.138 sec; Peak RSS: 0.695 GB
[1] "scripts/awk_on_paf.sh"
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
[1] "Merging pair 0 out of 166"
[1] "Merging pair 100 out of 166"
[1] "PAF compressed to 6212 alignments."
$lift_contig
[1] "h1tg000011l_partial"
$lift_start
[1] 726220
$lift_end
[1] 2050174
$lift_contig
[1] "h1tg000011l_partial"
$lift_start
[1] 770239
$lift_end
[1] 2005740
[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_13232291904.2781.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
rm: res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.fai: No such file or directory
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.fai not found, generating...
Feature (chr1_partial:1700000-3300000:1600000-1600000) has length = 0, Skipping.
[M::mm_idx_gen::0.047*1.07] collected minimizers
[M::mm_idx_gen::0.055*1.50] sorted minimizers
[M::main::0.055*1.50] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.060*1.46] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.063*1.44] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000
[M::worker_pipeline::1.141*3.56] mapped 160 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.chunk.fa
[M::main] Real time: 1.151 sec; CPU: 4.068 sec; Peak RSS: 0.864 GB
[1] "scripts/awk_on_paf.sh"
[1] "Merging pair 0 out of 232"
[1] "Merging pair 100 out of 232"
[1] "Merging pair 200 out of 232"
[1] "PAF compressed to 7851 alignments."
[1] "4"
Number of alignments: 7851
Number of query sequences: 176
After filtering... Number of alignments: 7851
After filtering... Number of query sequences: 176
[1] 3300000
[1] 1700000
[1] "Chromsosome name unknown. Not attemptying to translate name."
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Coordinate system already present. Adding new coordinate system, which will replace the existing one.
[1] "5"
[1] "returning your plot"
[1] "plot saved."
[1] "plot saved."
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...
[M::mm_idx_gen::0.037*1.09] collected minimizers
[M::mm_idx_gen::0.044*1.50] sorted minimizers
[M::main::0.044*1.50] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.047*1.47] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.049*1.44] distinct minimizers: 213931 (97.91% are singletons); average occurrences: 1.052; average spacing: 5.488; total length: 1235501
[M::worker_pipeline::0.905*3.24] mapped 124 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa
[M::main] Real time: 0.915 sec; CPU: 2.946 sec; Peak RSS: 0.753 GB
[1] "scripts/awk_on_paf.sh"
[1] "Merging pair 0 out of 125"
[1] "Merging pair 100 out of 125"
[1] "PAF compressed to 6168 alignments."
[1] "4"
Number of alignments: 6168
Number of query sequences: 118
After filtering... Number of alignments: 6168
After filtering... Number of query sequences: 118
NULL
NULL
[1] "Chromsosome name unknown. Not attemptying to translate name."
[1] "5"
[1] "returning your plot"
[1] "plot saved."
[1] "plot saved."
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...
[M::mm_idx_gen::0.050*1.07] collected minimizers
[M::mm_idx_gen::0.059*1.52] sorted minimizers
[M::main::0.059*1.52] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.064*1.48] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.068*1.46] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000
[M::worker_pipeline::0.845*3.44] mapped 124 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa
[M::main] Real time: 0.855 sec; CPU: 2.917 sec; Peak RSS: 0.557 GB
[1] "scripts/awk_on_paf.sh"
[1] "Merging pair 0 out of 155"
[1] "Merging pair 100 out of 155"
[1] "PAF compressed to 5701 alignments."
[1] "4"
Number of alignments: 5701
Number of query sequences: 129
After filtering... Number of alignments: 5701
After filtering... Number of query sequences: 129
[1] 3300000
[1] 1700000
[1] "Chromsosome name unknown. Not attemptying to translate name."
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Coordinate system already present. Adding new coordinate system, which will replace the existing one.
[1] "5"
[1] "returning your plot"
[1] "plot saved."
[1] "plot saved."
[1] "4"
[1] "Minlen/Compression manually chosen. Testing viability"
[1] "Making the final grid with:"
[1] "Minlen: 10000"
[1] "Compression: 10000"
[1] "Merging pair 0 out of 1"
[1] "PAF compressed to 5700 alignments."
[1] "PAF compressed to 6 alignments."
[1] "Leading to a paf of dimensions: 6"
[1] "Additional bounce: 1 out of 50"
[1] "Additional bounce: 2 out of 50"
[1] "Grid has converged. All fine."
[1] "Gridline dimensions: 20 and 12"
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Error in loadNamespace(x) : there is no package called ‘reshape2’
Calls: wrapper_aln_and_analyse ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
(nahrwhals) pdoris@PeterDorisOfficeMac NAHRwhals %
Hi Peter,
This last error is unfortunate, but we're almost there! It should be easy to solve with
install.packages('reshape2')
I'm pretty confident it will work then. (You should get a 'res' folder with plots and files.)
b W
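To catch this kind of "there is no package called ..." error before starting a run, a check along the following lines may help. This is only a minimal sketch: `missing_packages` is a hypothetical helper, not part of NAHRwhals, and the package list is an assumption built from the names that appear in this thread (`reshape2`, `dplyr`, `Biostrings`); the full set of dependencies may well be longer.

```r
# Hypothetical helper (not part of NAHRwhals): return the names of the
# requested packages that cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed list, taken only from package names mentioned in this thread.
needed <- c("reshape2", "dplyr", "Biostrings")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
} else {
  message("All listed packages are available.")
}
```

Anything reported as missing can then be installed with `install.packages()` for CRAN packages such as `reshape2`, or with `BiocManager::install()` for Bioconductor packages such as `Biostrings`.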
Actually, I already got a res folder with nice plots!
But I'm installing the reshape2 package anyway.
Next question: it looks like the test dataset covers 1.6 Mb. Is there a reasonable upper limit to place on the target region?
My biggest chromosome is ~270 Mb, and I have a feeling the output might not be so readily viewable at that size.
Peter
Here is the latest output; I'm looking at the res folder now.
NAHRwhals % Rscript nahrwhals.R --config conf/conf_default.txt
There were 26 warnings (use warnings() to see them)
minimap2_bin = minimap2
bedtools_bin = bedtools
genome_x_fa = testdata/assemblies/hg38_partial.fa
genome_y_fa = testdata/assemblies/assembly_partial.fa
genome_y_fa_mmi = testdata/assemblies/assembly_partial.fa.mmi
anntrack = FALSE
logfile = res/unittest.tsv
samplename_y = Fasta_y
compare_full_fastas = FALSE
seqname_x = chr1_partial
start_x = 1700000
end_x = 3300000
plot_only = FALSE
self_plots = TRUE
plot_xy_segmented = TRUE
eval_th = 98
depth = 3
chunklen = 10000
minlen = 10000
compression = 10000
max_size_col_plus_rows = 250
max_n_alns = 150
use_paf_library = FALSE
conversionpaf_link = FALSE
xpad = 1
plot_minlen = 350
maxlen_refine = 1e+10
n_tests = 10
n_max_testchunks = 5
baseline_log_minsize_min = 8
baseline_log_minsize_max = 17.28771
discovery_exact = FALSE
hltrack = FALSE
hllink = FALSE
aln_pad_factor = 1
debug = FALSE
clean_after_yourself = FALSE
awkscript_fasta = scripts/awk_on_fasta.sh
awkscript_paf = scripts/awk_on_paf.sh
[1] "Found existing minimap2 index \".mmi\" file. Skipping re-calculation."
[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_14929584255.442.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "Attempting to locate input sequence homolog in y assembly... "
[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
[M::main::0.010*0.89] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.010*0.90] mid_occ = 63
[M::mm_idx_stat] kmer size: 28; skip: 255; is_hpc: 1; #seq: 1
[M::mm_idx_stat::0.011*0.90] distinct minimizers: 12641 (97.66% are singletons); average occurrences: 1.085; average spacing: 182.349; total length: 2500000
[M::worker_pipeline::0.040*0.97] mapped 1 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 testdata/assemblies/assembly_partial.fa.mmi res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa
[M::main] Real time: 0.042 sec; CPU: 0.041 sec; Peak RSS: 0.007 GB
[1] "bedtools getfasta -fi testdata/assemblies/hg38_partial.fa -bed region2_84758845933.2474.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa"
[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_67039957179.2223.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...
[M::mm_idx_gen::0.048*1.07] collected minimizers
[M::mm_idx_gen::0.057*1.51] sorted minimizers
[M::main::0.057*1.51] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.062*1.47] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.065*1.44] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000
[M::worker_pipeline::0.932*3.37] mapped 133 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa
[M::main] Real time: 0.942 sec; CPU: 3.151 sec; Peak RSS: 0.758 GB
[1] "scripts/awk_on_paf.sh"
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
[1] "Merging pair 0 out of 166"
[1] "Merging pair 100 out of 166"
[1] "PAF compressed to 6212 alignments."
$lift_contig
[1] "h1tg000011l_partial"
$lift_start
[1] 726220
$lift_end
[1] 2050174
$lift_contig
[1] "h1tg000011l_partial"
$lift_start
[1] 770239
$lift_end
[1] 2005740
[1] "bedtools getfasta -fi testdata/assemblies/assembly_partial.fa -bed region2_67487699440.6804.bed > res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
[1] "Subsequence extracted and saved to res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa"
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.fai not found, generating...
Feature (chr1_partial:1700000-3300000:1600000-1600000) has length = 0, Skipping.
[M::mm_idx_gen::0.048*1.07] collected minimizers
[M::mm_idx_gen::0.057*1.52] sorted minimizers
[M::main::0.057*1.52] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.061*1.48] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.065*1.46] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000
[M::worker_pipeline::1.182*3.49] mapped 160 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa.chunk.fa
[M::main] Real time: 1.192 sec; CPU: 4.135 sec; Peak RSS: 0.892 GB
[1] "scripts/awk_on_paf.sh"
[1] "Merging pair 0 out of 232"
[1] "Merging pair 100 out of 232"
[1] "Merging pair 200 out of 232"
[1] "PAF compressed to 7851 alignments."
[1] "4"
Number of alignments: 7851
Number of query sequences: 176
After filtering... Number of alignments: 7851
After filtering... Number of query sequences: 176
[1] 3300000
[1] 1700000
[1] "Chromsosome name unknown. Not attemptying to translate name."
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Coordinate system already present. Adding new coordinate system, which will replace the existing one.
[1] "5"
[1] "returning your plot"
[1] "plot saved."
[1] "plot saved."
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...
[M::mm_idx_gen::0.038*1.08] collected minimizers
[M::mm_idx_gen::0.044*1.50] sorted minimizers
[M::main::0.044*1.50] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.047*1.46] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.050*1.44] distinct minimizers: 213931 (97.91% are singletons); average occurrences: 1.052; average spacing: 5.488; total length: 1235501
[M::worker_pipeline::0.855*3.38] mapped 124 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa
[M::main] Real time: 0.865 sec; CPU: 2.900 sec; Peak RSS: 0.747 GB
[1] "scripts/awk_on_paf.sh"
[1] "Merging pair 0 out of 125"
[1] "Merging pair 100 out of 125"
[1] "PAF compressed to 6168 alignments."
[1] "4"
Number of alignments: 6168
Number of query sequences: 118
After filtering... Number of alignments: 6168
After filtering... Number of query sequences: 118
NULL
NULL
[1] "Chromsosome name unknown. Not attemptying to translate name."
[1] "5"
[1] "returning your plot"
[1] "plot saved."
[1] "plot saved."
index file res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.fai not found, generating...
[M::mm_idx_gen::0.048*1.08] collected minimizers
[M::mm_idx_gen::0.057*1.54] sorted minimizers
[M::main::0.057*1.54] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.062*1.50] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.065*1.47] distinct minimizers: 229779 (79.31% are singletons); average occurrences: 1.269; average spacing: 5.487; total length: 1600000
[M::worker_pipeline::0.910*3.45] mapped 124 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -x asm20 -P -c -s 0 -M 0.2 -t 4 res/chr1_partial-1700000-3300000/fasta/Fasta_y_x.fa res/chr1_partial-1700000-3300000/fasta/Fasta_y_y.fa.chunk.fa
[M::main] Real time: 0.919 sec; CPU: 3.145 sec; Peak RSS: 0.729 GB
[1] "scripts/awk_on_paf.sh"
[1] "Merging pair 0 out of 155"
[1] "Merging pair 100 out of 155"
[1] "PAF compressed to 5701 alignments."
[1] "4"
Number of alignments: 5701
Number of query sequences: 129
After filtering... Number of alignments: 5701
After filtering... Number of query sequences: 129
[1] 3300000
[1] 1700000
[1] "Chromsosome name unknown. Not attemptying to translate name."
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Coordinate system already present. Adding new coordinate system, which will replace the existing one.
[1] "5"
[1] "returning your plot"
[1] "plot saved."
[1] "plot saved."
[1] "4"
[1] "Minlen/Compression manually chosen. Testing viability"
[1] "Making the final grid with:"
[1] "Minlen: 10000"
[1] "Compression: 10000"
[1] "Merging pair 0 out of 1"
[1] "PAF compressed to 5700 alignments."
[1] "PAF compressed to 6 alignments."
[1] "Leading to a paf of dimensions: 6"
[1] "Additional bounce: 1 out of 50"
[1] "Additional bounce: 2 out of 50"
[1] "Grid has converged. All fine."
[1] "Gridline dimensions: 20 and 12"
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using z as value column: use value.var to override.
[1] 3
[1] "Running depth layer: 1"
[1] "Processing branch 1 of 17"
[1] "Processing branch 2 of 17"
[1] "Processing branch 3 of 17"
[1] "Processing branch 4 of 17"
[1] "Processing branch 5 of 17"
[1] "Processing branch 6 of 17"
[1] "Processing branch 7 of 17"
[1] "Processing branch 8 of 17"
[1] "Processing branch 9 of 17"
[1] "Processing branch 10 of 17"
[1] "Processing branch 11 of 17"
[1] "Processing branch 12 of 17"
[1] "Processing branch 13 of 17"
[1] "Processing branch 14 of 17"
[1] "Processing branch 15 of 17"
[1] "Processing branch 16 of 17"
[1] "Processing branch 17 of 17"
[1] "Running depth layer: 2"
[1] "Processing branch 1 of 17"
[1] "Processing branch 2 of 17"
[1] "Processing branch 3 of 17"
[1] "Processing branch 4 of 17"
[1] "Processing branch 5 of 17"
[1] "Processing branch 6 of 17"
[1] "Processing branch 7 of 17"
[1] "Processing branch 8 of 17"
[1] "Processing branch 9 of 17"
[1] "Processing branch 10 of 17"
[1] "Processing branch 11 of 17"
[1] "Processing branch 12 of 17"
[1] "Processing branch 13 of 17"
[1] "Processing branch 14 of 17"
[1] "Processing branch 15 of 17"
[1] "Processing branch 16 of 17"
[1] "Processing branch 17 of 17"
[1] "Conclusion found!!"
[1] "Sorting results"
[1] "Nodes considered: 161"
[1] "Eval attempted: 72"
[1] "Eval calced: 18"
[1] "Hash excluded: 89"
[1] "Logfile written."
There were 50 or more warnings (use warnings() to see the first 50)
°1§ "done!"
From: Wolfram Höps. Date: Thursday, March 16, 2023 at 4:16 PM. Subject: Re: [WHops/NAHRwhals] Biostrings (Issue #3)
Hi Peter,
This last error is unfortunate, but we're almost there! It should be easy to solve with:
install.packages('reshape2')
I'm pretty confident it will work then. (You should get a 'res' folder with plots and files.)
b W
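Since missing dependencies came up twice in this thread (Biostrings, then reshape2), a quick check before running may save a round trip. This is only a minimal sketch: `missing_packages` is a hypothetical helper, not part of NAHRwhals, and the list of package names is just the two mentioned here, not the tool's full dependency list.

```r
# Hypothetical helper (not part of NAHRwhals): return the names of packages
# that cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Only the two packages discussed in this thread; the real list may be longer.
needed <- c("reshape2", "Biostrings")
to_install <- missing_packages(needed)

for (pkg in to_install) {
  message("Missing package: ", pkg)
}

# reshape2 is on CRAN:          install.packages("reshape2")
# Biostrings is on Bioconductor: BiocManager::install("Biostrings")
```

If nothing is printed, both packages are already available.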
Great to hear that! 360 Mb is indeed too large to process reasonably. Typically, you should not exceed a region of 5 Mbp, and you should avoid centromeres, as these are computationally very heavy. SV calling is meant to focus on one specific (complex) region at a time. Also, a word of caution: if the input region on Ref is split into two or more contigs on Alt, only the longest matching contig is used for visualization.
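To make the size guideline concrete, here is a small sketch for checking a region before running. `check_region` is a hypothetical helper, not a NAHRwhals function, and the 5 Mbp limit is the rule of thumb from this comment rather than a hard limit enforced by the tool.

```r
# Hypothetical helper (not part of NAHRwhals): report the width of a region
# and whether it stays within the ~5 Mbp rule of thumb.
max_region_bp <- 5e6  # guideline only, taken from the comment above

check_region <- function(start, end, max_bp = max_region_bp) {
  stopifnot(is.numeric(start), is.numeric(end), end > start)
  width <- end - start
  list(width_mbp = width / 1e6, within_guideline = width <= max_bp)
}

# Example with made-up coordinates: a 4.2 Mbp region is within the guideline.
check_region(10e6, 14.2e6)

# A 360 Mb region is far outside it.
check_region(0, 360e6)
```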
As the installation questions seem to be resolved, I am now closing this issue.
Thanks Wolfram. I will probably start with the rat immunoglobulin heavy chain. The locus is 6 Mb in the reference, but we have one assembly in which it is >10 Mb due to duplication. Indeed, although these are all good assemblies made from 35X PacBio, Hi-C and Bionano, the structural variation in this locus is so great that I even doubt that the assembly is correct, though I don't doubt that the extent of duplication is correct.
Rat (and mouse) also have some super-expanded regions of the genome that contain genes involved in sex chromosome competition. We have one such locus of 15 Mb that has at least 700 copies of a single gene family, expressed only in testis.
We should have an interesting test of the sSV detection capability.
I will share what we find (I am certain I will need you to help me decipher it!)
OK, on to the config file!
Peter
Hi Peter,
Best of luck with this analysis, and, as indicated, feel free to approach me when you have technical questions regarding the tool! With 6 or 10 Mbp you are operating at the outer edge of what the tool was designed for, but I believe that after examining some plots you will get a good idea of what is going on and can focus on sub-areas if needed.
Best, Wolfram
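If focusing on sub-areas turns out to be necessary, one way to get candidate coordinates is to tile the locus into overlapping windows. The sketch below is only an illustration: `split_region` is a hypothetical helper, not part of NAHRwhals, the 5 Mbp window and 0.5 Mbp overlap are arbitrary choices, and coordinates are assumed to be integer base-pair positions. In practice, sub-areas are better picked by eye from the plots, since an SV that spans a window boundary would be cut.

```r
# Hypothetical helper (not part of NAHRwhals): tile a region into overlapping
# windows no wider than `window` base pairs.
split_region <- function(start, end, window = 5e6, overlap = 5e5) {
  stopifnot(end > start, window > overlap, overlap >= 0)
  step <- window - overlap
  # Start a new window every `step` bp until the previous one reaches `end`.
  starts <- seq(start, max(start, end - overlap - 1), by = step)
  data.frame(start = starts, end = pmin(starts + window, end))
}

# Example with made-up coordinates: a 10 Mbp locus starting at position 0.
split_region(0, 10e6)
```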
Biostrings not available in package
NAHRwhals % Rscript install_package.R
Loading required package: devtools
Loading required package: usethis
Loading required package: argparse
Skipping 1 packages not available: Biostrings