The line of `plot-sv-calls_dev.R` linked under "What happened?" is not doing what is intended. The code is likely meant to be one of the following:
merge(seg, bins[, .N, by = chrom][, .(chrom, N = c(0, cumsum(N))[1:(.N)])], by = "chrom")
merge(seg, bins[, .N, by = chrom][, .(chrom, N = c(0, cumsum(N)[1:(.N - 1)]))], by = "chrom")
Note where the closing bracket of the `c()` call sits in the second one.
I uncovered this while debugging an error in one of the samples. I don't think it is actually related to the error I am getting. However, R emits a warning when this line is run:
In as.data.table.list(jval, .named = NULL) :
Item 2 has 23 rows but longest item has 24; recycled with remainder.
Currently, the code executes this:
> bins[, .N, by = chrom][, .(chrom, N = c(0, cumsum(N))[1:(.N - 1)])]
In the resulting output, chrY gets a value of 0, which I don't think is what was intended in this case.
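The off-by-one can be sketched in base R without the pipeline data. This is a minimal illustration, not the script's code: the counts are made up, `n` stands in for data.table's `.N`, and `rep_len()` is only an assumed stand-in for the "recycled with remainder" behaviour named in the warning.

```r
# Made-up bin counts for three chromosomes; `n` stands in for data.table's .N.
N <- c(4, 3, 2)
n <- length(N)

current <- c(0, cumsum(N))[1:(n - 1)]   # subset applied to the whole vector: n - 1 values
fix_a <- c(0, cumsum(N))[1:n]           # first proposed form: n values
fix_b <- c(0, cumsum(N)[1:(n - 1)])     # second proposed form: n values

# Stand-in for "recycled with remainder": the short vector wraps around,
# so the last chromosome is assigned the first value, 0.
recycled <- rep_len(current, n)

print(current)                   # [1] 0 4
print(recycled)                  # [1] 0 4 0
print(fix_a)                     # [1] 0 4 7
print(identical(fix_a, fix_b))   # [1] TRUE
```

With a single chromosome, `1:(n - 1)` becomes `1:0`, so the second form would return two values while the first still returns one. That may be a reason to prefer the first form, or `c(0, head(cumsum(N), -1))`.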
As an aside, I wonder if the sort order of the chromosomes matters in this case?
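On the sort-order question, a small sketch with a made-up `bins` table shows that the cumulative offsets follow whatever row order the per-chromosome counts come out in. The `offsets()` helper is hypothetical and not part of the pipeline. Whether the order matters for the plot depends on how the script uses these offsets, which this sketch does not settle.

```r
library(data.table)

# Made-up bins, deliberately not in chromosome order.
bins <- data.table(chrom = c("chr2", "chr2", "chr10", "chr1", "chr1", "chr1"))

# `by` keeps groups in order of first appearance; `keyby` sorts them.
by_first_seen <- bins[, .N, by = chrom]
by_sorted <- bins[, .N, keyby = chrom]

# Hypothetical helper: cumulative offset of each chromosome's first bin.
offsets <- function(counts) {
  counts[, .(chrom, offset = c(0, head(cumsum(N), -1)))]
}

print(offsets(by_first_seen))   # chr2 = 0, chr10 = 2, chr1 = 3
print(offsets(by_sorted))       # chr1 = 0, chr10 = 3, chr2 = 4
```

The same chromosome gets a different offset in the two cases, so the counts and anything merged against them need to agree on one ordering.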
Steps To Reproduce
As far as I can tell, this will happen running the plot-sv-calls_dev.R script on any sample that has more than 1 chromosome.
FYI, I am using MosaiCatcher 2.0.1.
Mosaicatcher-pipeline Version
1.5.1 (Default)
Command used
The error occurred when manually executing the `plot-sv-calls_dev.R` script, but was first noticed when running:
--jobs 500 --config data_location=/g/huber/users/smirnov/StrandSeqData/LFS041 --profile workflow/snakemake_profiles/HPC/slurm_EMBL/ --singularity-args "-B /g:/g -B /scratch:/scratch" -c4
How did you run the pipeline?
Conda + Singularity
What did you use to run the pipeline? (local execution, HPC, cloud)
EMBL HPC
Pipeline configuration file
version: 2.0.1
ashleys_pipeline_version: 2.0.0
#######################################
# MOSAICATCHER CONFIGURATION FILE #
#######################################
# Option to display all potential options - listed in config_metadata.yaml
list_commands: False
# To be informed of pipeline status
email: ""
# Input BAM location
data_location: "/g/huber/users/smirnov/StrandSeqData/LFS041"
# Reference assembly selected
reference: "hg38"
# Enable / Disable multistep normalisation analysis
multistep_normalisation: False
# ArbiGent (Arbitrary-segments genotyping) mode of execution
arbigent: False
# Arbigent default BED file, can be changed and adapted based on user question
arbigent_bed_file: "workflow/data/arbigent/manual_segmentation.bed"
# Enable / Disable FastQC analysis
FastQC_analysis: False
# Plate orientation for GC analysis
plate_orientation: landscape
# Normalize or not mosaic counts
hgsvc_based_normalized_counts: True
# Mutually exclusive with ashleys_pipeline
input_bam_legacy: False
# Enable/Disable ashleys-qc-pipeline module loading to start pipeline from FASTQ files
ashleys_pipeline: True
# Enable / Disable comparison for each BAM file between folder name & SM tag
check_sm_tag: True
# Split / Do not split QC counts plot into single individual images (limit jobs)
split_qc_plot: True
# Chromosomes list to process
chromosomes:
- chr1
- chr2
- chr3
- chr4
- chr5
- chr6
- chr7
- chr8
- chr9
- chr10
- chr11
- chr12
- chr13
- chr14
- chr15
- chr16
- chr17
- chr18
- chr19
- chr20
- chr21
- chr22
- chrX
- chrY
chromosomes_to_exclude: []
# GENECORE
genecore: False
samples_to_process: []
genecore_date_folder: ""
genecore_prefix: "/g/korbel/shared/genecore"
## Mosaic bin window size
window: 200000
##############################
# Advanced configuration. #
##############################
alfred_plots:
- "dist"
- "devi"
## Reference assembly data
references_data:
  "hg38":
    reference_fasta: "workflow/data/ref_genomes/hg38.fa"
    R_reference: "BSgenome.Hsapiens.UCSC.hg38"
    segdups: "workflow/data/segdups/segDups_hg38_UCSCtrack.bed.gz"
    snv_sites_to_genotype: ""
    reference_file_location: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.fa.gz
    # snv_sites_to_genotype: "/g/korbel2/weber/MosaiCatcher_files/snv_sites_to_genotype/ALL.chr1-22plusXY_GRCh38_sites.20170504.renamedCHR.vcf.gz"
  "hg19":
    reference_fasta: "workflow/data/ref_genomes/hg19.fa"
    R_reference: "BSgenome.Hsapiens.UCSC.hg19"
    segdups: "workflow/data/segdups/segDups_hg19_UCSCtrack.bed.gz"
    snv_sites_to_genotype: ""
    reference_file_location: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/analysisSet/hg19.p13.plusMT.no_alt_analysis_set.fa.gz
    # snv_sites_to_genotype: "/g/korbel2/weber/MosaiCatcher_files/snv_sites_to_genotype/ALL.chr1-22plusXY_hg19_sites.20170504.renamedCHR.vcf.gz"
  "T2T":
    reference_fasta: "workflow/data/ref_genomes/T2T.fa"
    R_reference: "BSgenome.T2T.CHM13.V2"
    # TO CHANGE
    R_reference_tarball: "/g/korbel2/weber/MosaiCatcher_files/EXTERNAL_DATA/R_reference//BSgenome.T2T.CHM13.V2_1.0.0.tar.gz"
    segdups: "workflow/data/segdups/segDups_T2T_UCSCtrack.bed.gz"
    snv_sites_to_genotype: ""
    reference_file_location: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
    # snv_sites_to_genotype: "/g/korbel2/weber/MosaiCatcher_files/snv_sites_to_genotype/ALL.chr1-22plusXY_T2T_sites.20170504.renamedCHR.vcf.gz"
## Methods dictionnary
methods:
  lenient:
    method_name: "simpleCalls_llr4_poppriorsTRUE_haplotagsTRUE_gtcutoff0_regfactor6_filterFALSE"
    llr: 4
    poppriors: TRUE
    haplotags: TRUE
    gtcutoff: 0
    regfactor: 6
    filter: "FALSE"
  stringent:
    method_name: "simpleCalls_llr4_poppriorsTRUE_haplotagsFALSE_gtcutoff0.05_regfactor6_filterTRUE"
    llr: 4
    poppriors: TRUE
    haplotags: TRUE
    gtcutoff: 0.05
    regfactor: 6
    filter: "TRUE"
plottype_counts:
- "classic"
- "GC_corrected"
plottype_consistency:
- "byaf"
- "bypos"
plottype_clustering:
- "position"
- "chromosome"
## Breakpoint density
# joint segmentation
min_diff_jointseg: 0.1
# single segmentation
min_diff_singleseg: 0.5
# SCE cutoff
additional_sce_cutoff: 20000000
# SCE min distance
sce_min_distance: 500000
# ashleys-qc pipeline arguments
mosaicatcher_pipeline: True
hand_selection: False
use_light_data: False
ashleys_threshold: 0.5
# Others
abs_path: "/"
# CURRENTLY DISABLED
### Modes ["count", "segmentation", "mosaiclassifier"] [CURRENTLY DISABLED]
# mode: "mosaiclassifier"
### Plot enabled [True] or disabled [False]
# plot: False [CURRENTLY DISABLED]
arbigent_data:
  arbigent_mapability_track: workflow/data/arbigent/mapping_counts_allchrs_hg38.txt
  arbigent_mapability_track_h5: workflow/data/arbigent/mapping_counts_allchrs_hg38.h5
# If specified, will copy important data (stats, plots, counts file) to a second place
publishdir: ""
scNOVA: False
scNOVA_scripts:
  generate_CN_for_CNN: "workflow/scripts/scNOVA_scripts/generate_CN_for_CNN.R"
  generate_CN_for_chromVAR: "workflow/scripts/scNOVA_scripts/generate_CN_for_chromVAR.R"
  count_sort_annotate_geneid: "workflow/scripts/scNOVA_scripts/count_sort_annotate_geneid.R"
  count_sort_label: "workflow/scripts/scNOVA_scripts/count_sort_label.R"
  count_norm: "workflow/scripts/scNOVA_scripts/count_norm.R"
  feature_sc_var: "workflow/scripts/scNOVA_scripts/feature_sc_var.R"
  combine_features: "workflow/scripts/scNOVA_scripts/combine_features.R"
  annot_expressed: "workflow/scripts/scNOVA_scripts/annot_expressed.R"
  infer_diff_gene_expression: "workflow/scripts/scNOVA_scripts/infer_diff_gene_expression.R"
  count_sort_annotate_chrid_CREs: "workflow/scripts/scNOVA_scripts/count_sort_annotate_chrid_CREs.R"
  infer_diff_gene_expression_alt: "workflow/scripts/scNOVA_scripts/infer_diff_gene_expression_alt.R"
# Multi-step normalisation advanced parameters
multistep_normalisation_options:
  min_reads_bin: 5
  n_subsample: 1000
  min_reads_cell: 100000
Relevant log output
Error in `[.data.table`(seg, , `:=`(from = c(1, bps[1:(.N - 1)] + 1), :
Supplied 2 items to be assigned to group 15 of size 1 in column 'from'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
Calls: [ -> [.data.table
In addition: Warning message:
In as.data.table.list(jval, .named = NULL) :
Item 2 has 23 rows but longest item has 24; recycled with remainder.
Execution halted
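One possible reading of the error above, sketched in base R: when a group has a single row, `1:(.N - 1)` evaluates to `1:0`, which is `c(1, 0)` rather than an empty range, so the right-hand side ends up with two items for a group of size 1. The value of `bps` below is made up, `n` stands in for `.N`, and the `head()` variant is only a suggestion that has not been checked against the script.

```r
# A group with a single breakpoint (made-up value); `n` stands in for .N.
bps <- 57L
n <- length(bps)

print(1:(n - 1))   # [1] 1 0   (counts down instead of being empty)

# Index 0 is silently dropped, so bps[c(1, 0)] is just bps[1].
from_current <- c(1, bps[1:(n - 1)] + 1)
print(from_current)   # [1]  1 58   (two items for a group of size 1)

# head(x, -1) drops the last element and is empty for a length-1 vector.
from_guarded <- c(1, head(bps, -1) + 1)
print(from_guarded)   # [1] 1
```

`seq_len(.N - 1)` would behave the same way as `head(bps, -1)` here, since `seq_len(0)` is empty.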
Contact Details
petr.smirnov@embl.de
What happened?
I think this line is not doing what is intended: https://github.com/friendsofstrandseq/mosaicatcher-pipeline/blob/3422078530b95fb2b403bfa22d164f5892bedff1/workflow/scripts/plotting/plot-sv-calls_dev.R#L366
Anything else?
No response