Open kotliary opened 3 years ago
Hi. I can take a look at this. Does it work OK when you don't specify chromosomes?
Just confirming: I'm finding some issues with the multi-process CLI in 5.2.11. I'm working on a fix and will send an update when I have a patch.
Hi!
I have a new release here with a few fixes (https://github.com/tamsen/Pisces/releases/tag/v5.3.0.0). It should help with your issue (or, at the very least, expose enough logging to help). Can you please download it and give it a try? When you run GeminiMulti, it should create two folders, GeminiChromosomeLogs and GeminiMultiLogs. If the error still occurs, we can hopefully see the issue in those logs. At that point, we can drill down to the problem specific to your BAM.
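To pull failures out of those logs quickly, a small filter along these lines may help. It is only a sketch: it assumes log lines of the form "Completed task Gemini_chr1 with exit code 0." (as in the log excerpt posted in this thread), and `failed_tasks` is a hypothetical helper name, not part of Pisces.

```shell
# Print "<task> <code>" for every completed task whose exit code is not 0.
# Reads log text on stdin. The line format is an assumption taken from the
# GeminiMulti log excerpt quoted in this thread.
failed_tasks() {
  awk '/Completed task .* with exit code [0-9]+\.?$/ {
         code = $NF
         sub(/\.$/, "", code)          # drop the trailing period
         if (code + 0 != 0) print $(NF - 4), code
       }'
}

# Usage (log file names and location are assumptions):
#   cat out/GeminiMultiLogs/* | failed_tasks
```

If nothing is printed, every task recorded in the logs reported exit code 0, and the failure has to come from somewhere other than the per-chromosome subprocesses.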
Note, the new command line is as below, and does NOT require dotnet or the ".dll" extension.
/new/pisces/path/pisces_all/GeminiMulti --bam my.bam --genome /Genomes/HomoSapiens/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta --samtools samtools --exePath /new/pisces/path/pisces_all/Gemini --outFolder out --numProcesses 24 --chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY
Hope this helps, Tamsen
I have the same problem here. I am using v5.3.0.0.
This works:
module load dotnet/2.0.3
pisces_all/pisces_all/GeminiMulti \
-bam p5p7_FINAL_mapped.bam \
--exePath pisces_all/pisces_all/Gemini \
-genome v0_masked_new_15092023_4_RACP-PISCES \
-samtools /local/software/samtools/1.16.1/bin/samtools \
--outFolder Results \
--numProcesses 5 \
But when I add this:
--chromosomes chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY
I don't get any error message (unlike kotliary), but the job ends in state FAILED (exit code 201).
What does this mean?
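One cause that is cheap to rule out is a name in the --chromosomes list that does not exist in the BAM header (for example with a custom genome). The sketch below does not explain exit code 201 by itself; it only checks the names. `missing_contigs` is a hypothetical helper, and `samtools view -H` is used to print the header.

```shell
# List the names from a comma-separated --chromosomes value that are absent
# from the @SQ lines of a SAM/BAM header read on stdin.
missing_contigs() {
  wanted=$1
  awk -v wanted="$wanted" '
    $1 == "@SQ" {
      # Remember every reference sequence name (SN: field) in the header.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) seen[substr($i, 4)] = 1
    }
    END {
      n = split(wanted, names, ",")
      for (i = 1; i <= n; i++)
        if (!(names[i] in seen)) print names[i]
    }'
}

# Usage (the BAM path is a placeholder):
#   samtools view -H my.bam | missing_contigs "chr1,chr2,chrX"
```

Empty output means every requested chromosome is present in the header.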
These are the last lines of the log file
Time: 00:00:58.9
2/20/24 8:39 AM 4 PROCESS Gemini_chrX: ExitCode: 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr1 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr2 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr3 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr4 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr5 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr6 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr7 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr8 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr9 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr10 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr11 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr12 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr13 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr14 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr15 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr16 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr17 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr18 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr19 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr20 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr21 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chr22 with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chrX with exit code 0.
2/20/24 8:39 AM 1 Completed task Gemini_chrY with exit code 0.
2/20/24 8:39 AM 1 Completed 24 tasks.
2/20/24 8:39 AM 1 Calling samtools cat on 24 files to create p5p7_FINAL_mapped.PairRealigned.bam.
2/20/24 8:39 AM 1 Calling final samtools cat on 24 bams with output at p5p7_FINAL_mapped.PairRealigned.bam.
2/20/24 8:39 AM 1 Calling samtools index on p5p7_FINAL_mapped.PairRealigned.bam.
2/20/24 8:39 AM 1 Done finalizing bam.
2/20/24 8:39 AM 1 Consolidating log files.
2/20/24 8:39 AM 1 Deleting intermediate files.
2/20/24 8:39 AM 1 Done cleaning up.
2/20/24 8:39 AM 1 ******************** Ending *********************
Time: 00:02:31.0
---------------------------------------------------------------------------
Pisces Software GNU GENERAL PUBLIC LICENSE
https://github.com/tamsen/Pisces 5.3.0.0
---------------------------------------------------------------------------
Please reference 'Tamsen Dunn, Gwenn Berry, Dorothea Emig-Agius, Yu Jiang, Serena Lei, Anita Iyer, Nitin Udar, Han-Yu Chuang, Jeff Hegarty, Michael Dickover, Brandy Klotzle, Justin Robbins, Marina Bibikova, Marc Peeters, Michael Strömberg, Pisces: an accurate and versatile variant caller for somatic and germline next-generation sequencing data, Bioinformatics, Volume 35, Issue 9, 1 May 2019, Pages 1579–1581, https://doi.org/10.1093/bioinformatics/bty849'
---------------------------------------------------------------------------
USAGE: dotnet GeminiMulti.dll --bam <bam path> --genome <genome path> --samtools <samtools path> --outFolder <output path> --numProcesses 20 --exePath <path to gemini subprocess>
GeminiMulti: pair-aware indel realigner and read stitcher
REQUIRED:
--bam <PATH> PATH to the original bam file. (Required).
--genome <PATH> PATH to the genome directory. (Required).
--samtools <PATH> PATH to the samtools executable. (Required).
--numprocesses <INT> INT indicating the number of Gemini subprocesses
to run. (Required).
--exepath <PATH> PATH to the executable file for the Gemini
subprocess. (Required).
--outfolder <PATH> PATH of directory in which to create the new bam
file. (Required).
COMMON:
--samtoolsoldstyle <BOOL>
BOOL Whether the provided samtools executable is
the old version that uses an output prefix
rather than an explicit '-o' output option
(http://www.htslib.org/doc/samtools-1.1.htm).
Default: false.
--keepbothsidesoftclips <BOOL>
BOOL Whether to trust that both-side softclips
are probe and should stay softclipped. Default:
false.
--trustsoftclips <BOOL>
BOOL Whether to trust softclips. If true, having
softclips doesn't automatically trigger indel
realignment. Also, won't try to stitch the
softclips. Default: false.
--keepprobe <BOOL> BOOL Whether to trust that probe-side softclips
are probe and should stay softclipped. Default:
false.
--remaskmessysoftclips <BOOL>
BOOL If true, read-ends that were originally
softclipped and are still highly mismatching to
reference after realignment are re-softclipped,
even if not configured to keep probe softclips.
If false, only N-softclips are remasked when not
keeping probe softclips. Default value is false.
--stitchonly <BOOL> BOOL Whether to only perform stitching, skipping
realignment.
--realignonly <BOOL> BOOL Whether to only perform realignment,
skipping stitching.
--help, -h displays the help menu
--version, -v displays the version
GEMINI_MULTI:
--multiprocess <BOOL> BOOL Whether to use multi-process, as opposed to
multi-thread, processing for each chromosome.
Default: true.
--chromosomes <LIST> LIST Comma-separated list of chromosomes to
process, if only processing particular
chromosomes. Default: empty (all chromosomes
will be processed).
STITCHING:
--minbasecallquality <INT>
INT Cutoff for which, in case of a stitching
conflict, bases with qscore less than this value
will automatically be disregarded in favor of
the mate's bases.
--nifydisagreement <BOOL>
BOOL Whether or not to turn high-quality
disagreeing overlap bases to Ns. Default: false.
--maxreadlength <INT> INT Maximum expected length of individual reads,
used to determine the maximum expected stitched
read length (2*len - 1). For optimal performance,
set as low as appropriate (i.e. the actual
single-read length + max deletion length you
expect to stitch) for your data. Default: 1024.
--dontstitchrepeatoverlap <BOOL>
BOOL Whether to not stitch read pairs whose only
overlap is a repeating sequence. Default: true.
--ignorereadsabovemaxlength <BOOL>
BOOL Whether to passively ignore read pairs that
would be above the max stitched length (e.g.
extremely long deletions). Default: false.
--countnstowarddisagreeingbases <BOOL>
BOOL Whether to count overlapping-base
disagreements where one of the mates reports an
'N' as a full-force disagreement (ie Nify the
base if configured to do so, and count toward
the number of disagreements in determining
whether the stitching result should be rejected).
Default: false.
--maxnumdisagreeingstitchedbases <INT>
INT Maximum number of stitched bases that can
disagree between the two reads before a stitched
read is rejected. Default: int.MaxValue
--stringtagstokeepfromr1 <LIST>
LIST Comma-delimited list of string tags to
retain from read 1 when stitching. Default: none.
READ_FILTERING:
--skipandremovedups <BOOL>
BOOL Whether to skip and remove duplicates.
Default: True.
--minmapquality <INT> INT Reads pairs with map quality less than this
value should be filtered. If only one mate in a
pair has a low map quality, it is treated as
Split (or derivations thereof). Should not be
negative. Default: 1.
--filterforproperpairs <BOOL>
BOOL Whether reads marked as not proper pairs
shall be filtered. Default: false.
--treatabnormalorientationasimproper <BOOL>
BOOL Whether to treat non-F1R2/F2R1 read pairs
as improper even if flagged as properly paired.
Default: False.
REALIGNMENT:
--maskpartialinsertion <BOOL>
BOOL Option to softclip a partial insertion at
the end of a realigned read (a complete but
unanchored insertion is allowed). Default: false.
--minimumunanchoredinsertionlength <INT>
INT Minimum length of an unanchored insertion
(i.e. no flanking reference base on one side)
allowed in a realigned read. Insertions shorter
than the specified length will be softclipped.
Default value is 0, i.e. allowing unanchored
insertions of any length.
--softclipunknownindels <BOOL>
BOOL Whether to softclip out unknown indels.
Default: false.
--checksoftclipsformismatches <BOOL>
BOOL Whether to count mismatches in softclips
toward total mismatches. Default: false.
--trackmismatches <BOOL>
BOOL Whether to track and compare mismatches
when realigning. Default: false.
--categoriestorealign <LIST>
LIST Category names that should be attempted to
realign. Default: ImperfectStitched,FailStitch,
UnstitchIndel,Unstitchable,Disagree,
MessyStitched,MessySplit,UnstitchImperfect,
LongFragment,UnstitchMessy,
UnstitchForwardMessy,UnstitchReverseMessy,
UnstitchForwardMessyIndel,
UnstitchReverseMessyIndel,
UnstitchMessySuspiciousRead,
UnstitchMessyIndelSuspiciousRead,
UnstitchMessySuspiciousMd
--categoriestosnowball <LIST>
LIST Category names that should be attempted to
snowball. Default: none.
--pairawareeverything <BOOL>
BOOL Whether to pass everything through pair
aware realignment, or just the expected
categories (Disagree, FailStitch, UnstitchIndel).
Default: false.
--forcehighlikelihoodrealigners <BOOL>
BOOL Whether to force realignment in high-
likelihood categories even if the neighborhood
would not have been eligible for realignment.
Default: false.
INDEL_FILTERING:
--minpreferredsupport <INT>
INT Instances of a found variant before it can
be considered to realign around. Default: 3.
--minpreferredanchor <INT>
INT Minimum anchor around indel to count an
observation toward good evidence. Default: 1.
--minrequiredindelsupport <INT>
INT Don't even allow otherwise strong indels
that we attempt to rescue in if they have num
observations below this. Default: 0.
--minrequiredanchor <INT>
INT Don't even allow otherwise strong indels
that we attempt to rescue in if they have min
anchor below this. Default: 0.
--maxmessthreshold <INT>
INT Don't allow indels with average mess above
this value. Default: 20.
--binsize <INT> INT Size of bin within which to consider indels
overlapping and eligible for pruning. Default: 0
(do not clean up).
--requirepositiveoutcomeforsnowball <BOOL>
BOOL Whether to filter out indels that did not
have any realignment attempts at all during
snowballing (stricter than base level of
filtering indels that had failed realignment
attempts). Default: True.
REALIGNMENT_BINS:
--messysitethreshold <INT>
INT Minimum (raw) number of messy-type reads
that must be present in a neighborhood for it to
be considered messy and a potential realignable
neighborhood. Must also meet the frequency
thresholds. Default: 1.
--messysitewidth <INT> INT Neighborhood width to use when binning
realignment eligibility signals. Default: 500.
--collectdepth <BOOL> BOOL When collecting realignment eligibility
signals, whether to collect depth to gauge
frequency information. Default: True.
--imperfectfreqthreshold <FLOAT>
FLOAT Proportion of imperfect reads in bin below
which we should not bother to realign. Should be
proportional to detection limit and bin width.
Default: 0.03.
--indelregionfreqthreshold <FLOAT>
FLOAT Proportion of imperfect reads in bin below
which we should not bother to realign. Should be
proportional to detection limit and bin width.
Default: 0.01.
--regiondepththreshold <INT>
INT When collecting realignment eligibility
signals and depth, minimum total number of reads
in a neighborhood below which the neighborhood
would be ineligible for realignment. Default: 5.
--recalculateusablesitesaftersnowball <BOOL>
BOOL Whether to recalculate site usability after
snowballing. Default: True.
PROCESSING:
--readcachesize <INT> INT Batch size. Default: 1000.
--regionsize <INT> INT Size of genomic region to process at one
time. Appropriate setting depends upon read
depth, density and available memory. Default:
10000000.
--numconcurrentregions <INT>
INT Number of concurrent regions to hold in
memory/process at once. Default: 1.
--maxnumthreads <INT> INT Maximum number of threads per process.
Default: 1.
READ_SILENCING:
--directionalmessthreshold <FLOAT>
FLOAT Proportion of directionally messy
(ForwardMessy or ReverseMessy, etc) reads in
neighborhood above which we should silence the
affected mates. Default: 0.2.
--messymapq <INT> INT Mapping quality of reads below which, when
combined with high mismatch/softclips, a read is
considered a suspicious/multi-mapping messy read.
Default: 30.
--silencesuspiciousmdreads <BOOL>
BOOL Whether to silence read pairs whose MD tags
indicate suspicion. Default: False.
--silencedirectionalmessreads <BOOL>
BOOL Whether to silence read mates which are
very messy and have clean mates, given that the
proportion of such reads in the neighborhood
exceeds DirectionalMessThreshold. Default: False.
--silencemessymapmessreads <BOOL>
BOOL Whether to silence read pairs that are
messy and have one or both mates with mapping
quality below MessyMapq, given that the
proportion of such reads in the neighborhood
exceeds DirectionalMessThreshold. Default: False.
DEBUG:
--logregionsandrealignments <BOOL>
BOOL Debug option to write region stats to the
log. Default: False.
--lightdebug <BOOL> BOOL Whether to log minimal debug logging.
Default: false.
--debug <BOOL> BOOL Whether we should run in debug (verbose)
mode. Default: false.
--keepunmerged <BOOL> BOOL Whether to keep unmerged bams, for
debugging. Default: false.
READ_CLASSIFICATION:
--numsoftclipstobeconsideredmessy <INT>
INT When classifying reads (eg imperfect, messy,
directional messy), the min number of softclips
that will trigger one of the messy
classifications, given that softclips are not to
be trusted. Default: 8.
--nummismatchestobeconsideredmessy <INT>
INT When classifying reads (eg imperfect, messy,
directional messy), the min number of mismatches
that will trigger one of the messy
classifications. Default: 3.
5.3.0.0
Some problems were encountered when parsing the command line options:
For a complete list of command line options, type "GeminiMulti -h"
==============================================================================
Running epilogue script on gold51.
Submit time : 2024-02-20T08:36:35
Start time : 2024-02-20T08:36:49
End time : 2024-02-20T08:39:20
Elapsed time : 00:02:31 (Timelimit=20:00:00)
Job ID: 5469291
Cluster: i5
User/Group: mdb1c20/af
State: FAILED (exit code 201)
Cores: 1
CPU Utilized: 00:02:30
CPU Efficiency: 99.34% of 00:02:31 core-walltime
Job Wall-clock time: 00:02:31
Memory Utilized: 2.20 GB
Memory Efficiency: 6.67% of 33.00 GB
The output looks good to me: the new BAM is created (I have not checked it in detail).
GeminiChromosomeLogs GeminiMultiLogs p5p7_FINAL_mapped.PairRealigned.bam p5p7_FINAL_mapped.PairRealigned.bam.bai
But I would like to know where the list of indels is. I cannot find it in GeminiChromosomeLogs or GeminiMultiLogs.
Regards,
Gemini (5.2.11 release) does not finish running on a WES BAM file when multiple chromosomes are selected with the --chromosomes argument. Every time I run Gemini, different chromosomes fail: sometimes the error code is 1, sometimes 134, 137, or 139. What do those codes mean? If I increase the memory request (I run it on cluster nodes) from 20GB to 64GB, fewer chromosomes fail: about half of them at 20GB and just 2 at 64GB. With 128GB, 2 chromosomes still fail, as with 64GB. The failing chromosomes are always different, and the failures do not seem to depend on the order in which they are processed.
BTW, when I run without specifying a memory requirement, all chromosomes give error code 137.
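On what the codes mean: by the usual shell convention on Linux, an exit status above 128 means the process was terminated by signal number (status - 128). Under that convention 134 is SIGABRT (6), 137 is SIGKILL (9), which is what an out-of-memory kill by the kernel or the scheduler typically produces, and 139 is SIGSEGV (11). That Gemini's subprocess codes follow this convention on this cluster is an assumption, though the link between memory and failures described above is consistent with it. A minimal sketch (`describe_exit` is a hypothetical helper):

```shell
# Describe an exit status using the shell convention that statuses
# 129..192 mean "terminated by signal (status - 128)". Signal numbers
# are the Linux ones; other statuses are reported as plain exit codes.
describe_exit() {
  status=$1
  if [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
    sig=$((status - 128))
    case $sig in
      6)  name=SIGABRT ;;
      9)  name=SIGKILL ;;
      11) name=SIGSEGV ;;
      *)  name="signal $sig" ;;
    esac
    echo "$status: terminated by $name"
  else
    echo "$status: exited with code $status"
  fi
}

# Usage:
#   for code in 1 134 137 139; do describe_exit "$code"; done
```

Exit code 1 is an ordinary application error, so its cause has to come from the program's own logs.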
My command line:
The error:
There is no error in logs for individual chromosomes.
The minimum set of chromosomes I could successfully run is chr20,chr21,chr22,chrX,chrY. If I add one more chromosome, it fails. I have also tried the previous version, but had the same problem.
If you need an example BAM file, I can provide, but the files are big, over 20GB.