Closed jingydz closed 1 year ago
Thank you, I will look into this. Can you let me know what files CNVKit produced up to this point? Were there any other error messages printed to stdout? Would you be able to share the header of your BAM file with me? samtools view -H myfile.bam
. You can email it to me at jluebeck[at] eng.ucsd.edu if you wish.
Thank you, Jens
I have installed the CNVKit software successfully.
eccDNA]$ ll -htr xxx_cnvkit_output/ total 67M -rw-r--r-- 1 xxx 15M Feb 3 17:05 GRCh38_cnvkit_filtered_ref.target-tmp.bed -rw-r--r-- 1 xxx 0 Feb 3 17:05 GRCh38_cnvkit_filtered_ref.antitarget-tmp.bed -rw-r--r-- 1 xxx 23M Feb 3 17:19 6605D.marked.realigned.targetcoverage.cnn -rw-r--r-- 1 xxx 31 Feb 3 17:19 6605D.marked.realigned.antitargetcoverage.cnn -rw-r--r-- 1 xxx 29M Feb 3 17:19 6605D.marked.realigned.cnr -rw-r--r-- 1 xxx 29K Feb 3 17:20 6605D.marked.realigned.cns -rw-r--r-- 1 xxx 13K Feb 3 17:20 6605D.marked.realigned.call.cns -rw-r--r-- 1 xxx 978K Feb 3 17:20 6605D.marked.realigned.bintest.cns
eccDNA]$ cat 6594D_finish_flag.txt UNSUCCESSFUL eccDNA]$ cat 6594D_timing_log.txt
Initialization: 0.21 Alignment and bam indexing: 3874.77
I have sent it to you by email.
Thanks for your help in advance.
Thank you, after reviewing the bam header, I think the issue is that one or more of the sequences contained is possibly a duplicate (not in sequence name, but perhaps at the sequence level). My suggestion to avoid this error is to use the reference genome version in the GRCh38 data repo for alignment.
Strangely the file list of files you provided shows that the .cnr and .cns file are populated and of reasonable sizes. It is possible this was a harmless error from CNVKit. I would recommend you inspect the .cns file to ensure that it does not appear to be truncated. If it appears to be in order, then you can proceed with the pipeline. This can be done by providing the 6605D.marked.realigned.cns
file as the argument for--bed
when running PrepareAA.py. The pipeline will pick up from that stage and continue with the analysis.
Thank you and please let me know if you run into additional issues.
Thank you for your advice. But I can only get the aligned bam not the raw fastq file. The file 6605D.marked.realigned.bintest.cns
is indeed in order. I will try add the --bed
parameter to rerun this pipeline.
Hi,
python3 /xxx/software/AmpliconArchitect/src/AmpliconArchitect.py --bam /xxx/6605D.marked.realigned.recal.bam --bed 6605D_cnvkit_output/6605D.marked.realigned.cns --ref GRCh38 --out 6605D_AA_OUT
[root:INFO] Commandline: /xxx/software/AmpliconArchitect/src/AmpliconArchitect.py --bam /xxx/6605D.marked.realigned.recal.bam --bed 6605D_cnvkit_output/6605D.marked.realigned.cns --ref GRCh38 --out 6605D_AA_OUT [root:INFO] AmpliconArchitect version 1.3.r3
[root:INFO] Python version 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
[root:INFO] #TIME 19.875 Loading libraries and reference annotations for: GRCh38 Global ref name is GRCh38 [root:INFO] Mosek version is 8.0.0.60 [root:INFO] #TIME 59.923 Initiating bam_to_breakpoint object for: /xxx/6605D.marked.realigned.recal.bam [root:ERROR] #TIME 59.940 interval_list: Unable to open interval file "6605D_cnvkit_output/6605D.marked.realigned.cns". [root:INFO] #TIME 60.004 Reusing cstats from /xxx/software/AmpliconArchitect/AA_DATA_REPO/coverage.stats [root:INFO] #TIME 60.081 Interval sets for amplicons determined: [root:INFO] #TIME 60.081 Total Runtime
I can open the cns file, but why does this AA software report it cannot open?
$ ll -h 6605D_cnvkit_output/6605D.marked.realigned.cns
-rw-r--r-- 1 xxx 29K Feb 4 13:14 6605D_cnvkit_output/6605D.marked.realigned.cns
$ head 6605D_cnvkit_output/6605D.marked.realigned.cns
chromosome start end gene log2 depth probes weight ci_lo ci_hi
chr1 14941 357863 - 0.304126 45.49 49 48.6028 0.276667 0.34897
chr1 357863 456821 - -0.560944 24.1355 20 19.8377 -0.592434 -0.497789
chr1 456821 2647734 - -0.0124667 37.5159 423 419.586 -0.0189111 -0.00570006
chr1 2647734 2697776 - 1.53253 124.195 10 9.91932 1.44669 1.65633
chr1 2697776 2761287 - 0.682905 69.257 4 3.96772 0.446556 0.888701
chr1 2761287 7840338 - 0.00889918 36.6523 1015 1006.81 0.0068516 0.0111559
chr1 7840338 11379678 - -0.0408673 34.7562 708 702.285 -0.0435213 -0.039271
chr1 11379678 12824408 - -0.00673149 35.7686 289 286.667 -0.00958998 -0.00394125
chr1 12824408 13194372 - 0.21524 41.4975 64 63.4834 0.198446 0.233692
I mean that you should provide the .cns file as the argument of --bed when running PrepareAA.py. Apologies for the confusion.
Thank you very much! I run it successfully. (I need to change the name of the bam file so that it matches the input of the -s parameter.)
Change 6605D.marked.realigned.recal.bam
to 6605D.bam
.
/Parastor300s_G30S/zhangjj/software/miniconda3/bin/python3 /parastor300/work01/zhangjj/software/AmpliconSuite-pipeline/PrepareAA.py -s 6605D -t 50 --cnvkit_dir /parastor300/work01/zhangjj/software/cnvkit/cnvkit.py --bam 6605D.bam --ref GRCh38 --downsample 10.0 -o 6605D_AA_OUT --run_AA --run_AC
All stages appear to have completed successfully. $ cat 6605D_finish_flag.txt All stages completed
Command
/xxx/software/miniconda3/bin/python3 /xxx/software/AmpliconSuite-pipeline/PrepareAA.py -s xxx -t 10 --cnvkit_dir /xxx/software/cnvkit/cnvkit.py --bam /xxx/bam/xxx.marked.realigned.recal.bam --ref hg38 --run_AA --run_AC
Error log
... Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.antitargetcoverage.cnn with 0 regions Processing target: xxx.marked.realigned Keeping 566505 of 568558 bins Correcting for GC bias... Processing antitarget: xxx.marked.realigned Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cnr with 566505 regions Segmenting /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cnr ... Segmenting with method 'cbs', significance threshold 1e-06, in 10 processes Smoothing overshot at 1 / 24256 indices: (-25.906711296876612, 1.942657497152338) vs. original (-25.13711592269252, 2.7912161890294422) Smoothing overshot at 1 / 20787 indices: (-25.39576316111421, 0.9348917234560586) vs. original (-25.13507644140011, 1.0925847171282095) Smoothing overshot at 4 / 21507 indices: (-17.577676424120234, 1.3497934166366607) vs. original (-25.132818852303526, 0.7033651703059272) Smoothing overshot at 3 / 18327 indices: (-2.3793084884925557, 1.4443874078440597) vs. original (-7.928045105727523, 1.4065439832027493) Smoothing overshot at 2 / 17667 indices: (-27.25791025355295, 1.4929632523748464) vs. original (-25.14017861326128, 1.9005423152557792) Smoothing overshot at 5 / 5172 indices: (-18.1322663495857, 1.306479736222608) vs. original (-25.142368850944987, 0.5700674808940851) Smoothing overshot at 7 / 7464 indices: (-27.334216032452893, 2.258673566282255) vs. original (-25.09869975583872, 1.8128380045187196) Smoothing overshot at 13 / 7049 indices: (-26.37829491018431, 2.224948841739139) vs. original (-25.12928600956857, 1.5845356334013807) Smoothing overshot at 94 / 1974 indices: (-25.437128481436986, 0.6129591159045726) vs. original (-24.326774164045066, 0.6369327523167614) Smoothing overshot at 352 / 3151 indices: (-25.459184691322122, 0.1813556305669362) vs. original (-24.296079071734308, 0.4363252066059615) Post-processing /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cns ... Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cns with 391 regions Applying filter 'ci' Filtered by 'ci' from 391 to 189 rows Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3 Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.call.cns with 189 regions Significant hits in 16662/566505 bins (2.94%) Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.bintest.cns with 16662 regions Traceback (most recent call last): File "/xxx/software/cnvkit/cnvkit.py", line 9, in
args.func(args)
File "/xxx/cnvkit/cnvlib/commands.py", line 663, in _cmd_segment
cnarr = read_cna(args.filename)
File "/xxx/software/cnvkit/cnvlib/cmdutil.py", line 12, in read_cna
return tabio.read(infile, into=CNA, sample_id=sample_id, meta=meta)
File "/xxx/software/cnvkit/skgenome/tabio/init.py", line 74, in read
dframe = reader(infile, *kwargs)
File "/xxx/software/cnvkit/skgenome/tabio/tab.py", line 17, in read_tab
dframe = pd.read_csv(infile, sep='\t', dtype={'chromosome': 'str'})
File "/xxx/.local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(args, kwargs)
File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, kwds)
File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in init
self._engine = self._make_engine(f, self.engine)
File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "/xxx/.local/lib/python3.9/site-packages/pandas/io/common.py", line 789, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.recal.cnr'
CNVKit encountered a non-zero exit status. Exiting...
2023-02-03 16:31:46.004495