AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.
Other
58 stars 28 forks source link

CNVKit encountered a non-zero exit status. Exiting... #32

Closed jingydz closed 1 year ago

jingydz commented 1 year ago

Command

/xxx/software/miniconda3/bin/python3 /xxx/software/AmpliconSuite-pipeline/PrepareAA.py -s xxx -t 10 --cnvkit_dir /xxx/software/cnvkit/cnvkit.py --bam /xxx/bam/xxx.marked.realigned.recal.bam --ref hg38 --run_AA --run_AC

Error log

... Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.antitargetcoverage.cnn with 0 regions Processing target: xxx.marked.realigned Keeping 566505 of 568558 bins Correcting for GC bias... Processing antitarget: xxx.marked.realigned Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cnr with 566505 regions Segmenting /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cnr ... Segmenting with method 'cbs', significance threshold 1e-06, in 10 processes Smoothing overshot at 1 / 24256 indices: (-25.906711296876612, 1.942657497152338) vs. original (-25.13711592269252, 2.7912161890294422) Smoothing overshot at 1 / 20787 indices: (-25.39576316111421, 0.9348917234560586) vs. original (-25.13507644140011, 1.0925847171282095) Smoothing overshot at 4 / 21507 indices: (-17.577676424120234, 1.3497934166366607) vs. original (-25.132818852303526, 0.7033651703059272) Smoothing overshot at 3 / 18327 indices: (-2.3793084884925557, 1.4443874078440597) vs. original (-7.928045105727523, 1.4065439832027493) Smoothing overshot at 2 / 17667 indices: (-27.25791025355295, 1.4929632523748464) vs. original (-25.14017861326128, 1.9005423152557792) Smoothing overshot at 5 / 5172 indices: (-18.1322663495857, 1.306479736222608) vs. original (-25.142368850944987, 0.5700674808940851) Smoothing overshot at 7 / 7464 indices: (-27.334216032452893, 2.258673566282255) vs. original (-25.09869975583872, 1.8128380045187196) Smoothing overshot at 13 / 7049 indices: (-26.37829491018431, 2.224948841739139) vs. original (-25.12928600956857, 1.5845356334013807) Smoothing overshot at 94 / 1974 indices: (-25.437128481436986, 0.6129591159045726) vs. original (-24.326774164045066, 0.6369327523167614) Smoothing overshot at 352 / 3151 indices: (-25.459184691322122, 0.1813556305669362) vs. original (-24.296079071734308, 0.4363252066059615) Post-processing /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cns ... Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.cns with 391 regions Applying filter 'ci' Filtered by 'ci' from 391 to 189 rows Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3 Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.call.cns with 189 regions Significant hits in 16662/566505 bins (2.94%) Wrote /xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.bintest.cns with 16662 regions Traceback (most recent call last): File "/xxx/software/cnvkit/cnvkit.py", line 9, in args.func(args) File "/xxx/cnvkit/cnvlib/commands.py", line 663, in _cmd_segment cnarr = read_cna(args.filename) File "/xxx/software/cnvkit/cnvlib/cmdutil.py", line 12, in read_cna return tabio.read(infile, into=CNA, sample_id=sample_id, meta=meta) File "/xxx/software/cnvkit/skgenome/tabio/init.py", line 74, in read dframe = reader(infile, *kwargs) File "/xxx/software/cnvkit/skgenome/tabio/tab.py", line 17, in read_tab dframe = pd.read_csv(infile, sep='\t', dtype={'chromosome': 'str'}) File "/xxx/.local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper return func(args, kwargs) File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv return _read(filepath_or_buffer, kwds) File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read parser = TextFileReader(filepath_or_buffer, kwds) File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in init self._engine = self._make_engine(f, self.engine) File "/xxx/.local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine self.handles = get_handle( # type: ignore[call-overload] File "/xxx/.local/lib/python3.9/site-packages/pandas/io/common.py", line 789, in get_handle handle = open( FileNotFoundError: [Errno 2] No such file or directory: '/xxx/eccDNA/xxx_cnvkit_output/xxx.marked.realigned.recal.cnr' CNVKit encountered a non-zero exit status. Exiting... 2023-02-03 16:31:46.004495

jluebeck commented 1 year ago

Thank you, I will look into this. Can you let me know what files CNVKit produced up to this point? Were there any other error messages printed to stdout? Would you be able to share the header of your BAM file with me? samtools view -H myfile.bam. You can email it to me at jluebeck[at] eng.ucsd.edu if you wish.

Thank you, Jens

jingydz commented 1 year ago

I have installed the CNVKit software successfully.

CNVKit output as follows:

eccDNA]$ ll -htr xxx_cnvkit_output/ total 67M -rw-r--r-- 1 xxx 15M Feb 3 17:05 GRCh38_cnvkit_filtered_ref.target-tmp.bed -rw-r--r-- 1 xxx 0 Feb 3 17:05 GRCh38_cnvkit_filtered_ref.antitarget-tmp.bed -rw-r--r-- 1 xxx 23M Feb 3 17:19 6605D.marked.realigned.targetcoverage.cnn -rw-r--r-- 1 xxx 31 Feb 3 17:19 6605D.marked.realigned.antitargetcoverage.cnn -rw-r--r-- 1 xxx 29M Feb 3 17:19 6605D.marked.realigned.cnr -rw-r--r-- 1 xxx 29K Feb 3 17:20 6605D.marked.realigned.cns -rw-r--r-- 1 xxx 13K Feb 3 17:20 6605D.marked.realigned.call.cns -rw-r--r-- 1 xxx 978K Feb 3 17:20 6605D.marked.realigned.bintest.cns

The other files I got:

eccDNA]$ cat 6594D_finish_flag.txt UNSUCCESSFUL eccDNA]$ cat 6594D_timing_log.txt

stage: walltime(seconds)

Initialization: 0.21 Alignment and bam indexing: 3874.77

The header of myfile.bam

I have sent it to you by email.

Thanks for your help in advance.

jluebeck commented 1 year ago

Thank you, after reviewing the bam header, I think the issue is that one or more of the sequences contained is possibly a duplicate (not in sequence name, but perhaps at the sequence level). My suggestion to avoid this error is to use the reference genome version in the GRCh38 data repo for alignment.

Strangely the file list of files you provided shows that the .cnr and .cns file are populated and of reasonable sizes. It is possible this was a harmless error from CNVKit. I would recommend you inspect the .cns file to ensure that it does not appear to be truncated. If it appears to be in order, then you can proceed with the pipeline. This can be done by providing the 6605D.marked.realigned.cns file as the argument for--bed when running PrepareAA.py. The pipeline will pick up from that stage and continue with the analysis.

Thank you and please let me know if you run into additional issues.

jingydz commented 1 year ago

Thank you for your advice. But I can only get the aligned bam not the raw fastq file. The file 6605D.marked.realigned.bintest.cns is indeed in order. I will try add the --bedparameter to rerun this pipeline.

jingydz commented 1 year ago

Hi,

Command

python3 /xxx/software/AmpliconArchitect/src/AmpliconArchitect.py --bam /xxx/6605D.marked.realigned.recal.bam --bed 6605D_cnvkit_output/6605D.marked.realigned.cns --ref GRCh38 --out 6605D_AA_OUT

Error log

[root:INFO] Commandline: /xxx/software/AmpliconArchitect/src/AmpliconArchitect.py --bam /xxx/6605D.marked.realigned.recal.bam --bed 6605D_cnvkit_output/6605D.marked.realigned.cns --ref GRCh38 --out 6605D_AA_OUT [root:INFO] AmpliconArchitect version 1.3.r3

[root:INFO] Python version 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]

[root:INFO] #TIME 19.875 Loading libraries and reference annotations for: GRCh38 Global ref name is GRCh38 [root:INFO] Mosek version is 8.0.0.60 [root:INFO] #TIME 59.923 Initiating bam_to_breakpoint object for: /xxx/6605D.marked.realigned.recal.bam [root:ERROR] #TIME 59.940 interval_list: Unable to open interval file "6605D_cnvkit_output/6605D.marked.realigned.cns". [root:INFO] #TIME 60.004 Reusing cstats from /xxx/software/AmpliconArchitect/AA_DATA_REPO/coverage.stats [root:INFO] #TIME 60.081 Interval sets for amplicons determined: [root:INFO] #TIME 60.081 Total Runtime

Question

I can open the cns file, but why does this AA software report it cannot open? $ ll -h 6605D_cnvkit_output/6605D.marked.realigned.cns -rw-r--r-- 1 xxx 29K Feb 4 13:14 6605D_cnvkit_output/6605D.marked.realigned.cns $ head 6605D_cnvkit_output/6605D.marked.realigned.cns chromosome start end gene log2 depth probes weight ci_lo ci_hi chr1 14941 357863 - 0.304126 45.49 49 48.6028 0.276667 0.34897 chr1 357863 456821 - -0.560944 24.1355 20 19.8377 -0.592434 -0.497789 chr1 456821 2647734 - -0.0124667 37.5159 423 419.586 -0.0189111 -0.00570006 chr1 2647734 2697776 - 1.53253 124.195 10 9.91932 1.44669 1.65633 chr1 2697776 2761287 - 0.682905 69.257 4 3.96772 0.446556 0.888701 chr1 2761287 7840338 - 0.00889918 36.6523 1015 1006.81 0.0068516 0.0111559 chr1 7840338 11379678 - -0.0408673 34.7562 708 702.285 -0.0435213 -0.039271 chr1 11379678 12824408 - -0.00673149 35.7686 289 286.667 -0.00958998 -0.00394125 chr1 12824408 13194372 - 0.21524 41.4975 64 63.4834 0.198446 0.233692

jluebeck commented 1 year ago

I mean that you should provide the .cns file as the argument of --bed when running PrepareAA.py. Apologies for the confusion.

jingydz commented 1 year ago

Thank you very much! I run it successfully. (I need to change the name of the bam file so that it matches the input of the -s parameter.) Change 6605D.marked.realigned.recal.bam to 6605D.bam.

Command

/Parastor300s_G30S/zhangjj/software/miniconda3/bin/python3 /parastor300/work01/zhangjj/software/AmpliconSuite-pipeline/PrepareAA.py -s 6605D -t 50 --cnvkit_dir /parastor300/work01/zhangjj/software/cnvkit/cnvkit.py --bam 6605D.bam --ref GRCh38 --downsample 10.0 -o 6605D_AA_OUT --run_AA --run_AC

All stages appear to have completed successfully. $ cat 6605D_finish_flag.txt All stages completed