AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

How does the pipeline generate the CNV_CALLS_pre_filtered.bed file? #55

Closed JFanbio closed 1 month ago

JFanbio commented 2 months ago

Hi there, I used the end-to-end wrapper to process my data, but I didn't get any ecDNA, so I want to check the steps one by one. I reran the data following the log file generated by AmpliconSuite-pipeline. When I got to amplified_intervals, I noticed CNV_CALLS_pre_filtered.bed on the command line, but it was not generated by any previous step. Could you please tell me how this file is generated?

[root:INFO] Launched on 2024-04-13 22:31:03.405165
[root:INFO] AmpiconSuite-pipeline version 1.2.1
[root:INFO] /home/jianfan/Software/miniforge3/envs/ampsuite/bin/AmpliconSuite-pipeline.py -s C8_merged -t 8 --bam C8_merged.bam --ref hg19 --run_AA --run_AC -o /home/jianfan/sd1/sx/scWGS/AS/C8_merged
[root:INFO] Matched C8_merged.bam to reference genome hg19
[root:INFO] hg19 data repo constructed on Thu Jul 27 13:16:22 PDT 2023
[root:INFO] Running AmpliconSuite-pipeline on sample: C8_merged
[root:INFO] C8_merged.bam index not found, calling samtools index
[root:INFO] Finished indexing
[root:INFO] C8_merged.bam: 329005012 + 0 properly paired (67.18% : N/A)
[root:WARNING] WARNING: BAM FILE PROPERLY PAIRED RATE IS BELOW 95%. Quality of data may be insufficient for AA analysis. Poorly controlled insert size distribution during sample prep can cause high fractions of read pairs to be marked as discordant during alignment. Artifactual short SVs and long runtimes may occur!

[root:INFO] Running CNVKit batch
[root:INFO] python3 /home/jianfan/Software/miniforge3/envs/ampsuite/bin/cnvkit.py batch -m wgs -r /home/jianfan/data_repo/hg19/hg19_cnvkit_filtered_ref.cnn -p 8 -d /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output/ C8_merged.bam
[root:INFO] Running CNVKit segment
[root:INFO] python3 /home/jianfan/Software/miniforge3/envs/ampsuite/bin/cnvkit.py segment /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output/C8_merged.cnr -p 8 -m cbs -o /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output/C8_merged.cns
[root:INFO] Cleaning up temporary CNVkit files
[root:INFO] rm -f /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output//tmp.bed /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output//.cnn /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output//target.bed /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output//.bintest.cns
[root:INFO] gzip -f /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output/C8_merged.cnr
[root:INFO] Running amplified_intervals
[root:INFO] python /home/jianfan/Software/miniforge3/envs/ampsuite/lib/python3.10/site-packages/ampliconarchitectlib/amplified_intervals.py --ref hg19 --bed /home/jianfan/sd1/sx/scWGS/AS/C8_merged//C8_merged_CNV_CALLS_pre_filtered.bed --bam C8_merged.bam --gain 4.5 --cnsize_min 50000 --out /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_AA_CNV_SEEDS
[root:INFO] Properly paired rate less than 90%, setting --insert_sdevs 9.0 for AA
[root:INFO] python /home/jianfan/Software/miniforge3/envs/ampsuite/lib/python3.10/site-packages/ampliconarchitectlib/AmpliconArchitect.py --ref hg19 --downsample 10 --bed /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_AA_CNV_SEEDS.bed --bam C8_merged.bam --runmode FULL --extendmode EXPLORE --out /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_AA_results//C8_merged --insert_sdevs 9.0
[root:INFO] Running AC
[root:INFO] /home/jianfan/Software/miniforge3/envs/ampsuite/bin/make_input.sh /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_AA_results/ /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_classification/C8_merged
[root:INFO] python3 /home/jianfan/Software/miniforge3/envs/ampsuite/bin/amplicon_classifier.py -i /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_classification/C8_merged.input --ref hg19 -o /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_classification/C8_merged --report_complexity
[root:INFO] python3 /home/jianfan/Software/miniforge3/envs/ampsuite/bin/make_results_table.py -i /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_classification/C8_merged.input --classification_file /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_classification/C8_merged_amplicon_classification_profiles.tsv --summary_map /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_classification/C8_merged_summary_map.txt --cnv_bed /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_cnvkit_output/C8_merged_CNV_CALLS.bed --run_metadata_file /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_run_metadata.json --sample_metadata_file /home/jianfan/sd1/sx/scWGS/AS/C8_merged/C8_merged_sample_metadata.json
[root:INFO] All stages appear to have completed successfully.
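
The thread never states exactly how the pre-filtered BED file is produced. As a rough illustration only, the sketch below assumes it is derived from the CNVkit `.cns` segments by converting each segment's log2 ratio to an estimated copy number (assuming ploidy 2), and then shows which segments would clear the `--gain 4.5` and `--cnsize_min 50000` values seen in the log. The function names, the example segments, and the exact comparison operators are hypothetical; the real `amplified_intervals.py` applies additional filtering not shown here.

```python
import csv
import io


def cns_to_cn_rows(cns_text, ploidy=2):
    """Turn CNVkit .cns text into (chrom, start, end, copy_number) tuples.

    Assumption: copy number is estimated as ploidy * 2**log2.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    rows = []
    for rec in reader:
        cn = ploidy * 2 ** float(rec["log2"])
        rows.append((rec["chromosome"], int(rec["start"]), int(rec["end"]), cn))
    return rows


def passes_seed_thresholds(row, gain=4.5, cnsize_min=50000):
    """Hypothetical filter mirroring the --gain / --cnsize_min values in the log."""
    _chrom, start, end, cn = row
    return cn > gain and (end - start) >= cnsize_min


# Made-up segments for illustration; not taken from the sample in this thread.
example = (
    "chromosome\tstart\tend\tgene\tlog2\tdepth\tprobes\tweight\n"
    "chr8\t127000000\t129000000\t-\t2.0\t80.0\t400\t350.0\n"
    "chr1\t1000000\t5000000\t-\t0.0\t20.0\t800\t700.0\n"
)
rows = cns_to_cn_rows(example)
kept = [r for r in rows if passes_seed_thresholds(r)]
for chrom, start, end, cn in kept:
    print(f"{chrom}\t{start}\t{end}\t{cn:.2f}")
```

Checking the copy numbers of your own `.cns` segments this way can show whether any region comes close to the gain threshold before AA is ever run.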

jluebeck commented 2 months ago

Hi,

Thanks for reaching out with this query. If the sample does not have ecDNA, then a result that does not contain any ecDNA is expected. I am happy to take a look at your output files if you would like more feedback.

I noticed from the file name and the extremely low properly paired rate in your sample (67%) - are you trying single-cell WGS? AmpliconArchitect has not been tested with single-cell WGS before, and I do not know if it will perform well.

Thanks, Jens
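
The properly paired rate mentioned above comes from a `samtools flagstat`-style line in the log. A minimal sketch of checking it yourself is shown below; the 95% and 90% cut-offs are taken from the log messages in this thread, while the function names and return labels are hypothetical.

```python
import re


def properly_paired_rate(flagstat_text):
    """Extract the properly paired percentage from samtools flagstat output."""
    m = re.search(r"properly paired \(([\d.]+)%", flagstat_text)
    return float(m.group(1)) if m else None


def assess(rate):
    """Label a rate using the thresholds that appear in the pipeline log."""
    if rate < 90:
        return "low"   # log: insert_sdevs is raised to 9.0 for AA
    if rate < 95:
        return "warn"  # log: warning that data quality may be insufficient
    return "ok"


line = "329005012 + 0 properly paired (67.18% : N/A)"
rate = properly_paired_rate(line)
print(rate, assess(rate))
```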

JFanbio commented 2 months ago

Hi Jens

Right! I am processing single-cell WGS data, but I merged the BAM files from more than 1k cells, with an average coverage of 0.1X, so I think it may be acceptable. Which output files would you like to check, and how should I share them with you? Another question: my colleague and I processed one bulk WGS dataset at the same time, but she gets more ecDNA than I do. She thinks the new version I used may have stricter criteria than the old version she used. By the way, she ran AmpliconArchitect and AmpliconClassifier step by step. Thank you!

Best, Jian

jluebeck commented 2 months ago

Thanks Jian, do you have a directory containing the AA output files (C8_merged_AA_results/)? That would be helpful to see. It's hard to know how well this will work, even on pseudobulk scWGS.

Regarding your second point: if the samples were processed with different versions of the tools, the results may differ. Please compare tool versions and outputs between yourselves, and let me know if you have additional questions.

If you want you can email output files to me at jluebeck [a] ucsd.edu. Thanks! Jens
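
One way to compare runs, assuming both produced a log like the one earlier in this thread, is to pull the version line and a few parameters out of each log and diff them. This is only a sketch: the helper names are hypothetical, the parameter list is limited to flags visible in the log above, and the second log below is made up for illustration.

```python
import re

# Flags that appear in the log in this thread.
PARAMS = ("gain", "cnsize_min", "downsample", "insert_sdevs")


def summarize_log(log_text):
    """Collect the pipeline version and selected flag values from a log."""
    summary = {}
    m = re.search(r"pipeline version (\S+)", log_text)
    summary["pipeline_version"] = m.group(1) if m else None
    for name in PARAMS:
        m = re.search(r"--%s (\S+)" % re.escape(name), log_text)
        summary[name] = m.group(1) if m else None
    return summary


def diff_summaries(a, b):
    """Return only the keys whose values differ between two summaries."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


mine = summarize_log(
    "[root:INFO] AmpiconSuite-pipeline version 1.2.1\n"
    "[root:INFO] ... --gain 4.5 --cnsize_min 50000 ...\n"
    "[root:INFO] ... --downsample 10 ... --insert_sdevs 9.0\n"
)
# Hypothetical second log; the version number here is invented.
other = summarize_log(
    "[root:INFO] AmpiconSuite-pipeline version 0.0.0\n"
    "[root:INFO] ... --gain 4.5 --cnsize_min 50000 ...\n"
    "[root:INFO] ... --downsample 10 ...\n"
)
print(diff_summaries(mine, other))
```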

JFanbio commented 2 months ago

Thank you so much! I have sent you an email.

jluebeck commented 2 months ago

Thank you for sending. I don't see any errors. One question: which aligner was used, and with what parameters? Can you provide the header of the BAM file?

Thank you, Jens

JFanbio commented 2 months ago

Hi Jens, here is the header. It is not the complete header, because every single cell went through the same pipeline.

@HD VN:1.0 SO:coordinate @SQ SN:chr1 LN:249250621 @SQ SN:chr2 LN:243199373 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 @SQ SN:chrM LN:16571 @SQ SN:chr1_gl000191_random LN:106433 @SQ SN:chr1_gl000192_random LN:547496 @SQ SN:chr4_gl000193_random LN:189789 @SQ SN:chr4_gl000194_random LN:191469 @SQ SN:chr7_gl000195_random LN:182896 @SQ SN:chr8_gl000196_random LN:38914 @SQ SN:chr8_gl000197_random LN:37175 @SQ SN:chr9_gl000198_random LN:90085 @SQ SN:chr9_gl000199_random LN:169874 @SQ SN:chr9_gl000200_random LN:187035 @SQ SN:chr9_gl000201_random LN:36148 @SQ SN:chr11_gl000202_random LN:40103 @SQ SN:chr17_gl000203_random LN:37498 @SQ SN:chr17_gl000204_random LN:81310 @SQ SN:chr17_gl000205_random LN:174588 @SQ SN:chr17_gl000206_random LN:41001 @SQ SN:chr18_gl000207_random LN:4262 @SQ SN:chr19_gl000208_random LN:92689 @SQ SN:chr19_gl000209_random LN:159169 @SQ SN:chr21_gl000210_random LN:27682 @SQ SN:chrUn_gl000211 LN:166566 @SQ SN:chrUn_gl000212 LN:186858 @SQ SN:chrUn_gl000213 LN:164239 @SQ SN:chrUn_gl000214 LN:137718 @SQ SN:chrUn_gl000215 LN:172545 @SQ SN:chrUn_gl000216 LN:172294 @SQ SN:chrUn_gl000217 LN:172149 @SQ SN:chrUn_gl000218 LN:161147 @SQ SN:chrUn_gl000219 LN:179198 @SQ SN:chrUn_gl000220 LN:161802 @SQ SN:chrUn_gl000221 LN:155397 @SQ SN:chrUn_gl000222 LN:186861 @SQ SN:chrUn_gl000223 LN:180455 @SQ SN:chrUn_gl000224 LN:179693 @SQ SN:chrUn_gl000225 LN:211173 @SQ SN:chrUn_gl000226 LN:15008 @SQ SN:chrUn_gl000227 LN:128374 @SQ SN:chrUn_gl000228 LN:129120 @SQ 
SN:chrUn_gl000229 LN:19913 @SQ SN:chrUn_gl000230 LN:43691 @SQ SN:chrUn_gl000231 LN:27386 @SQ SN:chrUn_gl000232 LN:40652 @SQ SN:chrUn_gl000233 LN:45941 @SQ SN:chrUn_gl000234 LN:40531 @SQ SN:chrUn_gl000235 LN:34474 @SQ SN:chrUn_gl000236 LN:41934 @SQ SN:chrUn_gl000237 LN:45867 @SQ SN:chrUn_gl000238 LN:39939 @SQ SN:chrUn_gl000239 LN:33824 @SQ SN:chrUn_gl000240 LN:41933 @SQ SN:chrUn_gl000241 LN:42152 @SQ SN:chrUn_gl000242 LN:43523 @SQ SN:chrUn_gl000243 LN:43341 @SQ SN:chrUn_gl000244 LN:39929 @SQ SN:chrUn_gl000245 LN:36651 @SQ SN:chrUn_gl000246 LN:38154 @SQ SN:chrUn_gl000247 LN:36422 @SQ SN:chrUn_gl000248 LN:39786 @SQ SN:chrUn_gl000249 LN:38502 @PG ID:bowtie2 PN:bowtie2 CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 -x /home/lolab/datasource/bowtie2.hg19/hg19 -S PT078-veh-output/sam/PT078-veh-1.sam -p 6 -1 /media/lolab/MSKCC_Autopsy_Raw/CNV_pipeline-master/PT078-veh/PT078-veh-1/PT078-veh-1_R1_L001.fastq.gz -2 /media/lolab/MSKCC_Autopsy_Raw/CNV_pipeline-master/PT078-veh/PT078-veh-1/PT078-veh-1_R2_L001.fastq.gz" VN:2.3.5.1 @PG ID:samtools PN:samtools CL:/usr/local/bin/samtools view -bS -q 1 -@ 6 PT078-veh-output/sam/PT078-veh-1.sam PP:bowtie2 VN:1.18 @PG ID:samtools.1 PN:samtools CL:/usr/local/bin/samtools sort -@ 6 -o PT078-veh-output/sort/PT078-veh-1/PT078-veh-1.sorted.bam PT078-veh-output/bam/PT078-veh-1.bam PP:samtools VN:1.18 @PG ID:sambamba CL:markdup PT078-veh-output/sort/PT078-veh-1.sort.bam PT078-veh-output/sort/PT078-veh-1.sort.markdup.bam -t 6 PP:samtools.1 VN:1.0 @PG ID:bowtie2-4AEAC2D0 PN:bowtie2 CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 -x /home/lolab/datasource/bowtie2.hg19/hg19 -S PT078-veh-output/sam/PT078-veh-1000.sam -p 6 -1 /media/lolab/MSKCC_Autopsy_Raw/CNV_pipeline-master/PT078-veh/PT078-veh-1000/PT078-veh-1000_R1_L001.fastq.gz -2 /media/lolab/MSKCC_Autopsy_Raw/CNV_pipeline-master/PT078-veh/PT078-veh-1000/PT078-veh-1000_R2_L001.fastq.gz" VN:2.3.5.1 @PG ID:samtools-25F9AFB1 PN:samtools CL:/usr/local/bin/samtools view -bS -q 1 -@ 6 
PT078-veh-output/sam/PT078-veh-1000.sam PP:bowtie2-4AEAC2D0 VN:1.18 @PG ID:samtools.1-184FE271 PN:samtools CL:/usr/local/bin/samtools sort -@ 6 -o PT078-veh-output/sort/PT078-veh-1000/PT078-veh-1000.sorted.bam PT078-veh-output/bam/PT078-veh-1000.bam PP:samtools-25F9AFB1 VN:1.18 @PG ID:sambamba-59511C35 CL:markdup PT078-veh-output/sort/PT078-veh-1000.sort.bam PT078-veh-output/sort/PT078-veh-1000.sort.markdup.bam -t 6 PP:samtools.1-184FE271 VN:1.0 @PG ID:bowtie2-6FA4C9DB PN:bowtie2 CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 -x /home/lolab/datasource/bowtie2.hg19/hg19 -S PT078-veh-output/sam/PT078-veh-1001.sam -p 6 -1 /media/lolab/MSKCC_Autopsy_Raw/CNV_pipeline-master/PT078-veh/PT078-veh-1001/PT078-veh-1001_R1_L001.fastq.gz -2 /media/lolab/MSKCC_Autopsy_Raw/CNV_pipeline-master/PT078-veh/PT078-veh-1001/PT078-veh-1001_R2_L001.fastq.gz" VN:2.3.5.1 @PG ID:samtools-348A966B PN:samtools CL:/usr/local/bin/samtools view -bS -q 1 -@ 6 PT078-veh-output/sam/PT078-veh-1001.sam PP:bowtie2-6FA4C9DB VN:1.18 @PG ID:samtools.1-173D7BB8 PN:samtools CL:/usr/local/bin/samtools sort -@ 6 -o PT078-veh-output/sort/PT078-veh-1001/PT078-veh-1001.sorted.bam PT078-veh-output/bam/PT078-veh-1001.bam PP:samtools-348A966B VN:1.18 @PG ID:sambamba-7943C61B CL:markdup PT078-veh-output/sort/PT078-veh-1001.sort.bam PT078-veh-output/sort/PT078-veh-1001.sort.markdup.bam -t 6 PP:samtools.1-173D7BB8 VN:1.0

Following up on my second question: how can I install an older or specific version of AA and AC if I want to reproduce my colleague's results?

Thank you!

Jian

jluebeck commented 2 months ago

Thanks, I believe the most likely reason is that either there is no ecDNA or it is below the detection threshold. Other issues related to the use of scWGS may still be the culprit, but it is hard to say.

For previous versions of AA and AC, please see their readme pages. You can download the tagged previous releases there. To install them, you would basically follow the standalone installation instructions from the AmpliconSuite-pipeline readme and then change the AA_SRC and AC_SRC variables in your .bashrc file to point to the directories you want.

Thanks, Jens
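
After editing .bashrc, it can help to confirm that the variables resolve to real directories in the shell you run the tools from. The check below is a minimal sketch; only the names AA_SRC and AC_SRC come from the reply above, and the helper itself is hypothetical.

```python
import os


def check_src_dirs(env=None, names=("AA_SRC", "AC_SRC")):
    """Report whether each variable is set and points to an existing directory."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        path = env.get(name)
        if not path:
            report[name] = "unset"
        elif os.path.isdir(path):
            report[name] = "ok: " + path
        else:
            report[name] = "missing directory: " + path
    return report


# Inspect the current environment.
for name, status in check_src_dirs().items():
    print(name, "->", status)
```

Run it in a fresh shell so that the updated .bashrc has been sourced.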

JFanbio commented 2 months ago

Thank you so much for your patient response! Jian