AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

In most runs AA_CNV_SEEDS.bed files are empty #30

Open Yumo-Xie opened 1 year ago

Yumo-Xie commented 1 year ago

Hi. I have run the program on WGS data from several cell lines. I started from .fastq files and used PrepareAA.py to generate CNV calls. However, all of these runs produced empty AA_CNV_SEEDS.bed files. I then tried the recommended GBM39 test data [https://www.ncbi.nlm.nih.gov/sra/SRX5055022[accn]]. This time the program found 2 amplicons, one with EGFR and the other with MYC and PVT1 (GBM39_amplicon1.pdf, GBM39_amplicon2.pdf). Is this result correct? Does that mean my installation works fine, and the empty AA_CNV_SEEDS.bed files are due to the data I used?

jluebeck commented 1 year ago

Hi,

Your GBM39 test results appear correct. An empty seeds bed file implies that no candidate regions of focal amplification were detected in those samples. There is also a finish_flag file, which you can check to see whether AmpliconSuite-pipeline completed successfully.

Thanks, Jens
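
The two checks described above (finish flag, then seed file) can be sketched in a few lines of Python. This is only an illustration, not part of AmpliconSuite-pipeline: the function name `check_run` is hypothetical, the "All stages completed" text is taken from the finish_flag output quoted later in this thread and may differ between versions, and the caller is expected to supply the actual paths.

```python
from pathlib import Path


def check_run(finish_flag: Path, seeds_bed: Path) -> str:
    """Summarize one run from its finish flag and its seed file.

    Assumption: a successful run writes "All stages completed" to the
    finish flag file, as in the output quoted in this thread.
    """
    if not finish_flag.is_file():
        return "incomplete: no finish flag"
    flag_text = finish_flag.read_text().strip()
    if "All stages completed" not in flag_text:
        return f"incomplete: {flag_text!r}"
    if not seeds_bed.is_file():
        return "completed, but seed file is missing"
    # Count non-blank, non-comment lines; each one is a candidate seed interval.
    n_seeds = sum(
        1
        for line in seeds_bed.read_text().splitlines()
        if line.strip() and not line.startswith("#")
    )
    if n_seeds == 0:
        return "completed: no seed regions"
    return f"completed: {n_seeds} seed region(s)"
```

A result of "completed: no seed regions" corresponds to the situation in this issue: the pipeline ran to the end and simply found no candidate focal amplifications.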

Yumo-Xie commented 1 year ago

Thank you very much! The program also worked well for COLO320DM. It seems that the empty files are attributable to my data.

jingydz commented 1 year ago

Hi, my output file is also empty:

-rw-r--r-- 1 xxx 0 Feb 9 16:50 6605D_AA_CNV_SEEDS.bed

My finish_flag file indicates that the run completed successfully.

$ cat 6605D_finish_flag.txt
All stages completed
$ cat ./6605D_AA_results/6605D_summary.txt
#Amplicons = 0
-----------------------------------------------------------------------------------------
$ cat ./6605D_classification/6605D_amplicon_classification_profiles.tsv
sample_name     amplicon_number amplicon_decomposition_class    ecDNA+  BFB+    ecDNA_amplicons

I tried several other WGS files with the same results: no cycle files, no png or pdf files, etc. My command is:

/Parastor300s_G30S/zhangjj/software/miniconda3/bin/python3 /parastor300/work01/zhangjj/software/AmpliconSuite-pipeline/PrepareAA.py -s 6605D -t 50 --cnvkit_dir /parastor300/work01/zhangjj/software/cnvkit/cnvkit.py --bam 6605D.bam --ref GRCh38 --downsample 10.0 -o 6605D_AA_OUT --run_AA --run_AC

My WGS data is 30X. Is this problem due to downsampling to 10X, or to something else? P.S. My data are from healthy people, not cancer patients.

jluebeck commented 1 year ago

Hi,

Your outputs appear to be correct. Keep in mind that focal amplifications almost exclusively occur in cancer and pre-cancer samples. If you are providing samples from healthy patients to AmpliconSuite, and it does not find any focal amplifications, then this is completely expected.

If you would like to try a cancer cell line, I suggest you try COLO320DM.

Thanks, Jens

jingydz commented 1 year ago

Thanks. I have run the WGS data of 39 healthy people, and so far 4 AA_CNV_SEEDS.bed files have content. I also tried the COLO320DM cancer cell line, and it did find a lot of focal amplification, so the empty results are indeed due to my data. Thank you. The command I used:

time /Parastor300s_G30S/zhangjj/software/miniconda3/bin/python3 /parastor300/work01/zhangjj/software/AmpliconSuite-pipeline/PrepareAA.py -s COLO320DM -t 10 --cnvkit_dir /parastor300/work01/zhangjj/software/cnvkit/cnvkit.py --fastqs COLO320DM_r1.fastq.gz COLO320DM_r2.fastq.gz --ref hg38 -o COLO320DM_AA_OUT --run_AA --run_AC
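
For a cohort like this one, it may be easier to list the non-empty seed files in one pass than to inspect each output directory by hand. The sketch below assumes only that seed files end in `_AA_CNV_SEEDS.bed`, as in this thread, and that all outputs sit somewhere under one root directory. The function name `nonempty_seed_files` and the directory layout are hypothetical.

```python
from pathlib import Path

SEED_SUFFIX = "_AA_CNV_SEEDS.bed"


def nonempty_seed_files(root: Path) -> dict:
    """Map sample name -> number of seed intervals, for non-empty seed files.

    Searches `root` recursively; the sample name is taken to be the file
    name with the seed suffix stripped (an assumption based on the file
    names shown in this thread).
    """
    counts = {}
    for bed in sorted(root.rglob(f"*{SEED_SUFFIX}")):
        n = sum(1 for line in bed.read_text().splitlines() if line.strip())
        if n:
            counts[bed.name[: -len(SEED_SUFFIX)]] = n
    return counts
```

Calling `nonempty_seed_files(Path("."))` from the directory that holds all the output folders returns only the samples worth a closer look; samples with empty seed files are omitted.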

iamyingzhou commented 1 year ago

Dear Jens, is it possible to detect extrachromosomal circular DNA (eccDNA) in plasma samples from patients with specific chronic diseases? Thanks!

jluebeck commented 1 year ago

Hi Yingzhou,

AA is designed to detect large (>10 kbp), focally amplified ecDNA. If the eccDNA in question are smaller, or if they are not amplified, then AA will very likely not detect them.

Thanks, Jens
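
As a rough way to apply the size criterion above to a list of candidate circles, the following sketch keeps only BED intervals longer than 10 kbp. It is an illustration under stated assumptions, not how AA itself selects regions: the function name `intervals_longer_than` is hypothetical, the input is assumed to be standard BED (0-based start, exclusive end), and size alone says nothing about whether a region is amplified.

```python
def intervals_longer_than(bed_lines, min_len=10_000):
    """Return (chrom, start, end) tuples for BED intervals longer than min_len.

    The 10 kbp default mirrors the figure quoted in the reply above; using
    it as a hard cutoff is an assumption made for illustration.
    """
    kept = []
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and BED header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end - start > min_len:  # BED length is end - start
            kept.append((chrom, start, end))
    return kept
```

Candidates that fall below the cutoff are the ones the reply above suggests AA will very likely miss.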