AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

Run AmpliconSuite-pipeline on WGBS data #51

Open QLZhouBio opened 5 months ago

QLZhouBio commented 5 months ago

Thanks for developing this tool for ecDNA analysis. I have some BAM files generated from WGBS data and am wondering whether AmpliconSuite can use them as input for ecDNA detection. If not, are there any other tools that could be used? Thanks.

jluebeck commented 5 months ago

Hi, thanks for this question. AmpliconSuite-pipeline is not designed for analysis with whole genome bisulfite sequencing, only paired-end whole genome sequencing. I am not currently aware of any existing tools that take WGBS as input and provide ecDNA predictions.

Jens

QLZhouBio commented 5 months ago

Thanks for the prompt reply. Is there any particular reason that AmpliconSuite is not able to process WGBS data? I think it would be quite interesting to utilize WGBS data for ecDNA detection.

jluebeck commented 5 months ago

I am not an expert in WGBS; however, the conversion of unmethylated cytosine to uracil, and subsequently to thymine, will likely reduce the quality of alignments, creating challenges in both SV detection and CN calling. Another concern is that the WGBS protocol could cause uneven coverage due to possible bias in fragment selection.

If the reads are PE, you are welcome to try and see what happens. If you had any way to convert the altered thymine bases in the reads back to the reference allele (pseudo-wgs) then you might be able to do something with the data.
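
To make the "pseudo-WGS" idea concrete, here is a minimal sketch of the base-restoration step. It is not part of AmpliconSuite or Bismark. The function name `restore_converted_bases` is hypothetical, and the sketch assumes a gapless alignment (read and reference segment of equal length, no indels or clipping) and that you already know which conversion applies to the read ("CT" or "GA"; Bismark BAMs are expected to record this in the `XG:Z` tag, but verify that against your own files). Note that this also erases any genuine C>T or G>A variants at those positions.

```python
def restore_converted_bases(read_seq: str, ref_seq: str, conversion: str) -> str:
    """Undo apparent bisulfite conversions in a gaplessly aligned read.

    conversion: "CT" for reads aligned to the C->T converted genome,
                "GA" for reads aligned to the G->A converted genome.
    Assumption: read_seq and ref_seq cover the same positions one-to-one.
    """
    if len(read_seq) != len(ref_seq):
        raise ValueError("read and reference segments must have equal length")
    if conversion == "CT":
        ref_base, read_base = "C", "T"
    elif conversion == "GA":
        ref_base, read_base = "G", "A"
    else:
        raise ValueError("conversion must be 'CT' or 'GA'")

    restored = []
    for r, g in zip(read_seq.upper(), ref_seq.upper()):
        # Only touch positions that look like a bisulfite conversion;
        # every other mismatch is left exactly as sequenced.
        restored.append(ref_base if (g == ref_base and r == read_base) else r)
    return "".join(restored)


print(restore_converted_bases("TTGATGT", "CTGACGC", "CT"))  # CTGACGC
```

A real implementation would also need to walk the CIGAR string and handle soft clips, which this sketch leaves out.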

Jens

jluebeck commented 5 months ago

You can get fairly good quality focal amplification calls from cheap low-pass WGS (1x coverage) if generating additional data is an option available to you.
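
As a rough sizing aid for the low-pass option, the sketch below estimates how many read pairs a target depth requires, using coverage = (number of reads × read length) / genome size. The genome size (about 3.1 Gb for a human reference) and the 150 bp read length are assumptions for illustration, not values from this thread, and the estimate ignores duplicates and unmappable reads.

```python
import math


def read_pairs_for_coverage(target_cov: float, genome_size_bp: int, read_len_bp: int) -> int:
    """Read pairs needed so that total sequenced bases / genome size = target_cov."""
    bases_needed = target_cov * genome_size_bp
    # Each pair contributes two reads of read_len_bp bases.
    return math.ceil(bases_needed / (2 * read_len_bp))


# Assumed values: ~3.1 Gb genome, 2 x 150 bp reads, 1x target depth.
print(read_pairs_for_coverage(1.0, 3_100_000_000, 150))  # 10333334
```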

QLZhouBio commented 5 months ago

Hi Jens, thanks for the clarification. I would like to try using the BAM files from the Bismark pipeline as input to AmpliconSuite and see how it goes. In the meantime, may I double-check with you whether the hg38 reference genome FASTA file is used for seed interval selection or any other downstream step after the bwa mapping step? If so, it might be an issue since, as you indicated, many C bases become T in the BAM files after conversion.

BTW, since we have generated a database with a few thousand WGBS samples, I think it would be very difficult to re-sequence them all with shallow WGS :)

jluebeck commented 5 months ago

Hi, the pipeline is primarily going to use mapping quality scores from the bam. These are based on how well the read aligned to the reference. This is used in both SV and CN detection. So in a sense the reference is used in all stages of the pipeline.
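
One way to gauge how much bisulfite conversion has hurt alignment confidence is to look at the MAPQ distribution directly. The sketch below reads SAM-format text (for example, piped from `samtools view -h`) and reports the fraction of mapped primary reads below a MAPQ cutoff. The function name and the cutoff of 5 are illustrative assumptions; the thread does not state which threshold the pipeline applies internally.

```python
from typing import Iterable


def low_mapq_fraction(sam_lines: Iterable[str], threshold: int = 5) -> float:
    """Fraction of mapped, primary alignments with MAPQ below `threshold`."""
    total = low = 0
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, mapq = int(fields[1]), int(fields[4])
        # Skip unmapped (0x4), secondary (0x100), and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if mapq < threshold:
            low += 1
    return low / total if total else 0.0


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tFFFF",
    "r2\t99\tchr1\t200\t0\t4M\t=\t400\t204\tACGT\tFFFF",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
    "r4\t163\tchr1\t500\t3\t4M\t=\t700\t204\tACGT\tFFFF",
]
print(round(low_mapq_fraction(example), 3))  # 0.667
```

Running this on a matched WGS and WGBS BAM from the same sample would show whether the WGBS alignments lose a noticeably larger share of reads to low MAPQ.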

I am guessing you may easily find some CN seeds but have a difficult time recovering SVs. I have no clue how even the coverage is for WGBS, and if it is not even, this will be a big problem.
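
Coverage evenness can be checked before committing to a full run. A simple proxy, sketched below, is the coefficient of variation (standard deviation divided by mean) of read depth over fixed-size genomic bins. The function name and the depth values are hypothetical. Because real amplifications also raise this value, it is most informative when compared between a WGBS BAM and a WGS BAM of the same sample.

```python
from statistics import mean, pstdev
from typing import Sequence


def coverage_cv(bin_depths: Sequence[float]) -> float:
    """Coefficient of variation of per-bin depth; higher means less even coverage."""
    if not bin_depths:
        raise ValueError("no bins supplied")
    avg = mean(bin_depths)
    if avg == 0:
        raise ValueError("mean depth is zero")
    return pstdev(bin_depths) / avg


# Made-up per-bin mean depths for a matched pair, for illustration only.
wgs_bins = [29, 31, 30, 30, 32, 28]
wgbs_bins = [12, 45, 30, 8, 51, 34]
print(coverage_cv(wgs_bins) < coverage_cv(wgbs_bins))  # True
```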

If you can find a tool that takes a WGBS bam and converts the TG basepairs back to CG where appropriate then you may have more luck. Such a tool may not exist.

Jens

QLZhouBio commented 4 months ago

Hi Jens, thanks for the kind clarification. I actually had WGS and WGBS data from a few of the same samples, and I ran AmpliconSuite on both. It seems to find similar amplicons in the WGS and WGBS data with the AA pipeline; however, the AC output is not the same. Is there an email address where I could share the AA and AC output with you so that you could take a closer look? Thanks.

jluebeck commented 4 months ago

Sure - please feel free to share it to jluebeck [at] ucsd.edu