Request for benchmarking dataset

aarthi-mohan commented 11 months ago

I am setting up the AmpliconSuite-pipeline on our HPC for a prospective project. Would you be able to share the "benchmarking dataset with simulations" mentioned in your manuscript? It will be helpful to test and validate our setup on a truth set.

Thank you, Aarthi

jluebeck commented 11 months ago

Thanks Aarthi, this is a good suggestion. The simulated data is very large, which is why we provide the scripts for generating it rather than the data itself. However, simulations are not necessarily the best benchmarking/testing data, since we have a number of outputs from real cancer cell lines with different focal amplification types hosted on SRA that users can test with instead.

I will put together a couple more formally documented examples of AA outputs for users to check the validity of their installations. For the time being, I can recommend using this GBM39 (aka FF-8) example from SRA as a testing dataset for a simple ecDNA in a cancer cell line.

When run through the workflow, the resulting AA sashimi plot should look like this (hg19 version), with an ecDNA classification from AC:

[Figure: FF-8_amplicon1]

Note that this is the figure from the original AA manuscript, so plots from newer versions of AA may differ very slightly in appearance. Also note that an hg38-aligned version of this analysis, if you choose to run one, will look slightly different and will use different coordinates.

Thanks, Jens

aarthi-mohan commented 11 months ago

Thanks so much for the detailed info, Jens. I was able to obtain a similar result with hg38 too. I am using AS with default settings on non-cancer data to explore the presence of ecDNA. Are there any settings you recommend changing from the defaults? In particular, should I modify the BED file given to AA or change the filter values?

Thanks, Aarthi

jluebeck commented 11 months ago

Hi Aarthi,

Glad to hear it is working for you. For non-cancer data, the main questions to answer before deciding whether AA is appropriate are whether the ecDNAs are amplified, and how large they are. If they are not amplified, or they are below 10 kbp in length, AA will not be able to detect them reliably regardless of any parameter changes. AA is only designed for the larger, amplified ecDNAs found in cancer. Others doing research on non-cancer ecDNAs have had more luck with unamplified and small ecDNAs using isolation protocols like Circle-Seq.
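As a rough illustration, the rule of thumb above can be written as a small check. This is only a sketch: the function name and the constant are hypothetical and not part of AmpliconSuite, and the 10 kbp cutoff is taken directly from the comment above rather than from any AA parameter.

```python
# Hypothetical helper illustrating the rule of thumb above.
# Nothing here is part of AmpliconSuite; all names are made up for illustration.

MIN_ECDNA_LENGTH_BP = 10_000  # "below 10 kbp" cutoff mentioned in the comment


def aa_likely_applicable(is_amplified: bool, ecdna_length_bp: int) -> bool:
    """Return True if the ecDNA is both amplified and at least 10 kbp long.

    Per the comment above, AA is not expected to detect unamplified or
    small (< 10 kbp) ecDNAs reliably, whatever parameters are used.
    """
    return is_amplified and ecdna_length_bp >= MIN_ECDNA_LENGTH_BP


if __name__ == "__main__":
    # Large, amplified ecDNA: within the range AA is designed for.
    print(aa_likely_applicable(True, 1_200_000))
    # Unamplified ecDNA: outside that range regardless of size.
    print(aa_likely_applicable(False, 1_200_000))
    # Small ecDNA: outside that range even if amplified.
    print(aa_likely_applicable(True, 3_000))
```

If either condition fails, changing AA settings is unlikely to help, and an isolation-based protocol such as Circle-Seq may be the better starting point.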

Thanks, Jens

jluebeck commented 7 months ago

Testing dataset added in 1.1.0 https://github.com/AmpliconSuite/AmpliconSuite-pipeline#testing-your-installation

Request for benchmarking dataset #43