KolmogorovLab / Severus

A tool for somatic structural variant calling using long reads

Discrepancy in DEL Calls Between Severus Nanopore and PacBio Datasets for COLO829 Cell Line #13

Open skeremaydin opened 3 months ago

skeremaydin commented 3 months ago

Dear Sir/Madam,

I am currently testing the Severus tool for analyzing structural variations (SVs) in two datasets of the COLO829 cell line: one generated by Nanopore sequencing and the other by PacBio sequencing. While most SV types show consistent results between the two datasets, there is a noticeable difference in the number of deletions (DEL) called by Severus.

| Tool (somatic calls) | INS | DEL | INV | DUP | BND | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Severus R9 | 116 | 1439 | 16 | 11 | 132 | 1714 |
| Severus PacBio | 20 | 43 | 10 | 8 | 28 | 109 |

Analysis Parameters:

The parameters used for SV calling in both runs are identical. The same cell line (COLO829) was used for both Nanopore and PacBio datasets.

Request for Insight:

I would appreciate any insights into why there is such a discrepancy specifically in the number of deletion calls between the two datasets. Since the parameters and the cell line are the same for both runs, I expected more consistency in the results.
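Per-type counts like those in the table above can be tallied directly from the two output VCFs, which makes it easy to see which SV classes diverge. The following is a minimal sketch, assuming the calls carry the standard `SVTYPE` key in the INFO column; the function name and the example records are hypothetical and are not taken from the actual COLO829 output.

```python
from collections import Counter

SV_TYPES = ("INS", "DEL", "INV", "DUP", "BND")


def count_sv_types(vcf_lines):
    """Count records per SVTYPE in an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##meta" and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO is the 8th column: key=value pairs or bare flags, ';'-separated.
        info = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in fields[7].split(";")
        )
        counts[info.get("SVTYPE", "UNKNOWN")] += 1
    return counts


# Hypothetical records, for illustration only.
nanopore_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-120",
    "chr1\t5000\tsv2\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-85",
    "chr2\t7000\tsv3\tN\t<INS>\t60\tPASS\tPRECISE;SVTYPE=INS;SVLEN=300",
]
pacbio_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-120",
]

ont_counts = count_sv_types(nanopore_vcf)
hifi_counts = count_sv_types(pacbio_vcf)
for sv_type in SV_TYPES:
    # A Counter returns 0 for types that never occurred.
    print(sv_type, ont_counts[sv_type], hifi_counts[sv_type])
```

For real files, an open text handle can be passed in place of the list, for example `open(path)` for plain VCF or `gzip.open(path, "rt")` for a compressed one.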

aysegokce commented 3 months ago

Hello @skeremaydin, We tested Severus for HiFi and ONT data from the same samples, including COLO829, and we got high consistency.

Can you share the Severus logs?

Thanks, Ayse

skeremaydin commented 3 months ago

Thank you for your prompt response. Here are the log files: severus_hifi.log and severus_r9nanopore.log.

Best Regards,

aysegokce commented 3 months ago

Thank you, Kerem. Judging from the error profile, this looks like old R9 data. For comparison, here are the values for the R9 data from 2021 that we used for benchmarking:

Read error rate (Q25 / Q50 / Q75): 0.0264 / 0.0387 / 0.0624
Read mismatch rate (Q25 / Q50 / Q75): 0.0075 / 0.0119 / 0.0208

I assume most of those deletions are just read errors. For ONT data, Severus is optimized mainly for the latest versions of R9 and R10. We will upload the COLO829 data to SRA in a couple of weeks, along with the other cell lines we have sequenced. Hope that helps.
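To compare another dataset against the quartiles quoted above, a similar Q25 / Q50 / Q75 summary can be computed from per-read alignment statistics. The sketch below is only an illustration: the thread does not say how Severus derives its error rate, so the definition used here (edit distance divided by aligned length) and the input tuples are assumptions.

```python
from statistics import quantiles


def error_rate_quartiles(reads):
    """Return (Q25, Q50, Q75) of per-read error rates.

    `reads` is an iterable of (edit_distance, aligned_length) pairs.
    Assumption: error rate = edit distance / aligned length; this is not
    necessarily the definition Severus uses in its log.
    """
    rates = [nm / length for nm, length in reads if length > 0]
    q25, q50, q75 = quantiles(rates, n=4, method="inclusive")
    return q25, q50, q75


# Hypothetical per-read values, for illustration only.
example_reads = [
    (100, 10000),
    (200, 10000),
    (300, 10000),
    (400, 10000),
    (500, 10000),
]

q25, q50, q75 = error_rate_quartiles(example_reads)
print(f"Read error rate (Q25 / Q50 / Q75): {q25:.4f} / {q50:.4f} / {q75:.4f}")
```

Quartiles well above the benchmark values would be consistent with the suggestion that many of the extra deletion calls come from read errors.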

Best, Ayse

skeremaydin commented 3 months ago

Thank you. I will check it again. My data were ERR2752452 and ERR2752451, obtained from https://www.ebi.ac.uk/ena/browser/view/PRJEB27698. The quality of the data was actually quite okay. It is nice to hear that you will upload your COLO829 data to SRA. I appreciate it. Best regards.

skeremaydin commented 3 months ago

If you've published your R9 SV results (particularly for the COLO829 dataset, since it has a truth set), I'd be honored to use them as a reference and cite them in my research. If available, could you share the publication or access information?

aysegokce commented 3 months ago

Hello Kerem, here are the SRA accession IDs for the COLO829 data: SRR28305188 (COLO829) and SRR28305176 (COLO829BL), along with other cell lines under PRJNA1086849. Best, Ayse

skeremaydin commented 3 months ago

Hi @aysegokce, thank you for providing the data accession IDs and the information about your work. Your research appears to be significant and valuable. Wishing you continued success. I appreciate it.

Best Regards Kerem