Parsoa / SVDSS

Improved structural variant discovery in accurate long reads using sample-specific strings (SFS)
MIT License

specific string to the identifiers output for aggregate #3

Closed jo-mc closed 2 years ago

jo-mc commented 3 years ago

After running the example data and looking at the aggregated output, the specific strings and their associated read IDs sometimes do not match the data/IDs in child.fq. Is this correct?

Matching case: string in read_ids_aggregated.fasta: TGCCAGGAA ID: m54329U_190619_052546/165412986/ccs$

Here we find a match in child.fa:
1 @m54329U_190619_052546/165412986/ccs
2 CCATCTCAAAAAATCAATCAATCAATAAATCAATACATA............

Non-matching case: string in read_ids_aggregated.fasta: CATGGGAGC IDs: m54329U_190629_180018/58722384/ccs$m54329U_190617_231905/26280060/ccs$

Here the string does not match the expected read IDs in child.fa, but it does match reads with different IDs:
1 @m54329U_190619_052546/165412986/ccs
2 CCATCTCAAAAAATCAATCAATCAATAAATCAATACAT...................
29 @m54329U_190615_010947/134415974/ccs
30 GTAGGGAACACAGTCGGGCTAGAAAGTCCATTGACCACTCAGGGCCAT.....................

ldenti commented 3 years ago

Hi, in the aggregated results we report the canonical version of each string we found (canonical version: the lexicographical minimum of the string and its reverse complement). For this reason you cannot find that string in those reads, but you can find its reverse complement (GCTCCCATG).

Thanks for pointing this out, btw: we'll document this more clearly in the README.
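
For anyone checking the aggregated output by hand, the canonical form described above can be sketched roughly as follows. This is only an illustration in Python, not the SVDSS implementation; the helper names `reverse_complement`, `canonical`, and `occurs_in_read` are hypothetical, and the sketch assumes plain A/C/G/T sequences.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part of SVDSS.
# Assumes sequences contain only A, C, G, T (upper or lower case).

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Lexicographical minimum of a string and its reverse complement."""
    return min(seq, reverse_complement(seq))


def occurs_in_read(specific: str, read: str) -> bool:
    """Check a read for the string on either strand."""
    return specific in read or reverse_complement(specific) in read


# The non-matching case from this thread:
print(reverse_complement("CATGGGAGC"))  # GCTCCCATG
print(canonical("GCTCCCATG"))           # CATGGGAGC
```

So when a string from read_ids_aggregated.fasta is not found in a listed read, searching that read for the reverse complement as well (as `occurs_in_read` does) should locate it.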

ldenti commented 2 years ago

Closing.