EichlerLab / smrtsv2

Structural variant caller
MIT License

Smrtsv failed on polishing Pacbio CCS data in the assemble stage #57

Closed LYC-vio closed 2 years ago

LYC-vio commented 2 years ago

Hi,

According to the error message from smrtsv2, the Arrow algorithm only works for polishing non-CCS data (this is also mentioned here). Could you please give some suggestions on using smrtsv2 with CCS datasets?

Best regards

wharvey31 commented 2 years ago

Hello,

SMRT-SV was designed specifically with CLR reads in mind. As such, its tuning and parameterization might yield incorrect results on CCS data, even if you were to replace the assembly and polishing steps with something like hifiasm and racon, respectively.

If you have CCS data and want to call structural variation using assembly sequence, I would recommend first assembling with hifiasm (https://github.com/chhylp123/hifiasm) and then using PAV (https://github.com/EichlerLab/pav).

Good luck!

paudano commented 2 years ago

Yes, SMRT-SV is essentially end-of-life. The SMRT-SV local assembly method was very helpful at the time, but as William was saying, we now have better approaches that use whole-genome assemblies (phased or unphased). Even if I did take the time to update it for HiFi, I still don't think it would be useful as a validation-only callset (I would use other modern variant discovery tools for that).

We most often use:

The latest version of hifiasm has a mode to generate phased assemblies using only HiFi reads. It generates two GFA files, which you can convert to FASTA files and pass directly to PAV or SVIM. pbsv, Sniffles, and DeepVariant, by contrast, are based on read alignments to the reference (no assembly step).
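For the GFA-to-FASTA step mentioned above, here is a minimal Python sketch. It is not taken from hifiasm or PAV. It assumes the assembly GFA stores each contig on a segment line (`S`, then the name, then the sequence, tab-separated), as the GFA format defines. The function names and the file names in the usage note are hypothetical.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def iter_gfa_segments(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for every GFA segment line that carries a sequence."""
    for line in lines:
        # Segment records start with "S"; headers, links, and other records are skipped.
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and segments whose sequence is omitted ("*").
        if len(fields) < 3 or fields[2] == "*":
            continue
        yield fields[1], fields[2]


def gfa_to_fasta(gfa: TextIO, fasta: TextIO, width: int = 80) -> int:
    """Write each GFA segment as a FASTA record; return the number of records written."""
    count = 0
    for name, seq in iter_gfa_segments(gfa):
        fasta.write(f">{name}\n")
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(seq), width):
            fasta.write(seq[start:start + width] + "\n")
        count += 1
    return count
```

To use it, open one haplotype GFA for reading and a FASTA for writing (for example `hap1.gfa` and `hap1.fa`, both placeholder names), call `gfa_to_fasta(gfa_handle, fasta_handle)`, and repeat for the second haplotype. The resulting FASTA files are what you would then give to the assembly-based caller.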