LaurieLecomte / SVs_long_reads

SV calling pipeline from ONT data

Sequence was truncated #2

Open C-YONG opened 2 months ago

C-YONG commented 2 months ago

Sorry to bother you again. After calling structural variants from our ONT data, we modified the VCF content with format_add_ALTseq_LR.R and got an error. When we checked the VCF file, we found three insertion variants whose END positions are beyond the length of the chromosome. Is this a normal result? Can we just delete these three sites? In addition, all three of these variants come from NanoVar. Does this indicate that the accuracy of this software is problematic?

LaurieLecomte commented 2 months ago

Hi, I don’t remember encountering SV calls for which END positions were beyond chromosome length.

I assume that the excerpt you provided comes from the refined VCF obtained at the 03.2_nanovar_refine.sh step. Could you please check the END positions of the three problematic variants in the original, unrefined and unformatted VCF output by 03.1_nanovar_call.sh? (grep 'NV.INV.SV103607-6E7A9' $CALLS_DIR/nanovar/$SAMPLE/$SAMPLE.nanovar.total.vcf).

If the END positions reported in the original VCF for the three sites are also larger than chromosome length, then we will know that the issue likely does not come from formatting scripts and commands used throughout the pipeline.
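To run this comparison on every record instead of grepping one ID at a time, each call's END can be checked against the contig lengths of the reference. Below is a minimal Python sketch, assuming the reference genome was indexed with samtools faidx (so a .fai file with the contig name and length in its first two columns exists) and that END is stored in the INFO field, falling back to POS when it is absent. The script name check_sv_ends.py is hypothetical and is not part of the pipeline.

```python
import gzip
import re
import sys

# Matches END=<int> as a whole INFO key (so CIEND=... is not picked up).
END_RE = re.compile(r"(?:^|;)END=(\d+)")


def read_fai(fai_path):
    """Return {contig: length} from a samtools faidx index (.fai)."""
    lengths = {}
    with open(fai_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


def open_text(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def record_end(fields):
    """END from INFO (column 8) if present, otherwise POS (column 2)."""
    match = END_RE.search(fields[7])
    return int(match.group(1)) if match else int(fields[1])


def find_out_of_bounds(vcf_lines, lengths):
    """Yield (chrom, pos, id, end, contig_length) for records whose END
    exceeds the length of their contig. Contigs missing from `lengths`
    are skipped."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        end = record_end(fields)
        contig_len = lengths.get(chrom)
        if contig_len is not None and end > contig_len:
            yield chrom, int(fields[1]), fields[2], end, contig_len


def main(vcf_path, fai_path):
    lengths = read_fai(fai_path)
    with open_text(vcf_path) as fh:
        for hit in find_out_of_bounds(fh, lengths):
            print(*hit, sep="\t")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: check_sv_ends.py CALLS.vcf[.gz] REFERENCE.fa.fai", file=sys.stderr)
```

Running it on both the raw NanoVar VCF ($CALLS_DIR/nanovar/$SAMPLE/$SAMPLE.nanovar.total.vcf) and the refined VCF would show whether the out-of-range END values already exist before any refinement or formatting.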

If you plan on combining the outputs of multiple SV callers and using an external genotyper to accurately genotype SVs, then my suggestion would be to simply discard the problematic sites outputted by NanoVar.
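Discarding such sites before merging callers can be done with a small filter that copies a VCF while dropping every record whose END exceeds its contig length. This is a minimal Python sketch, assuming an uncompressed VCF whose header declares contig lengths through ##contig=<ID=...,length=...> lines; records on contigs without a declared length are kept, so if the NanoVar VCF lacks those header lines, the lengths would have to come from another source such as the reference .fai. The script name drop_out_of_bounds.py is hypothetical.

```python
import re
import sys

# Matches END=<int> as a whole INFO key (so CIEND=... is not picked up).
END_RE = re.compile(r"(?:^|;)END=(\d+)")


def contig_lengths(header_lines):
    """Collect {contig: length} from ##contig=<ID=...,length=...> lines."""
    lengths = {}
    for line in header_lines:
        if line.startswith("##contig=<"):
            inner = line.strip()[len("##contig=<"):].rstrip(">")
            attrs = dict(kv.split("=", 1) for kv in inner.split(",") if "=" in kv)
            if "ID" in attrs and "length" in attrs:
                lengths[attrs["ID"]] = int(attrs["length"])
    return lengths


def filter_out_of_bounds(lines):
    """Return (kept_lines, dropped_ids): header lines and in-bounds
    records are kept; records whose END exceeds a declared contig
    length are dropped and their IDs reported."""
    lines = list(lines)
    lengths = contig_lengths(line for line in lines if line.startswith("##"))
    kept, dropped = [], []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        match = END_RE.search(fields[7])
        end = int(match.group(1)) if match else int(fields[1])
        limit = lengths.get(fields[0])
        if limit is not None and end > limit:
            dropped.append(fields[2])
        else:
            kept.append(line)
    return kept, dropped


def main(in_path, out_path):
    with open(in_path) as fh:
        kept, dropped = filter_out_of_bounds(fh)
    with open(out_path, "w") as out:
        out.writelines(kept)
    for sv_id in dropped:
        print(f"dropped {sv_id}", file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: drop_out_of_bounds.py IN.vcf OUT.vcf", file=sys.stderr)
```

Keeping records on contigs with no declared length is a deliberate conservative choice: the filter only removes calls it can prove are out of range.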