HKU-BAL / ClusterV

ClusterV: finding HIV quasispecies and drug resistance from ONT sequencing data
BSD 3-Clause "New" or "Revised" License

Not detecting mutations which can be seen in IGV #2

Open mbdabrowska1 opened 7 months ago

mbdabrowska1 commented 7 months ago

Hi, I have an issue with ClusterV sometimes not detecting mutations that can be seen in IGV and are present at a relatively high allele frequency. In the following examples you can see that the mutation is clearly visible in IGV, but in the ClusterV report it doesn't seem to be called:

BARCODE19

RT:V106I mutation expected (GTA -> ATA at nucleotide 2411, NC_001802.1 reference)

Screenshot from 2024-04-29 10-06-07

And the corresponding report: Screenshot from 2024-04-29 10-20-09

The coverage around the region isn't very high. Could this be causing the issue? I feel like it should still be seen in consensus unless I'm misunderstanding how Flye assembles the fragment. Screenshot from 2024-04-29 10-10-18

BARCODE25

RT:E138A mutation expected (GAG -> GCG at nucleotide 2508, NC_001802.1 reference)
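As a quick sanity check on the codon changes quoted in this report, here is a minimal Python sketch that maps each reference/alternate codon pair to its amino-acid substitution. The `CODON_TABLE` subset and the `describe_change` helper are illustrative names, not part of ClusterV; the table holds only the four codons mentioned here, taken from the standard genetic code.

```python
# Subset of the standard genetic code: only the codons quoted in this issue.
CODON_TABLE = {
    "GTA": "V",  # valine
    "ATA": "I",  # isoleucine
    "GAG": "E",  # glutamate
    "GCG": "A",  # alanine
}


def describe_change(ref_codon: str, alt_codon: str, aa_position: int) -> str:
    """Return a substitution label such as 'V106I' for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return f"{ref_aa}{aa_position}{alt_aa}"


print(describe_change("GTA", "ATA", 106))  # V106I
print(describe_change("GAG", "GCG", 138))  # E138A
```

Both nucleotide changes are consistent with the expected RT substitutions, so the question is only whether the calls survive the pipeline.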

Screenshot from 2024-04-29 10-15-26

Corresponding report: Screenshot from 2024-04-29 10-20-45

And coverage: Screenshot from 2024-04-29 10-16-12

BARCODE43 - separate run

RT:V106I mutation expected (GTA -> ATA at nucleotide 2411, NC_001802.1 reference):

Screenshot from 2024-04-29 10-18-54

Report: Screenshot from 2024-04-29 10-21-17

Coverage: Screenshot from 2024-04-29 10-19-12

Any help with this would be greatly appreciated! Please let me know if you require the original files as I'm happy to share those via email.

sujunhao commented 6 months ago

Hi, Variants with high depth that are missing from the output can have several causes.

The cause may be (1) a missed call by the variant caller (the Clair-Ensemble model in ClusterV was trained on Guppy5 data); or (2) the reads carrying the variant being filtered out. ClusterV filters reads with large indels from the original BAM, and this step may remove reads that carry the variants you mention. The filtered file is [YOUR INPUT FILE NAME]_f.bam.

For issue (2), adjusting the filtering setting --indel_l may solve the problem. For issue (1), we have tested ClusterV extensively to avoid this situation; however, with ONT data from a different chemistry or basecaller, the problem may still occur. In that case, we need time and effort to evaluate and further adjust our variant calling model.

If adjusting the filtering does not solve the problem, could you please share your files with me for further testing on my side? You can send them to my email, jhsu@connect.hku.hk, if needed.
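To illustrate how an indel-length filter can remove reads that carry a variant, here is a minimal Python sketch that inspects CIGAR strings and separates reads by their longest insertion or deletion. This is not ClusterV's actual filtering code: the helper names, the example reads, and the threshold of 50 are assumptions made for illustration, and the exact meaning of --indel_l should be checked against the ClusterV documentation.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_indel_length(cigar: str) -> int:
    """Length of the longest insertion (I) or deletion (D) in a CIGAR string."""
    lengths = [int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]
    return max(lengths, default=0)


def split_by_indel(reads, max_len):
    """Split (name, cigar) pairs into kept and dropped read names.

    A read is dropped when its longest indel exceeds max_len
    (a hypothetical threshold, standing in for a filter setting).
    """
    kept, dropped = [], []
    for name, cigar in reads:
        if max_indel_length(cigar) > max_len:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Made-up reads: names and CIGAR strings are illustrative only.
reads = [
    ("read_a", "1200M"),
    ("read_b", "500M3D700M"),
    ("read_c", "300M60D900M"),
    ("read_d", "100M75I1000M"),
]

kept, dropped = split_by_indel(reads, max_len=50)
print("kept:", kept)
print("dropped:", dropped)
```

If most reads supporting a variant look like `read_c` or `read_d`, the variant can be visible in the original BAM in IGV yet absent from the filtered BAM that is passed to variant calling. Loading the `_f.bam` file in IGV alongside the original is a direct way to check this.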

Regards, JH