J35P312 / TIDDIT

TIDDIT - structural variant calling

Question: BWA index required #15

Closed FriederikeHanssen closed 2 years ago

FriederikeHanssen commented 2 years ago

Hi,

I am working on adding tiddit sv to nf-core/sarek. I just realised that the bwa index appears to be required. However, we support different mappers. How would that affect the tool, i.e. with a bwamem2 index or a dragmap index?

J35P312 commented 2 years ago

Hello! Happy to hear that! A bwamem2 index should work fine! Not sure about dragmap; let me know if you run into trouble! I have been considering adding a flag for disabling the assembly module; that could be a solution.

FriederikeHanssen commented 2 years ago

OK, follow-up question: does the index need to be provided with a specific flag? I put it in the same folder as the fasta, but I keep getting the error:

error, The reference must be indexed using bwa index

J35P312 commented 2 years ago

It should not be needed! Essentially, tiddit will run BWA mem with the following command:

(line 119 tiddit_contig_analysis.pyx)

os.system("{} mem -x intractg {} {}_tiddit/clips.fa.assembly.clean.mag 1> {}_tiddit/clips.sam 2> /dev/null".format(args.bwa,args.ref,prefix,prefix))

but I have added a check where TIDDIT looks for the following index files to make sure that the index is there:

if not os.path.isfile(args.ref+".bwt") and not os.path.isfile(args.ref+".64.bwt"):

Can you send me the filenames of your index? We will probably solve this by adding another file to this check =P
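For readers following along, here is a minimal sketch of the check quoted above, written as a standalone function. The names `has_bwa_index`, `BWA_BWT_SUFFIXES` and the `exists` parameter are illustrative and do not come from the TIDDIT source; only the two suffixes (`.bwt` and `.64.bwt`) are taken from the quoted line.

```python
import os

# Suffixes taken from the check quoted in the thread: the index is
# accepted if either <ref>.bwt or <ref>.64.bwt exists.
BWA_BWT_SUFFIXES = (".bwt", ".64.bwt")


def has_bwa_index(ref, exists=os.path.isfile):
    """Return True if a BWA .bwt file is found next to the reference.

    `exists` is a hypothetical hook so the check can be tried against a
    plain list of file names instead of the real file system.
    """
    return any(exists(ref + suffix) for suffix in BWA_BWT_SUFFIXES)


# Try the check against index file names like those listed in this thread.
ref = "Homo_sapiens_assembly38.fasta"
bwa_files = {ref + s for s in (".64.alt", ".64.amb", ".64.ann",
                               ".64.bwt", ".64.pac", ".64.sa")}
bwamem2_files = {ref + s for s in (".0123", ".amb", ".ann",
                                   ".bwt.2bit.64", ".pac")}

print(has_bwa_index(ref, exists=bwa_files.__contains__))      # True
print(has_bwa_index(ref, exists=bwamem2_files.__contains__))  # False
```

With these file names, a bwa-mem2 index does not pass the check, because it ships `.bwt.2bit.64` rather than `.bwt` or `.64.bwt`.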

maxulysse commented 2 years ago

BWA:

╰─≻aws s3 --no-sign-request --region eu-west-1 ls s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/BWAIndex/    
2018-11-16 16:22:37     487553 Homo_sapiens_assembly38.fasta.64.alt
2018-11-16 16:26:30      20199 Homo_sapiens_assembly38.fasta.64.amb
2018-11-16 16:22:37     455474 Homo_sapiens_assembly38.fasta.64.ann
2018-11-16 16:26:31 3217347004 Homo_sapiens_assembly38.fasta.64.bwt
2018-11-16 16:26:54  804336731 Homo_sapiens_assembly38.fasta.64.pac
2018-11-16 16:27:26 1608673512 Homo_sapiens_assembly38.fasta.64.sa

BWAmem2:

╰─≻aws s3 --no-sign-request --region eu-west-1 ls s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/BWAmem2Index/
2022-05-23 09:41:11 6434693834 Homo_sapiens_assembly38.fasta.0123
2022-05-23 09:41:10      20199 Homo_sapiens_assembly38.fasta.amb
2022-05-23 09:41:10     455474 Homo_sapiens_assembly38.fasta.ann
2022-05-23 09:41:11 10456377594 Homo_sapiens_assembly38.fasta.bwt.2bit.64
2022-05-23 09:41:11  804336731 Homo_sapiens_assembly38.fasta.pac

DRAGMAP:

╰─≻aws s3 --no-sign-request --region eu-west-1 ls s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/dragmap/
2022-05-23 09:39:52     622009 hash_table.cfg
2022-05-23 09:39:52     147575 hash_table.cfg.bin
2022-05-23 09:39:53 4643394934 hash_table.cmp
2022-05-23 09:39:52      15166 hash_table_stats.txt
2022-05-23 09:39:54   49238272 ref_index.bin
2022-05-23 09:39:54 1575622656 reference.bin
2022-05-23 09:39:53  393905664 repeat_mask.bin
2022-05-23 09:39:54  111013696 str_table.bin

J35P312 commented 2 years ago

Thanks! I will have a look at the BWAmem2 index, and see if tiddit works with it; if it does, I will make a quick update for that.

I'm certain that TIDDIT won't work with the DRAGMAP indexes; sorry!

FriederikeHanssen commented 2 years ago

Is the assembly improving the SV calls in comparison to the previous version, or what is it used for? Sorry, I don't really know how the tool works under the hood.

J35P312 commented 2 years ago

No worries, I'm happy you are interested in the algorithm!

In particular, the assembly improves the calling of small SVs (50-300 bp, i.e. smaller than the insert size). For such small SVs, we won't get any signal from discordant read pairs. The old TIDDIT could only detect those SVs from split reads, which may not be present at repetitive breakpoints.

The contigs also help us get the most accurate estimate of the breakpoint position, and we can use the contigs themselves to design ddPCR probes, for instance.

J35P312 commented 2 years ago

Hello again! I gave it a try; sadly, bwa mem won't work with the bwa-mem2 index, i.e. the reference must be indexed using bwa to run the tiddit local assembly. I have now made a new release, 3.1.0, where you can turn off the assembly module ("--skip_assembly"); then there is no need to provide a bwa-indexed reference.

I recommend running local assembly if possible; but turning it off may be a nice option in some cases.
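As a rough illustration of how a pipeline could act on this, the sketch below builds a TIDDIT command line and appends `--skip_assembly` only when no BWA index is found next to the reference. The helper name `build_tiddit_command`, its arguments, and the `exists` hook are assumptions made for this example. The `--sv`, `--bam`, `--ref` and `-o` options are assumed to match the TIDDIT command line, and `--skip_assembly` requires release 3.1.0 or later, as noted above.

```python
import os
import shlex


def build_tiddit_command(bam, ref, prefix, exists=os.path.isfile):
    """Return the TIDDIT command as a list of arguments.

    Local assembly is kept when a BWA index (<ref>.bwt or <ref>.64.bwt)
    is present, and skipped otherwise. `exists` is a hypothetical hook
    that makes the function easy to try without real files.
    """
    cmd = ["tiddit", "--sv", "--bam", bam, "--ref", ref, "-o", prefix]
    has_index = exists(ref + ".bwt") or exists(ref + ".64.bwt")
    if not has_index:
        # Without a BWA index, bwa mem cannot align the assembled contigs.
        cmd.append("--skip_assembly")
    return cmd


if __name__ == "__main__":
    # Placeholder file names; nothing is executed, the command is only printed.
    print(shlex.join(build_tiddit_command("sample.bam", "genome.fasta", "sample")))
```

Keeping the assembly on by default follows the recommendation in this comment; the flag is only a fallback for references without a BWA index.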

Have fun! //jesper

FriederikeHanssen commented 2 years ago

Thank you for adding the additional flag and answering all our questions. Will close the issue since everything is worked out now 😁