a-ludi / dentist

Close assembly gaps using long-reads at high accuracy.
https://a-ludi.github.io/dentist/
MIT License
47 stars 6 forks source link

Dentist parameters #42

Open aureliendejode opened 1 year ago

aureliendejode commented 1 year ago

Hi, I managed to run the pipeline but know I am having a closer look at the options. My goal is to be conservative with this: I ma happy to fill gaps when there is strong evidence for it but I don't want to mess up the assembly because it is pretty good overall I think.

From what I gathered form the github, --read-coverage combined with the ploidy is one very important option. Do you have any recommendation about how to obtain the coverage value ? Is there recommended tools for this ?

I am thinking I should not play much with the other parameters as I have CLR data.

Any advice on this ?

Best Aurélien

a-ludi commented 1 year ago

You are lucky! Your goal is exactly DENTIST's main goal: be conservative, close gaps only with certainty. That's why the default configuration is already aimed to be conservative. Still, to be absolutely certain about the quality of the insertions you can map the reads to the gap-closed assembly using e.g. minimap2 and visualize each closed gap with the help of the *.closed-gaps.bed file which shows the loci of the closed gaps.

Regarding the read coverage, there is already advice in the README under How to Choose DENTIST Parameters:

Ideally, the user provides the haploid read coverage which, can be inferred using a histogram of the alignment coverage across the assembly. Alternatively, the average raw read coverage can be used which is the number of base pairs in the reads divided by the number of base pairs in the assembly.

In my experience, the second method gives already satisfactory results. It is actually not quite clear whether a more accurate estimate of the read coverage provides better performance in gap closing.

I would also invite you to play with other parameters based on the guide in the README if you are not happy with the result. Especially, --max-insertion-error and --join-policy could be interesting to play with.

aureliendejode commented 1 year ago

Hi, Thanks for this. I have produce the alignments with minimap2 and visualizing them using IGV. I am not sure how to check if the gaps are properly closed. Here is what I thought based on your recommendation: Check the positions indicated in the bed files. For those should I look at the coverage and how well the reads align ? Is there anything else. I think around 700 gaps have been closed according to the bed file. SO that is a lot to check manually is there another way to be sure about those ?

aureliendejode commented 1 year ago

Hello,

In the validation-report.json, what are the meaning of the different terms ? (I pasted one line of the file at the bottom of the message. In the bed file I have 717 lines and in the validation report 485 lines and all of them have this "isValid":false. Does that mean that the validation-report.json only contains the invalid gap ? and the bed file all of them ? Or did something go wrong ?

{"spanningReadIds":[3220364,4064883,4417687,6205668,8191962,8278500,8616546,10513266,11032877,14762477,14762478,15227433,17797189,19431685,22248792], "isValid":false, "contigIds":[108,109], "numSpanningReads":15, "region":{"tag":104,"contigId":104,"begin":21824,"end":22156}, "weakCoverageMaskBps":1964, "regionWithContext":{"tag":104,"contigId":104,"begin":20824,"end":23156}, "consensusReadIds":[1238256,3220364,3222420,3726037,4064883,4417687,8191962,8278500,8616546,10513266,10841984,11032877,11051323,11119214,11538615,14677272,14762477,14762478,15227433,15825890,17797189,19070061,20192883,21507486,21507487,22248792]}