bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

About the '--masking 0' option #787

Open graa4 opened 4 months ago

graa4 commented 4 months ago

Hi, I'm curious why, even with --masking 0 set, the sseq output still contains many results with runs of 'XXXXXXXX', like this one:

AERLDEVAAQRHCLTDRFHGGGQGGIGTGELLEREPXRLDHHVVQGGFETGRRFPCDVVDDLVEGVTDGQFGGDLGDRKAGRLGRQXXGTRHXRVHLDDDQPTVARVDRELDVAAAGVHTHLAQDRDAQVAHPLVFXVGQRHRXXXXXXXXXXHTHRVDVFDRAHHHHVVVAVAHQLEFEFLPAVNRFLDEHVGAGR-GRQPXXXXXXXXVGGVRYPRTQPAHGEARPXXXXXXXXXDRLTHFGXGETHSAPGGFATGLGXDVLEPLPVLAXLDGVXXXADEFHAVLFQHPALVQRDRGVQRGLPTQGRQQGVDLVAPLGLLGDNPLHERRGDGLYVGVVGELRVGHDGGRIRVHQADLQALGAQHPARLSPXVVELARLADDDRPGXXDQHVVXIGATGH

Thank you!

bbuchfink commented 4 months ago

I could not reproduce the issue when using --masking 0. Please double-check that your input sequences don't already contain X residues.
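
One way to run the check suggested above is a small script that counts X residues per record in the input FASTA. This is only a minimal sketch: the function name `count_residue` and the example records are hypothetical, and it assumes a plain FASTA layout (a `>` header line followed by one or more sequence lines).

```python
import io

def count_residue(fasta_text, residue="X"):
    """Return {sequence_id: count} for FASTA records that contain `residue`."""
    counts = {}
    seq_id = None
    for line in io.StringIO(fasta_text):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            counts[seq_id] = 0
        elif seq_id is not None:
            counts[seq_id] += line.upper().count(residue)
    # Keep only the records where the residue actually occurs.
    return {k: v for k, v in counts.items() if v > 0}

# Hypothetical two-record input, for illustration only.
example = ">seq1 hypothetical\nMKXXA\nGGX\n>seq2\nMKUAG\n"
print(count_residue(example))       # {'seq1': 3}
print(count_residue(example, "U"))  # {'seq2': 1}
```

If the first call returns an empty dictionary for the real input, the X characters in the output did not come from the input file as literal X residues.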

graa4 commented 4 months ago

Sorry, I only just saw your reply. The amino acid in the sequence I entered was originally U, but Diamond automatically replaced it with X. The original looks like this:

VERFLEGSADGHRLAHGLHRSGKEILRPGKLFKREPGHFHHAVIDGGLERSPGLPGDVVGDLVQGIPHGQLGGDLGNGKPRRLGCEGRXPGDPGVHLYDDHLPVGGVDGELDVGPPRLHADFPQNRNRGVPQQLIFPVGQGLGRSHRDRIPRMDAHGVHVLDGADDDHIVHAVAHDLELEFLPAEHRLL-EHDGVNETGIQPALGQFLQFFPVVGHAAPRAAQRERRPHDDRETDLPGNGFHFRHGTRNAAGRNAQPDPLHGIAEQFPVFGFLDDFNTRSDESHAETFEHTRFGHAHRHVQGRLPAQGGQQRVGTL-PL----DHLRHRFGRDRLDIGAVGRFRIGHYCCGVAVDQDNLVPFLAQCLAGLGPGVVELARLADDDGAGSDDQYLSYVGSLGH
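
To see where letters such as U sit in an input sequence, so those positions can be compared against the X positions in the reported output, something like the following could be used. It is a rough sketch under stated assumptions: `nonstandard_positions` and `STANDARD` are hypothetical names, and treating `-` and `*` as ignorable gap/stop symbols is a choice made here for illustration, not something taken from DIAMOND.

```python
# The 20 standard amino acid one-letter codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

def nonstandard_positions(seq, ignore="-*"):
    """Return (1-based position, letter) for residues outside the 20 standard codes."""
    return [
        (i, c)
        for i, c in enumerate(seq.upper(), start=1)
        if c not in STANDARD and c not in ignore
    ]

# Short made-up sequence, for illustration only.
print(nonstandard_positions("MKUAG-XB"))  # [(3, 'U'), (7, 'X'), (8, 'B')]
```

Running this on the original input and on the sseq column should show whether each X in the output lines up with a U (or another non-standard letter) in the input.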