cmbi / mrs

Maarten's Retrieval Service
Boost Software License 1.0

Blasting doesn't support non-standard amino acids #45

Closed by jonblack 7 years ago

jonblack commented 7 years ago

MRS returns the following error to HOPE when blasting swissprot with the sequence shown below:

Query contains invalid characters

The web interface provides more information when you try to blast the same sequence:

Query contains invalid characters: 'O'

MEFVALGGPDAGSPTPFPDEAGAFLGLGGGPRTEAGGLLASYPPSGRVSLVPWADTOTLGTPQWVPPATQMEPPHYLELLQPPRGSPPHPSSGPLLPLSSGPPPCEARECVNCGATATPLWRRDGTGHYLCNACGLYHRLNGQNRPLIRPKKRLLVSKRAGTVCSNCQTSTTTLWRRSPMGDPVCNACGLYYKLHQVNRPLTMRKDGIQTRNRKVSSKGKKRRPPGGONPSATAGGGAPMGGGGDPSMPPPPPPPAAAPPQSDALYALGPVVLSGHFLPFGNSGGFFGGGAGGYTAPPGLSPQI

O is the letter for Pyrrolysine, a non-standard amino acid. Should MRS support non-standard amino acids or is it correct to reject the query? Perhaps it should be an option? If not, the API error message should match the error message shown in the web interface.
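The two behaviours asked about here (an opt-in for non-standard residues, and one error message shared by the API and the web interface) could look roughly like the sketch below. It is written in Python for brevity and is not MRS code: the function names, the `allow_nonstandard` flag, and the choice of exactly the 20 standard letters plus 'O' and 'U' are all assumptions made for illustration.

```python
# Hypothetical sketch, not the MRS implementation.
# Assumption: the "standard" alphabet is the 20 canonical amino acids.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
# O = pyrrolysine, U = selenocysteine
NONSTANDARD = set("OU")


def invalid_characters(query, allow_nonstandard=False):
    """Return the distinct disallowed characters, in order of first appearance."""
    allowed = (STANDARD | NONSTANDARD) if allow_nonstandard else STANDARD
    seen = []
    for ch in query.upper():
        if ch not in allowed and ch not in seen:
            seen.append(ch)
    return seen


def validation_error(query, allow_nonstandard=False):
    """Build one message for every front end, or return None if the query is valid."""
    bad = invalid_characters(query, allow_nonstandard)
    if not bad:
        return None
    return "Query contains invalid characters: " + ", ".join(repr(c) for c in bad)


# Fragment of the sequence above that contains the first 'O'
print(validation_error("WADTOTLGTPQ"))
print(validation_error("WADTOTLGTPQ", allow_nonstandard=True))
```

If both the API and the web interface called the same message-building function, they could not drift apart the way the two messages above have.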

cbaakman commented 7 years ago

It seems that the NCBI BLAST executable has no problems with it, so MRS should also support 'O'. We should be able to give 'O' some score values. Need to discuss this with Gert.

MRS uses the matrices mentioned here: https://github.com/cmbi/mrs/blob/master/src/M6Matrix.cpp

These matrices are defined without 'O', so I wonder how NCBI BLAST handles it.
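One possible way to give 'O' score values without changing the matrices themselves is to map it to a stand-in residue before the lookup. The sketch below assumes pyrrolysine is scored as lysine ('K'), since it is a lysine derivative, and selenocysteine ('U') as cysteine ('C'). This is only an illustration in Python: it says nothing about what NCBI BLAST actually does, the names are hypothetical, and the tiny score table holds made-up values rather than an excerpt from M6Matrix.cpp.

```python
# Hypothetical sketch: score non-standard residues via a stand-in residue.
# Assumption: O (pyrrolysine) is treated as K, U (selenocysteine) as C.
SUBSTITUTES = {"O": "K", "U": "C"}

# Toy symmetric score table with illustrative values only,
# stored as one triangle keyed by residue pair.
TOY_SCORES = {
    ("A", "A"): 4,
    ("A", "K"): -1,
    ("K", "K"): 5,
}


def score(a, b, matrix=TOY_SCORES, substitutes=SUBSTITUTES):
    """Look up the score for a residue pair after mapping non-standard letters."""
    a = substitutes.get(a, a)
    b = substitutes.get(b, b)
    if (a, b) in matrix:
        return matrix[(a, b)]
    # Fall back to the mirrored pair; raises KeyError if neither is present.
    return matrix[(b, a)]


print(score("O", "K"))  # scored as K against K
print(score("A", "O"))  # scored as A against K
```

The alternative is to add an explicit 'O' row and column to each matrix, which allows distinct values but means choosing them for every matrix.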