Getting a std::out_of_range error

GfellerLab / MixMHC2pred

HLA-II ligand predictor.

Getting a std::out_of_range error #3

Closed: BioinformaNicks closed this issue 4 years ago.

BioinformaNicks commented 4 years ago

Hi,

I was testing your program on a complete proteome against a single, randomly picked allele, and I'm getting a std::out_of_range error. Any idea why, or how to fix this? It works fine with just a single protein.

Thanks in advance!

jracle85 commented 4 years ago

Hi, Thank you for your message.

It seems from your description that you are using a FASTA file containing all the proteins of your proteome. If so, that is the problem: MixMHC2pred expects a list of peptides as input (given either as a .txt file or as a .fa file), not proteins. Even in .fa format, the sequences must be peptides, not full-length proteins. I think this is the error you are encountering, so you should split your proteins into shorter fragments (e.g. 15-mers, ideally overlapping ones). A later version of MixMHC2pred might include an option to do this cutting automatically, but for now you need to do it yourself.
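A minimal Python sketch of the cutting step described above, assuming a plain FASTA proteome and a sliding window of 15 residues moved one position at a time. The function names and the `k`/`step` parameters are hypothetical illustrations, not MixMHC2pred options.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def overlapping_kmers(sequence, k=15, step=1):
    """Yield every window of length k, starting every `step` residues.

    Sequences shorter than k yield nothing.
    """
    for start in range(0, len(sequence) - k + 1, step):
        yield sequence[start:start + k]


def proteome_to_peptides(fasta_text, k=15, step=1):
    """Cut all proteins into overlapping k-mers, keeping each distinct peptide once."""
    seen = set()
    peptides = []
    for _header, seq in parse_fasta(fasta_text):
        for pep in overlapping_kmers(seq, k, step):
            if pep not in seen:  # peptides shared between proteins are scored once
                seen.add(pep)
                peptides.append(pep)
    return peptides


if __name__ == "__main__":
    demo = ">protA\nMKTAYIAKQRQISFVK\nSHFSRQ\n>protB\nACDEFGHIKLMNPQRSTVWY\n"
    for pep in proteome_to_peptides(demo):
        print(pep)
```

The resulting list can be written one peptide per line to a .txt file and used as input. Increasing `step` (e.g. to 5) shrinks the output, at the cost of sampling fewer of the possible windows.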

Another possibility, if your input already consisted of peptides, is that there are simply too many of them for a full proteome. In that case, try splitting your file into smaller ones.
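One way to do that split is sketched below, assuming the peptides are already in a Python list and each part is written one peptide per line. The chunk size of 50,000 and the file prefix are arbitrary placeholders: the thread does not give a size limit, so tune them to what your machine handles.

```python
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_peptide_chunks(peptides, out_dir, size=50_000, prefix="peptides_part"):
    """Write each chunk to its own .txt file (one peptide per line) and return the paths.

    `size` and `prefix` are hypothetical placeholders, not MixMHC2pred settings.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, part in enumerate(chunk(peptides, size), start=1):
        path = out / f"{prefix}_{n:03d}.txt"
        path.write_text("\n".join(part) + "\n")
        paths.append(path)
    return paths
```

Each file can then be run separately and the per-file results concatenated afterwards.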

Best wishes, Julien

BioinformaNicks commented 4 years ago

Hi,

Firstly, thank you for your quick response! I did read, for the MHC-I tool, that proteins should be cut into 8-14-mer peptides, and I was already wondering whether the same applied to this MHC-II tool. From what you write, it seems it does. Is there a minimum and maximum peptide length for the MHC-II tool? Then I can write a script that cuts my proteome into smaller peptides.

Thanks again for the answer!

Kind regards,

Nick

jracle85 commented 4 years ago

Hello,

The current minimum length is 12 aa; shorter peptides will not crash the program but will return NA values. There is no real upper bound (apart from the out-of-range exception you got), but peptides longer than roughly 20-25 aa will get very poor scores and will essentially never be predicted as binders.
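These length rules can be applied as a pre-filter before running the predictor. A small sketch, assuming the thresholds from the comment above: peptides under 12 aa would only give NA, and 25 aa is used here as a hypothetical cut-off taken from the upper end of the "20-25" estimate, not a hard limit of the tool.

```python
MIN_LEN = 12       # from the thread: shorter peptides return NA
SOFT_MAX_LEN = 25  # hypothetical cut-off: upper end of the "20-25 aa" estimate


def classify_length(peptide):
    """Label a peptide by length: 'too_short', 'ok', or 'likely_nonbinder'."""
    n = len(peptide)
    if n < MIN_LEN:
        return "too_short"
    if n > SOFT_MAX_LEN:
        return "likely_nonbinder"
    return "ok"


def split_by_length(peptides):
    """Group peptides by their length label."""
    groups = {"too_short": [], "ok": [], "likely_nonbinder": []}
    for pep in peptides:
        groups[classify_length(pep)].append(pep)
    return groups
```

Only the "ok" group would then be worth submitting; the other two groups are kept separately rather than silently dropped.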

I'd advise either making all your peptides 15-mers or using lengths between 13 and 18, which are the most frequently observed in MS data. Note that in most cases, if a 15-mer has a good score, you will also find some overlapping 13- to 18-mers with good scores, which is why using only 15-mers may be sufficient (there are of course some exceptions, and whether T cells then recognize these peptides is another question).
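A sketch of the 13- to 18-mer alternative, plus a count of how many windows each choice produces (function names are hypothetical). For a protein of length L there are L - k + 1 windows of length k, so the cost of scanning several lengths adds up quickly.

```python
def variable_length_peptides(sequence, min_len=13, max_len=18):
    """Return every contiguous stretch whose length lies in [min_len, max_len]."""
    peptides = []
    for k in range(min_len, max_len + 1):
        for start in range(len(sequence) - k + 1):
            peptides.append(sequence[start:start + k])
    return peptides


def count_windows(seq_len, min_len=13, max_len=18):
    """Total windows over all lengths k in [min_len, max_len]: sum of (seq_len - k + 1)."""
    return sum(max(seq_len - k + 1, 0) for k in range(min_len, max_len + 1))


if __name__ == "__main__":
    print(count_windows(300, 15, 15))  # 15-mers only
    print(count_windows(300))          # all 13- to 18-mers
```

For a 300-aa protein, 15-mers alone give 286 peptides, whereas all 13- to 18-mers give 288 + 287 + 286 + 285 + 284 + 283 = 1713, roughly six times as many. That is one practical reason to start with 15-mers only.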

Best wishes,

Julien

BioinformaNicks commented 4 years ago

Dear Julien,

Thanks for your time! We'll probably stick with 9-10-mers for MHC-I and 15-mers for MHC-II, as those are indeed the lengths most often observed in MS data.

Again, thanks a lot!

Kind regards,

Nick