Rappsilber-Laboratory / XiSearch

XiSearch
Apache License 2.0
How to encode non-canonical amino acids into search? #90

Open ciancone94 opened 1 year ago

ciancone94 commented 1 year ago

Hello,

A collaborator asked me how to encode a non-canonical amino acid in a search. This amino acid would also be the one that crosslinks. Let's call it 'X', with a mass difference of 50 Da relative to the standard amino acid it replaces (e.g., D). Is it possible to give XiSearch a fasta that carries this mass difference at the substituted position? This would apply to just one protein, not the whole proteome. For example:

Sequence: ASDFK, Modified sequence: ASXFK

Can I upload a fasta with: ASD(+50)FK? Is there a format I need to follow? Francis seemed to recall being able to hard-code acetylation sites in XiSearch, but he can't remember how he did it.

Otherwise, would I just search with a +50 Da modification on all 'D' residues?

Thanks,

Anthony

grandrea commented 1 year ago

Hey,

Follow the rules for site-specific modifications here: https://github.com/Rappsilber-Laboratory/xisearch#modification-settings

In short, for a variable modification, put something like the line below in your fasta. To make it a fixed modification instead, remove the parentheses. Modification names are arbitrary but have to be lowercase.

ASD(mod)FK

Then, in the .config file, define the modification with its delta mass relative to the unmodified amino acid:

modification:known::SYMBOLEXT:mod;MODIFIED:D;DELTAMASS:50
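To make the fasta notation described above concrete, here is a minimal Python sketch that splits an annotated sequence into residues and modification names. It only assumes what this thread states (uppercase letter = amino acid, lowercase name = modification, parentheses = variable). The function name `parse_annotated_sequence` is hypothetical, and this is not XiSearch's own parser.

```python
import re

# One position = an uppercase residue, optionally followed by a lowercase
# modification name, either in parentheses (variable) or bare (not variable).
TOKEN = re.compile(r"([A-Z])(?:\(([a-z]+)\)|([a-z]+))?")


def parse_annotated_sequence(seq):
    """Return a list of (residue, mod_name_or_None, is_variable) tuples."""
    positions = []
    pos = 0
    while pos < len(seq):
        match = TOKEN.match(seq, pos)
        if match is None:
            raise ValueError(f"unexpected character {seq[pos]!r} at index {pos}")
        residue, variable_mod, bare_mod = match.groups()
        positions.append((residue, variable_mod or bare_mod, variable_mod is not None))
        pos = match.end()
    return positions


if __name__ == "__main__":
    for example in ("ASD(mod)FK", "ASDmodFK"):
        print(example, parse_annotated_sequence(example))
```

Both example sequences describe the same five residues; they differ only in whether the modification on D is treated as variable.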

ciancone94 commented 1 year ago

Not sure how I missed that, thanks for your help!

lutzfischer commented 12 months ago

If you want it to be site-specific, you can also encode it in the fasta file.

cxdummies commented 5 months ago

Can one then use this modified amino acid in other settings lines? For example, would the following lines work?

crosslinker:AsymetricSingleAminoAcidRestrictedCrossLinker:Name:Linker;MASS:123.45678;FIRSTLINKEDAMINOACIDS:*;SECONDLINKEDAMINOACIDS:E,D,Dmod

digestion:PostAAConstrainedDigestion:DIGESTED:D,Dmod;ConstrainingAminoAcids:;NAME=Enzyme

loss:AminoAcidRestrictedLoss:NAME:Loss;aminoacids:Dmod;MASS:123;cterm

Additionally, can one create a fixed modification on this modified amino acid? Would this line work?

modification:fixed::SYMBOL:Dmodabc;MODIFIED:Dmod;MASS:100

grandrea commented 5 months ago

Sorry, I don't understand. Is the non-canonical amino acid also a crosslinker, or just a different amino acid?

grandrea commented 5 months ago

As far as I understand, a crosslinker that links to D will by default also link to Dmod.

cxdummies commented 5 months ago

The question is, in general, if I define a modified amino acid in the fasta sequence, do I have to add this specific modified amino acid to the settings of an enzyme, crosslinker and fixed/variable modifications?

Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?

For instance:

ProtA AnCnDn
ProtB ACD
ProtC EFG

grandrea commented 5 months ago

What I am trying to say is that there is no general answer to this question; it depends on what you want to do.

Labelling is typically not defined as a modification but with the label keyword: https://github.com/Rappsilber-Laboratory/XiSearch?tab=readme-ov-file#isotope-labelling

The label keyword will search every amino acid as a heavy or light version of itself (or with whatever custom mass you give in the list). So my suggestion would be:

LABEL:HEAVY::SYMBOL:Dn15;MODIFIED:D;MASS:116.023978035
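The MASS value in that line can be sanity-checked by hand: it is the monoisotopic residue mass of aspartate plus one 14N to 15N substitution. A minimal Python sketch of that arithmetic, using standard monoisotopic masses rounded to six decimals (the helper name `heavy_n15_mass` is hypothetical):

```python
# Monoisotopic masses in Da (standard values, rounded to six decimals).
ASP_RESIDUE = 115.026943    # aspartate residue, C4H5NO3
N15_MINUS_N14 = 0.997035    # 15.000109 - 14.003074


def heavy_n15_mass(residue_mass, nitrogen_count):
    """Residue mass after replacing every 14N with 15N."""
    return residue_mass + nitrogen_count * N15_MINUS_N14


dn15 = heavy_n15_mass(ASP_RESIDUE, 1)   # aspartate contains one nitrogen
print(f"Dn15 = {dn15:.6f} Da")
```

The result agrees with the 116.023978035 used above to within rounding of the input masses. Residues with more than one nitrogen (for example K or R) shift by a correspondingly larger amount.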

If instead you really want to define only a single protein as 100% labelled, I think you are going about it the right way. The crosslinker will react with the modified amino acid, but I don't know what happens if you use a protease that cuts at that amino acid. @lutzfischer may be able to clarify this, also for losses.

For modifications defined in the fasta, you should use known modifications, not fixed ones (again, see near the end of https://github.com/Rappsilber-Laboratory/XiSearch?tab=readme-ov-file#modification-settings):

modification:known::SYMBOLEXT:ph;MODIFIED:S;DELTAMASS:79.966331

for a fasta like

ACKASphAK

No brackets in the sequence for a fixed modification.

As an aside, I suggest using the DELTAMASS and SYMBOLEXT nomenclature, so that you can use Unimod modification masses rather than total masses: https://github.com/Rappsilber-Laboratory/XiSearch?tab=readme-ov-file#modification-settings

grandrea commented 5 months ago

I see now: you mean a fixed modification on a site-specific modified amino acid. Again, I don't know, sorry. I will test it because I am also curious. With label it works.

cxdummies commented 5 months ago

Is it permitted to define multiple modified amino acids on one line, or is it necessary to list each modified amino acid on a separate line?

modification:known::SYMBOLEXT:ph;MODIFIED:S,T;DELTAMASS:79.966331

or

modification:known::SYMBOLEXT:ph;MODIFIED:S;DELTAMASS:79.966331
modification:known::SYMBOLEXT:ph;MODIFIED:T;DELTAMASS:79.966331

grandrea commented 5 months ago

Both should work. However, you should not put "X" (any amino acid) or "nterm" (protein N-terminus) into such a list; those go on separate lines.

lutzfischer commented 5 months ago

One note up front: you can use any modification in other lines, but you have to define the modification first. Xi parses the config file strictly linearly, i.e. anything self-defined that you use somewhere has to be defined above that point. So modifications that you want to use in digestion or crosslinking rules need to be defined above those rules.
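The define-before-use rule can be checked mechanically before starting a search. The following Python sketch is a simplified, hypothetical checker, not part of XiSearch: it assumes the `KEY:value;KEY:value` layout of the config lines quoted in this thread, and the sets `SPECIAL` and `USE_KEYS` are assumptions about which fields reference residues.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
SPECIAL = {"*", "X", "nterm", "cterm"}   # assumed wildcards/termini, skipped
# Fields assumed to reference residues or modified residues.
USE_KEYS = ("MODIFIED", "DIGESTED", "FIRSTLINKEDAMINOACIDS",
            "SECONDLINKEDAMINOACIDS", "LINKEDAMINOACIDS", "AMINOACIDS")


def parse_fields(line):
    """Map each 'KEY:value' part of a settings line to an upper-cased key."""
    fields = {}
    for part in line.strip().split(";"):
        head, sep, value = part.rpartition(":")
        if sep:  # parts without a colon (e.g. NAME=Enzyme) are ignored
            fields[head.split(":")[-1].upper()] = value
    return fields


def undefined_symbols(config_text):
    """Return (line_number, symbol) for symbols used before being defined."""
    known = set(CANONICAL)
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        fields = parse_fields(line)
        for key in USE_KEYS:
            for symbol in fields.get(key, "").split(","):
                if symbol and symbol not in SPECIAL and symbol not in known:
                    problems.append((number, symbol))
        # A modification line defines new symbols for the lines below it.
        if line.strip().lower().startswith("modification:"):
            if "SYMBOL" in fields:
                known.add(fields["SYMBOL"])
            if "SYMBOLEXT" in fields:
                for base in fields.get("MODIFIED", "").split(","):
                    known.add(base + fields["SYMBOLEXT"])
    return problems


MOD = "modification:known::SYMBOLEXT:mod;MODIFIED:D;DELTAMASS:50"
DIG = "digestion:PostAAConstrainedDigestion:DIGESTED:D,Dmod;ConstrainingAminoAcids:;NAME=Enzyme"

print(undefined_symbols(DIG + "\n" + MOD))   # digestion rule placed first
print(undefined_symbols(MOD + "\n" + DIG))   # modification defined first
```

With the digestion rule first, `Dmod` is reported as used on line 1 before its definition; with the order swapped, nothing is reported.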

@grandrea

> by default crosslinker that crosslink to D will also crosslink to Dmod as far as I understand.

That is only true for labels, as these are assumed not to change the relevant chemical properties. Modifications, however, need to be mentioned in the enzyme and crosslinker definitions. So if both D and Dmod need to be crosslinkable or digestible, then both need to be listed in the specificities.

@cxdummies

> The question is, in general, if I define a modified amino acid in the fasta sequence, do I have to add this specific modified amino acid to the settings of an enzyme, crosslinker and fixed/variable modifications?

Yes, that would be the case.

> Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?

Not as a labelling schema. The closest you can get is one of two options: either define variable modifications in the fasta for each residue, which will probably make the search space explode, or define the protein twice in the fasta file, once with and once without the (fixed) modification. In both cases you should define the modified residues as known modifications and add the right ones to the specificities of the crosslinker and enzyme.
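As an illustration of the second option, the Python sketch below generates a fully labelled copy of a sequence (each residue followed by a lowercase extension, as in the AnCnDn example earlier in the thread) together with matching `modification:known` lines. The helper names are hypothetical, the nitrogen counts are standard chemistry, and reusing one SYMBOLEXT with a different DELTAMASS per residue group is an assumption that has not been verified against XiSearch.

```python
N15_MINUS_N14 = 0.997035   # Da per nitrogen, 15N minus 14N

# Number of nitrogen atoms in each standard amino acid residue.
NITROGENS = {
    "A": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 3, "I": 1,
    "K": 2, "L": 1, "M": 1, "N": 2, "P": 1, "Q": 2, "R": 4, "S": 1,
    "T": 1, "V": 1, "W": 2, "Y": 1,
}


def labelled_sequence(seq, ext="n"):
    """Append the modification extension after every residue."""
    return "".join(residue + ext for residue in seq)


def known_modification_lines(ext="n"):
    """One config line per nitrogen count, residues grouped in a list."""
    by_count = {}
    for residue, count in NITROGENS.items():
        by_count.setdefault(count, []).append(residue)
    lines = []
    for count in sorted(by_count):
        residues = ",".join(sorted(by_count[count]))
        delta = count * N15_MINUS_N14
        lines.append(
            f"modification:known::SYMBOLEXT:{ext};MODIFIED:{residues};DELTAMASS:{delta:.6f}"
        )
    return lines


if __name__ == "__main__":
    print(labelled_sequence("ACD"))
    for line in known_modification_lines():
        print(line)
```

Every generated symbol (An, Cn, Kn, ...) would still have to be added to the crosslinker and enzyme specificities, as described above.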

cxdummies commented 3 weeks ago

Hi Lutz,

Would settings like this work? Can the fixed and variable modifications recognise the declared known modification?

modification:known::SYMBOLEXT:a;MODIFIED:C;DELTAMASS:1
modification:known::SYMBOLEXT:b;MODIFIED:M;DELTAMASS:2

modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065
modification:fixed::SYMBOL:Cacm;MODIFIED:Ca;MASS:161.03065

modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
modification:variable::SYMBOL:Mbox;MODIFIED:Mb;MASS:149.035395

cxdummies commented 3 weeks ago

> Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?
>
> Not as a labeling schema - the closest you can do is either define variable modifications in the fasta for each residue - but that will probably result in an exploding search space - or define the protein twice in the fasta file - with and without (fixed) modification. in both cases you should define the modified residue as known modification and add the right ones to the specificities of crosslinker and enzyme.

I tried defining the protein twice in the fasta file, declaring the modification:known lines and adding the right symbols to the specificities of the crosslinker and protease. Unfortunately, XiSearch didn't identify the modified protein at all.

cxdummies commented 3 weeks ago

Applying the heavy label scheme works, although the scheme applies to every protein in the fasta file.

Does XiSearch recognise that the following masses should also become higher in the heavy labelled proteins?

modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065

lutzfischer commented 3 weeks ago

> Would settings like this work? Can the fixed and variable modifications recognise the declared known modification?
>
> modification:known::SYMBOLEXT:a;MODIFIED:C;DELTAMASS:1
> modification:known::SYMBOLEXT:b;MODIFIED:M;DELTAMASS:2
>
> modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065
> modification:fixed::SYMBOL:Cacm;MODIFIED:Ca;MASS:161.03065
>
> modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
> modification:variable::SYMBOL:Mbox;MODIFIED:Mb;MASS:149.035395

Yes, but in that case you could define it a bit more compactly as:

modification:known::SYMBOLEXT:a;MODIFIED:C;DELTAMASS:1
modification:known::SYMBOLEXT:b;MODIFIED:M;DELTAMASS:2

modification:fixed::SYMBOLEXT:cm;MODIFIED:C,Ca;DELTAMASS:57.021464
modification:variable::SYMBOLEXT:ox;MODIFIED:M,Mb;DELTAMASS:15.99491463

The resulting fixed modifications would again be Ccm and Cacm (SYMBOLEXT is cumulative), and the resulting variable modifications Mox and Mbox.
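The equivalence between the compact DELTAMASS form and the absolute MASS values from the question can be checked with a few lines of Python. This is only an arithmetic sketch: the residue masses are standard monoisotopic values rounded to six decimals, and the helper `extend` is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded).
RESIDUE_MASS = {"C": 103.009185, "M": 131.040485}


def extend(symbol, mass, ext, delta):
    """Stack one modification on a (symbol, mass) pair: the extension is
    appended to the symbol and the delta mass is added to the mass."""
    return symbol + ext, mass + delta


c = ("C", RESIDUE_MASS["C"])
m = ("M", RESIDUE_MASS["M"])

ca = extend(*c, "a", 1.0)               # known modification 'a' on C
mb = extend(*m, "b", 2.0)               # known modification 'b' on M

results = dict([
    extend(*c, "cm", 57.021464),        # Ccm
    extend(*ca, "cm", 57.021464),       # Cacm
    extend(*m, "ox", 15.99491463),      # Mox
    extend(*mb, "ox", 15.99491463),     # Mbox
])

for symbol, mass in results.items():
    print(f"{symbol}: {mass:.5f}")
```

The four totals agree with the MASS values 160.03065, 161.03065, 147.035395 and 149.035395 from the question to within about 0.00001 Da, which is the rounding of the inputs.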

> Can I define one specific protein that is fully 15N-labelled in the fasta file while keeping others as normal protein sequences?
>
> Not as a labeling schema - the closest you can do is either define variable modifications in the fasta for each residue - but that will probably result in an exploding search space - or define the protein twice in the fasta file - with and without (fixed) modification. in both cases you should define the modified residue as known modification and add the right ones to the specificities of crosslinker and enzyme.
>
> I tried defining the protein twice in the fasta file, declaring the modification:known and adding the right ones to the specificities of crosslinker and protease. Unfortunately, XiSearch didn't identify the modified protein at all.

I am not sure why this should fail. Can you send me the config and fasta (lutz dot fischer tu-berlin dot de)? Then I can have a look and see whether I understand what went wrong here.

> Applying the heavy label scheme works, although the scheme applies to every protein in the fasta file.
>
> Does XiSearch recognise that the following masses should also become higher in the heavy labelled proteins?
>
> modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
> modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065

It should create a labelled version of these as well, i.e. if the label schema is n15 you should see Ccmn15 as a modification.
BUT looking at the code, I think there might be some problems there. I need to check what happens, especially in combination with fasta-defined modifications on top. Sorry, labels are somewhat untested at the moment, and I will have to see when I can test/fix that.

cxdummies commented 2 weeks ago

> Applying the heavy label scheme works, although the scheme applies to every protein in the fasta file. Does XiSearch recognise that the following masses should also become higher in the heavy labelled proteins?
>
> modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395
> modification:fixed::SYMBOL:Ccm;MODIFIED:C;MASS:160.03065
>
> It should create a labelled version of these as well. I.e. if the label schema is n15 you should see Ccmn15 as a modification. BUT looking at the code I think there might be some problems there. Need to check what happens there - especially in connection with fasta defined modifications on top. Sorry label are somewhat untested at the moment and will have to see when I can test/fix that.

I found "Mox5" and "Ccm5" on the identified peptide sequences, but they were detected only on the unlabelled peptides, although they were expected on the 15N-labelled peptides.

It would be really great if this could be further developed. 15N could be very useful in some applications.