Rappsilber-Laboratory / XiSearch

XiSearch
Apache License 2.0
9 stars 7 forks source link

Add DSPP/PhoX configuration #68

Closed anfoss closed 1 year ago

anfoss commented 1 year ago

I want to add a new cross linker (PhoX). This is what I added to the conf so far but I am not sure it is correct/the best way to do it.


crosslinker:SymetricSingleAminoAcidRestrictedCrossLinker:Name:PhoX;MASS:209.97181;LINKEDAMINOACIDS:K(0),S(0.2),T(0.2),Y(0.2),nterm(0)

modification:linear::SYMBOLEXT:phoxOH;MODIFIED:K,S,T,Y;DELTAMASS:227.98237
modification:linear::SYMBOLEXT:phoxNh;MODIFIED:K,S,T,Y;DELTAMASS:226.99836

However, I get extremely low nr of cross linking compared to other softwares for exactly the same sample. Is there any other parameter which should be added/modified?

lutzfischer commented 1 year ago

The crosslinker definition looks alright to me - so guess it might have something to do with other parameters. Can you post the complete config?

anfoss commented 1 year ago

Here is the full config

####################
##Tolerances
## for matching the precursor information
tolerance:precursor:10ppm
## for matching fragments to the ms2-peaks
tolerance:fragment:25ppm

#####################
## how many cpus to use
## values smaller 0 mean that all avaiblable but the mentioned number will be used
## e.g. if the computer has 4 cores and UseCPUs is set to -1 then 3 threads are used for search.
## this is a bit relativated by the buffering, as buffers also use threads to decouple the input and output of the buffer.
## each thread will also have a small buffer between itself and the input and the output queue - but the overal cpu-usage of these should be smallish
UseCPUs:-1

##==============================
## Homobifunctional
## format :
## crosslinker:SymetricSingleAminoAcidRestrictedCrossLinker:Name:[name];MASS:[cross-linker mass];LINKEDAMINOACIDS:[list of possible cross-link targts];MODIFICATIONS:[list of associated modifications];decoy
## with:
##  Name:             A name of the cross-linker
##  MASS:             The mass of the cross-linker as  the difference between the mass of the two peptides and the mass of the mass of the two peptides when reacted with the cross-linker
##  LINKEDAMINOACIDS: A comma separated list of amino-acids that the cross-linker can react with. Additionaly nterm and cterm are accepted
##                    Also amino-acids can get a ranking by defining a penelty (between 0 and 1) for them.
##                    E.g. K(0),S(0.2),T(0.2),Y(0.2),nterm(0) means that K and the protein n-terminal are more likely to be cross-linked then S, T, or Y
## MODIFICATIONS:     a comma-separeted list defining related modifications 
##                    E.g. NH3,17.026549105,OH2,18.0105647                  
##                    defines two modifications:
##                      NH3: that adds 17.026549105 to the mass of the cross-linker
##                      OH2: that adds 18.0105647 to the mass of the cross-linker
## LINAERMODIFICATIONS: same as MODIFICATIONS but will only be applied to linear peptides
## LOSSES:            a comma-separeted list defining crosslinker related losses
##                    E.g. X,10,Y120
##                    defines two losses:
##                      X: a loss of 10Da from the cross-linker
##                      Y: a loss of 120Da from the cross-linker
## STUBS:            a comma-separeted list defining crosslinker stubs for MS-cleavable cross-linker 
##                    E.g. A,54.0105647,S,103.9932001,T,85.9826354
##                    defines three cross-linker stubs:
##                      A: with mass 54.0105647
##                      S: with mass 103.9932001
##                      T: with mass 85.9826354
##
## additionally one can also define heterobifunctional cross-linker
## crosslinker:AsymetricSingleAminoAcidRestrictedCrossLinker:Name:[name];MASS:[cross-linker mass];FIRSTLINKEDAMINOACIDS:[list of possible cross-link targts];SECONDLINKEDAMINOACIDS:[list of possible cross-link targts]
## syntax is similare to homobifinctional crosslinker with two changes:
## two sets of specificities FIRSTLINKEDAMINOACIDS and SECONDLINKEDAMINOACIDS and modifications cant be defined together with the cross-linker
##
##BS3
#crosslinker:SymetricSingleAminoAcidRestrictedCrossLinker:Name:BS3;MASS:138.06807;LINKEDAMINOACIDS:K(0),S(0.2),T(0.2),Y(0.2),nterm(0)
#crosslinker:SymetricSingleAminoAcidRestrictedCrossLinker:Name:BS3;MASS:138.06807;LINKEDAMINOACIDS:K,nterm;MODIFICATIONS:NH2,17.026549105,OH,18.0105647
## DSPP
crosslinker:SymetricSingleAminoAcidRestrictedCrossLinker:Name:PhoX;MASS:209.97181;LINKEDAMINOACIDS:K(0),,nterm(0)

##BS2G
#crosslinker:SymetricSingleAminoAcidRestrictedCrossLinker:Name:BS2G;MASS:96.02112055;LINKEDAMINOACIDS:K,S,T,Y,nterm;MODIFICATIONS:NH2,17.026549105,OH,18.0105647,LOOP,0
##==============================
## asymetric cross-linker (currently modifications need to be specified separately)
##SDA
# crosslinker:AsymetricSingleAminoAcidRestrictedCrossLinker:Name:SDA;MASS:82.0413162600906;FIRSTLINKEDAMINOACIDS:*;SECONDLINKEDAMINOACIDS:K,S,Y,T,nterm

###################
##Modifications
## modifications are possible to be defined as three types:
## fixed: every aminoacid is modified
## variable: peptides containing the aminoacids will be searched with and without modification
## known: not automatically searched - but enables to defined modification as part of the FASTA-file as fixed or variable modification (e.g. defined histon modification 
##         only on histones without haveing to search them everywhere).
## linear: peptides with that modification will only be searched as linear peptides (not part of an cross-link)
## 
## Format is: 
##        modification:(fixed|variable|known)::SYMBOL:(how is the modification represented);MODIFIED:[aminoaid];MASS:[mass of the modified amino acid]
##  Symbol:      peptides will be reported with the modification as part of the 
##               sequence the symbol to represent the modified amino acid
##               Ccm for Carboxyamidomethylation of Cysteine
##  MODIFIED:    the amni-acid to be modified (e.g. C)
##  MASS:        the total mass of the modified amino acid
##   (This format is also used to define amino-acid substitution)
##
## Alternativly modifications that apply to several aminoacids can also be defined as
##         modification:variable::SYMBOLEXT:[extension];MODIFIED:[amino-acids];DELTAMASS:[mass-difference]
##  SYMBOLEXT:   What will be appended to the amino-acid to denote this modification (E.g. ox for oxidation)
##  MODIFIED:    A list of aminoa cids that can have this modification
##  DELTAMASS:   the mass diference between the modified and teh undmodified version of the amino-acid.
##
##========================
##--Fixed Modifications
modification:fixed::SYMBOLEXT:cm;MODIFIED:C;DELTAMASS:57.021464

##========================
##--Variable Modifications
##Mox = 131.040485 + 15.99491
modification:variable::SYMBOLEXT:ox;MODIFIED:M;DELTAMASS:15.99491463
# modification:variable::SYMBOLEXT:ox;MODIFIED:M,Q,N;DELTAMASS:15.99491463
# modification:variable::SYMBOL:Mox;MODIFIED:M;MASS:147.035395

##========================
##--linear Modifications
## peptides containing the given amino-acids will be searched with and without 
## the modification - BUT the modified version will not be searched as part of a 
## cross-link
modification:linear::SYMBOLEXT:phoxOH;MODIFIED:K,;DELTAMASS:227.98237
modification:linear::SYMBOLEXT:phoxNh;MODIFIED:K,;DELTAMASS:226.99836

###################
## Digest
##Tryptic digest
#digestion:PostAAConstrainedDigestion:DIGESTED:K,R;ConstrainingAminoAcids:P;NAME=Trypsin
digestion:PostAAConstrainedDigestion:DIGESTED:K,R;ConstrainingAminoAcids:;NAME=Trypsin\P
#digestion:PostAAConstrainedDigestion:DIGESTED:K;ConstrainingAminoAcids:P;NAME=LysC
##No Digestion e.g. for Synthetic Peptide
#digestion:NoDigestion:

#####################################################################################################
##Fragment match settings

####################
## Non-Lossy Fragments to consider
fragment:BIon
fragment:YIon
#fragment:CIon
#fragment:ZIon
#fragment:AIon
#fragment:XIon
## peptide ion should always be enabled, as otherwise no standard cross-linked fragments will be matched - also needed for precoursor-fragment matches
fragment:PeptideIon
## enables double fragmentation with in one fragment but also fragmentation events on both peptides
#fragment:BLikeDoubleFragmentation;ID:4

###################
##Losses
## Water
loss:AminoAcidRestrictedLoss:NAME:H20;aminoacids:S,T,D,E;MASS:18.01056027;cterm
## Amonia
loss:AminoAcidRestrictedLoss:NAME:NH3;aminoacids:R,K,N,Q;MASS:17.02654493;nterm
## CH3SOH from Mox
#loss:AminoAcidRestrictedLoss:NAME:CH3SOH;aminoacids:Mox;MASS:63.99828547
##AIons as loss from BIons
## when defiend as loss the matched fragments will have less impact on the score then matching A-Ions
loss:AIonLoss
##crosslinker modified fragment (fragmentation of the cross-linker petide bound)
#loss:CrosslinkerModified

####################
## include linear matches
EVALUATELINEARS:true

#####################
## Generally lossy fragmenst will have a smaller impact on subscores then non-lossy versions of a fragment.
## But some subscores (anything called conservative) considere a fragment observed even if n neutral losses for that fragment where observed but not the fragment itself 
## this defines how many loses are needed to make a fragment count as observed
ConservativeLosses:3

####################
# if this is set to true also fragment matches are reported that are of by 1 dalton
MATCH_MISSING_MONOISOTOPIC:true

####################
## how many peaks to consider for alpha-peptide-search
mgcpeaks:10

###################
### Candidate selection
## Scoreing happens in three stages
## alpha candidates are selected and scored
## top n aplpha candidates are taken and all matching beta-candidates will be selected and prescored
## the top X of these are then fully matched and scored
## how many "alpha" peptide candidiates will be considered for finding beta candidates
topmgchits:150
## how many combinations of alpha and beta peptides will be considered for final scoring
topmgxhits:10

##################
## how many misscleavages are considered
missedcleavages:4

####################
## define a minimum peptide length (default 2)
#MINIMUM_PEPTIDE_LENGTH:6

#####################
## IO-settings - for improving the parallel processing it's better to do some buffering 
## this reduces the time threads potentially have to wait for spectra to be searched (BufferInput)
## or to be written out (BufferOutput).
BufferInput:100
BufferOutput:100

#####################
## Only write out the top match per spectrum
TOPMATCHESONLY:true

#####################
## maximum mass of a peptide to be considered for fragmentation
#MAXPEPTIDEMASS:5000

#####################
## some limits for generating modified peptides
MAX_MODIFICATION_PER_PEPTIDE:3
MAX_MODIFIED_PEPTIDES_PER_PEPTIDE:20

####################
##If the top-match for a spectra has a score lower than this, the spectra and all of its matches are not reported
#MINIMUM_TOP_SCORE:0

#########################################
## what fragment tree to use
## default: the default tree
## FU: uses a fastutil based implementation of the fragmenttree and conservea lot of memory doing so.
## searching a few hunderd proteins is then possible with just 8GB
FRAGMENTTREE:FU

#########################################
## we need the run name and scan number for a spectrum
## but mgf-files have that info (if at all) in the TITLE line
## and it is not exactly defined how that is stored
## some mgf-files that we have encountered are already regognised for others
## the following to regular expressions can be defined to read out scan number and run
## if both are supplied these will be first tried before the internal automatic will be used
## the scan number and the raw file need to be in the first capturing group
## Example:
## the mgf contains headers like:
## TITLE= Elution from: -1.0 to -1.0 period:  experiment:  cycles:  precIntensity: -1.0 RawFile: myrawfile FinneganScanNumber: 14846
## then the regular expressions should be defined as
## SCAN_RE: .*FinneganScanNumber:\s*([0-9]*)\s*
## RUN_RE: .*RawFile:\s*(.*)FinneganScanNumber:
##
## XiSearch comes with a range of know regulare mgf-title formats but there are a lot 
## more formats out there. So if you encounter an error that the file is not known try this.
## 
#SCAN_RE:
#RUN_RE:

#########################################
## for a fragment up to how many neutral losses for that fragment are considered
#MAXTOTALLOSSES:

#########################################
## for each type of loss up to how often is that considered for a single fragment
#MAXLOSSES:

#########################################
## if a spectrum comes with a list of predefined masses for the peptides
## take these as the exclusively accepted masses or just give them priorety? 
#XL_Peptide_Mass_Candidates_Exclusive:false

########################################
## the normalized score by default ignores subscores that are not defined
## with this setting the miossing score can be replaced by a different score
## this replacement would be replacement for the normalized score and the most 
## sensible replacement I coudl see would be "1".
##
#normalizerml_defaultsubscorevalue:1

########################################
##InputFIlter that modify the spectra before search
##
##DENOISE
## very beta - don't use
## denoise the spectra prior search (default top 20 peaks per 100 Da are kept
#FILTER:denoise:peaks:15;window:100

########################################
## consider also matches to a precursor mass that
## are up to n Da (actually n*1.00335Da) lighter
## this would account for missing isotope peaks in the MS1
missing_isotope_peaks:2
lutzfischer commented 1 year ago

The only thing I see is, that the tolerances are a bit high. Do you need 10 and 25ppm? I have never looked at phox results personally - so can't give you any more details as such.