Scout

Interactomics studies play a critical role in elucidating protein structures, functions, and interactions within complex cellular environments. Cross-linking mass spectrometry with cleavable cross-linking reagents (cXL-MS) has emerged as a powerful technique for large-scale interactomics analysis by identifying proximal amino acid pairs in protein samples. However, current computational cXL-MS tools face limitations in proteomic-scale studies, such as being too slow or generating excessive false positives, particularly at the protein-protein interaction (PPI) level.

Here, we present Scout, a computational methodology that enables interactomic analysis by identifying mass spectra of peptides linked with cleavable cross-linking reagents. By leveraging machine learning techniques, Scout ensures a controlled false discovery rate (FDR) at multiple levels, including cross-linked spectrum matches, residue pairs, and PPIs. Our methodology offers an efficient and accurate solution for large-scale interactomics studies, addressing the existing computational challenges.

Please cite our paper:
Clasen, MA, et al., “Proteome-scale recombinant standards and a robust high-speed search engine to advance cross-linking MS-based interactomics”, bioRxiv, 2023.

Equipment

Hardware

A computer with a minimum of 16 GB RAM and 4 computing cores is recommended. However, the software can take advantage of superior configurations.

Software

Windows 10 (64-bit) or later.
Python 3.10 or later.
.NET Core 6 or later.
The Scout software, available for download at https://github.com/diogobor/Scout/releases

Data files

Scout v1.0 is compatible with data files in the formats MS2, Mascot Generic Format (MGF), Bruker® .d files, and Thermo® RAW files.
Scout saves results in its own *.scout format and in the mzIdentML 1.2 and mzIdentML 1.3 formats proposed by the HUPO Proteomics Standards Initiative to support the identification of cross-linked peptides. These formats allow complete submissions of XL-MS data to PRIDE[1] and are therefore compatible with the PRIDE Inspector software[2]. Additionally, the software supports exporting all CSMs, Residue Pairs and PPIs as CSV files, as well as all results to XlinkCyNET[3] for visualization within Cytoscape[4].

Procedures

Software installation

1.1 Download Scout by clicking on Scout_setup_64bit.msi in the latest release.
1.2 Install it by double-clicking the downloaded file.

Workflow

The following workflow demonstrates how to perform a search using Scout.
2.1 Launch Scout: Open the Scout application to access its main window, as shown in Figure 1.

Figure 1: Graphical User Interface of Scout’s main window.

2.2 Initial Setup
2.2.1. Searching a single file: Select the ‘Raw File’ radio button and then choose a tandem mass spectra file (e.g., MS2, MGF or Thermo® RAW).
PS: For Bruker® .d files, select the folder that contains the name of the file.
2.2.2. Batch searching: Select the ‘Raw folder’ radio button and then specify a directory containing the tandem mass spectra files.

2.2.3. FASTA File: Select a file containing the protein sequences. The file must be in FASTA format, typically obtained from UniProt. For instance:

>protein name
PROTEINSEQUENCE
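
Before starting a long search, it can be useful to check that a FASTA file parses cleanly. The following is a minimal sketch in Python and is not part of Scout; the function name `read_fasta` is hypothetical, and the sketch only assumes the header/sequence layout shown above.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">protein name\nPROTEIN\nSEQUENCE\n")
records = list(read_fasta(example))
print(records)  # [('protein name', 'PROTEINSEQUENCE')]
```

For a real file, replace the `io.StringIO` object with `open(path)`.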

⇒ Click the 'Start' button to start the search with the default parameters. Once the search is complete, the results window opens (see item 2.3).
⇒ To stop the search, click the 'Cancel' button and confirm.
PS: If Scout closes unexpectedly, the search can resume from the point at which it stopped. To do so, set the same parameters again and click the 'Start' button.
⇒ All procedures are recorded in the Log box. To export the log, go to File → Export log (or press ALT + M).

2.2.4 Search Parameters
Search parameters can be adjusted to optimize the search process. To modify them, navigate to Parameters → Search (or press ALT + S), as illustrated in Figure 2a. A new window will open (Figure 2b).

Figure 2a: Search and Post Processing Parameters can be modified from the Parameters menu.

Figure 2b: Search Parameters window

2.2.4.1. MS1 PPM Tolerance: Specify the ppm error tolerance for the precursor mass.
2.2.4.2. MS2 PPM Tolerance: Specify the ppm error tolerance for fragment ions.
2.2.4.3. Ion Pair PPM Tolerance: Specify the ppm error tolerance for ion pair mass.
2.2.4.4. Min. Peptide Length: Specify the minimum number of amino acids in each connected peptide.
2.2.4.5. Max. Peptide Length: Specify the maximum number of amino acids in each connected peptide.
2.2.4.6. Min. Peptide Mass: Specify the minimum peptide mass in Daltons.
2.2.4.7. Max. Peptide Mass: Specify the maximum peptide mass in Daltons.
2.2.4.8. Missed Cleavages: Specify the maximum missed cleavages allowed in a single peptide.
2.2.4.9. Max. Variable Mods: Specify the maximum number of variable post-translational modifications in a single peptide.
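
The three ppm tolerances above are all relative mass errors. As an illustration, the standard ppm error formula can be sketched in Python as follows; the function names are hypothetical and this is not Scout's own code.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


print(round(ppm_error(1000.005, 1000.0), 3))      # 5.0
print(within_tolerance(1000.005, 1000.0, 10.0))   # True
print(within_tolerance(1000.020, 1000.0, 10.0))   # False
```

Because the error is relative, a 10 ppm tolerance corresponds to 0.01 Da at 1000 Da but only 0.005 Da at 500 Da.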

2.2.4.10. Enzyme: Select a proteolytic enzyme for in-silico digestion.

2.2.4.10.1. Add Enzyme: Navigate to the Enzymes tab and click on ‘Add Enzyme’ button (Figure 3a). A new window will be opened (Figure 3b).

Figure 3a: Enzymes window – This tab enables the addition or removal of enzymes.

Figure 3b: New Enzyme Inclusion – This window allows users to introduce a new enzyme to the existing list of enzymes.

2.2.4.10.1.1. Name: Specify a name for the new enzyme.
2.2.4.10.1.2. Sites: Specify the amino acids at which cleavage should occur. PS: The amino acids should be included without spaces, for instance, the trypsin sites should appear as KR.
2.2.4.10.1.3. Blocked by: Specify the amino acids that will impede the cleavage. PS: As in ‘Sites’, the amino acids must be typed without spaces.
2.2.4.10.1.4. C-Terminal: Check this option if the new enzyme cleaves at the C-terminus of the peptide; otherwise, cleavage will occur at the N-terminus.
Click the ‘Confirm’ button to add the new enzyme to the Enzymes table. Afterwards, return to 2.2.4.10.
2.2.4.10.2 To remove an enzyme, press the ‘Del’ key. A confirmation message will be displayed. Confirm it to proceed.
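
To illustrate how the enzyme fields (Sites, Blocked by, C-Terminal) and the Missed Cleavages parameter interact, here is a minimal in-silico digestion sketch in Python. It is not Scout's implementation: the function names are hypothetical, and it assumes that a cleavage is blocked by the residue on the other side of the cut (for example, proline after K or R for trypsin).

```python
def cleavage_points(sequence: str, sites: str, blocked_by: str = "",
                    c_terminal: bool = True) -> list[int]:
    """Return the indices i at which the sequence is cut between i-1 and i."""
    site_set, blocked = set(sites), set(blocked_by)
    points = []
    for i, residue in enumerate(sequence):
        if residue not in site_set:
            continue
        cut = i + 1 if c_terminal else i
        if not 0 < cut < len(sequence):
            continue  # no cut at the very ends of the protein
        # Assumption: the residue on the other side of the cut can block it.
        neighbour = sequence[cut] if c_terminal else sequence[cut - 1]
        if neighbour not in blocked:
            points.append(cut)
    return points


def digest(sequence: str, sites: str, blocked_by: str = "",
           c_terminal: bool = True, missed_cleavages: int = 0) -> list[str]:
    """List all peptides with up to `missed_cleavages` skipped cut points."""
    bounds = [0] + cleavage_points(sequence, sites, blocked_by, c_terminal)
    bounds.append(len(sequence))
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + missed_cleavages + 2, len(bounds))
        for end in range(start + 1, last):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


# Trypsin-like rule: cut after K or R unless followed by P.
print(digest("AKRPGKC", "KR", "P"))
# ['AK', 'RPGK', 'C']
print(digest("AKRPGKC", "KR", "P", missed_cleavages=1))
# ['AK', 'AKRPGK', 'RPGK', 'RPGKC', 'C']
```

The minimum and maximum peptide length and mass parameters would then be applied as filters on the resulting list.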

2.2.4.11. Enzyme specificity: Select an enzyme specificity from the list: full specific or semi-specific.
2.2.4.12. Cleavable Reagent: Select a cleavable cross-linker from the list.
2.2.4.12.1. Add Reagent: Go to the XL Reagents tab and click the ‘Add Reagent’ button (Figure 4a). A new window will be opened (Figure 4b).

Figure 4a: Chemical cross-linkers window: on this tab, new reagents can be added or removed.

Figure 4b: A new reagent can be added into the list of cross-linkers.

2.2.4.12.1.1. Name: Specify a unique identifier for the new cleavable reagent.
2.2.4.12.1.2. Light Fragment Mass: Specify the light fragment mass in Daltons.
2.2.4.12.1.3. Heavy Fragment Mass: Specify the heavy fragment mass in Daltons.
2.2.4.12.1.4. Full Mass: Specify the full mass of the reagent in Daltons.
2.2.4.12.1.5. Ion Pair Shift: The pair will be automatically calculated according to the light and heavy fragment masses.
2.2.4.12.1.6. Target Residues: Specify the target residues that the new cleavable cross-linker will react with. PS: List residues without spaces; for example, use KSYT for DSSO.
2.2.4.12.1.7. N-Terminal: Check this option if the new cleavable cross-linker also reacts at the N-terminus of the protein.
Click the ‘Confirm’ button to add the new cleavable reagent to the XL Reagents table. Subsequently, return to 2.2.4.12.
2.2.4.12.2 To remove an XL Reagent, press the ‘Del’ key. A confirmation message will be displayed. Confirm it to proceed.
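
The relationship between the light and heavy fragment masses and the ion pair shift can be sketched in Python as follows. This is an illustration under stated assumptions, not Scout's code: it assumes the shift is the heavy fragment mass minus the light fragment mass, uses commonly cited DSSO fragment masses (alkene 54.01056 Da, thiol 85.98264 Da), and applies the ppm tolerance relative to the expected partner mass. The function names are hypothetical.

```python
def ion_pair_shift(light: float, heavy: float) -> float:
    """Assumed definition: mass difference between heavy and light fragments."""
    return heavy - light


def find_ion_pairs(masses: list[float], shift: float,
                   tol_ppm: float) -> list[tuple[float, float]]:
    """Return (low, high) pairs whose difference matches the shift."""
    ordered = sorted(masses)
    pairs = []
    for i, low in enumerate(ordered):
        target = low + shift
        tol = target * tol_ppm / 1e6  # absolute tolerance in Da
        for high in ordered[i + 1:]:
            if high > target + tol:
                break  # list is sorted, no later peak can match
            if abs(high - target) <= tol:
                pairs.append((low, high))
    return pairs


shift = ion_pair_shift(54.01056, 85.98264)
print(round(shift, 5))  # 31.97208

# Singly charged (deconvoluted) peak masses, invented for illustration.
peaks = [500.0, 531.97208, 700.0]
print(find_ion_pairs(peaks, shift, tol_ppm=10.0))
# [(500.0, 531.97208)]
```

The sketch works on singly charged masses, which is one reason deconvolution before ion pair searching (2.2.4.13) simplifies the problem.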

2.2.4.13. Deconvolute for Ion Pair Searching: Check this option to deconvolute the spectra before searching the ion pairs. If enabled, the deconvolution will be performed by YADA 3.0 [5].
2.2.4.14. Deconvolute for Scoring: Check this option to deconvolute spectra prior to searching for CSMs. If enabled, the deconvolution will be performed by YADA 3.0 [5].

Explanation on ‘deconvolution’: Because cleavable cross-linking search relies heavily on locating ion pairs, noisy spectra can severely degrade the quality of the final results. Spectral deconvolution is therefore generally recommended and is enabled by default for the ion pair searching step of Scout. In mass spectrometry, deconvolution refers to de-charging and/or deisotoping a spectrum. In practical terms, each MS2 spectrum is scanned for charge envelopes and isotopic envelopes, and each envelope is collapsed into a single ion at charge +1. This is particularly important for the first step of Scout’s workflow, Ion Pair Doublet Searching, as we found that being too lenient with the search for ion pairs may lead to false positives.
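
The de-charging part of deconvolution reduces to simple arithmetic once the charge of an isotopic envelope is known. The sketch below, in Python, shows that arithmetic only; it is not the YADA algorithm, the function names are hypothetical, and the charge is inferred from the spacing between isotopic peaks using the approximate 13C/12C mass difference.

```python
PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.00335   # Da, approximate 13C - 12C mass difference


def charge_from_spacing(spacing_mz: float) -> int:
    """Infer the charge from the m/z spacing between isotopic peaks."""
    return max(1, round(ISOTOPE_SPACING / spacing_mz))


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an m/z observed at `charge` to the equivalent m/z at +1."""
    return mz * charge - (charge - 1) * PROTON_MASS


# A molecule of neutral mass 1000.0 Da observed at charge +2.
mz_2plus = (1000.0 + 2 * PROTON_MASS) / 2
print(charge_from_spacing(0.5017))               # 2
print(round(to_singly_charged(mz_2plus, 2), 4))  # 1001.0073
```

Deisotoping would additionally merge the isotopic peaks of each envelope into the monoisotopic peak; that step is omitted here.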

2.2.4.15. Add Modification: Click on this button to add a new post-translational modification (Figure 5a). A new window will appear (Figure 5b).

Figure 5a: Modification Window – This tab displays all variable and static modifications.

Figure 5b: New Modification Inclusion – This window enables the addition of a new modification into the modifications list.

2.2.4.15.1. Name: Specify a unique name for the new post-translational modification.
2.2.4.15.2. Mass Shift: Specify the mass shift in Daltons.
2.2.4.15.3. Target Residues: Specify the target residues for this new post-translational modification. Use capital letters without spaces.
2.2.4.15.4. C-Terminal: Check this option if the new post-translational modification occurs at the C-terminus of the peptide.
2.2.4.15.5. N-Terminal: Check this option if the new post-translational modification occurs at the N-terminus of the peptide.
2.2.4.15.6. Variable: Check this option if the new post-translational modification is dynamic, i.e., if it may or may not occur.
Click on the ‘Confirm’ button to incorporate the new modification into the Modification table.
PS: Upon completing this process, ensure the new post-translational modification is checked in ‘Use’ field for it to be considered in the search.
2.2.4.15.7 To remove a modification, press the ‘Del’ key. A confirmation message will be displayed. Confirm the action to proceed.

2.2.4.16. Contaminants: On this tab, existing contaminants can be modified and new ones can be added (Figure 6). PS: All contaminants must be entered in FASTA format (similar to item 2.2.3).

Figure 6: Contaminants tab: existing contaminant sequences can be modified and new ones can be added.

2.2.4.17. Export: Choose this option to save the current parameters to a file.

2.2.4.18. Load: Select this option to import parameters from a file.

2.2.4.19. As default: Set the current parameters as the software’s default settings.

2.2.4.20. Restore: Revert to factory default parameters.

2.2.4.21. Advanced: Click on this link to customize the advanced parameters (Figure 7). This is not necessary for most searches.

Figure 7: Edit advanced parameters: In this window, all search parameters can be modified.

2.2.4.22. Advanced Search Parameters

2.2.4.22.1. Spectra saved in the results: Check this option to save the identified experimental spectra in the results file.

2.2.4.22.2. Add contaminants: Check this option to consider common mass spectrometry contaminants during the search.
2.2.4.22.3. Add decoys: Check this option to add decoys before initiating the search. Note: for the FDR calculation, this option should be checked.
2.2.4.22.4. FASTA batch size: Specify the maximum number of protein sequences to be loaded into memory at a given time.
2.2.4.22.5. Fragment bin tolerance: Specify the bin size for binning mass spectra and for theoretical mass spectra generation.
2.2.4.22.6. Fragment bin offset: Specify offset in Daltons to be considered to initiate the binning process.
2.2.4.22.7. Minimum fragment bin m/z: Specify the minimum m/z to be vectorized.
2.2.4.22.8. Maximum fragment bin m/z: Specify the maximum m/z to be vectorized.

Explanation on ‘Binning’: We refer to binning mass spectra into vectors as the process of discretizing continuous m/z values by partitioning them into predefined bins. The process consists of establishing an offset (in Da) and a bin width (in Da) to define the initial point and bin size, respectively. Each bin encompasses a specific m/z range, and peaks are allocated to the corresponding bin based on their m/z value. Subsequently, the intensity values of peaks within each bin are aggregated, in our case, by summation. The output is a vector of intensity values, with each entry representing a distinct bin. This vector representation streamlines mass spectral data manipulation and comparison, facilitating bioinformatics analyses. The bin width therefore loosely corresponds to the MS/MS tolerance.
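
The binning procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not Scout's code: the function and parameter names are hypothetical, the offset is taken as the m/z at which the first bin starts, and intensities falling in the same bin are summed.

```python
import math


def bin_spectrum(peaks: list[tuple[float, float]], bin_width: float,
                 offset: float, max_mz: float) -> list[float]:
    """Turn (m/z, intensity) peaks into a fixed-length intensity vector."""
    n_bins = int((max_mz - offset) / bin_width) + 1
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = math.floor((mz - offset) / bin_width)
        if 0 <= index < n_bins:
            vector[index] += intensity  # aggregate by summation
    return vector


peaks = [(100.2, 10.0), (100.7, 5.0), (250.4, 3.0), (950.0, 1.0)]
vector = bin_spectrum(peaks, bin_width=1.0, offset=0.0, max_mz=300.0)
print(len(vector))               # 301
print(vector[100], vector[250])  # 15.0 3.0
```

The peak at m/z 950.0 lies above `max_mz` and is dropped, which mirrors the role of the minimum and maximum fragment bin m/z parameters.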

2.2.4.22.9. No. Isotopic Possibilities: The precursor mass stored in raw data files may not correspond to the monoisotopic peak. This option allows the software to find the correct monoisotopic peak, which is required to identify the molecule but at the cost of opening up the search space. If a high number of isotopic possibilities is set, the search space will increase accordingly and impact Scout’s sensitivity negatively.
2.2.4.22.10. Metabolic labelling Search: Check this option to perform a SILAC search.
2.2.4.22.10.1 Add Group: A new window opens in which the labelling groups (e.g., heavy and light) and their amino acids can be added.
2.2.4.22.10.2 Hybrid mode: Check this option to find not only heavy-heavy and light-light peptide pairs, but also heavy-light and light-heavy ones.
2.2.4.22.11. Isobaric labelling search: Check this option to perform an isobaric labelling search (e.g., TMT, iTRAQ).
2.2.4.22.11.1 Add Reagent: A new window opens to set up the reagent.
2.2.4.22.11.1.1 Reagent: Select a reagent. If the desired reagent is not in the list, click the 'Add' button.
2.2.4.22.11.1.2 Free residue tolerance: Set the minimum number of residues with which TMT will not react.
2.2.4.22.12. Export: See 2.2.4.17.
2.2.4.22.13. Load: See 2.2.4.18.
2.2.4.22.14. As default: See 2.2.4.19.
2.2.4.22.15. Restore: See 2.2.4.20.

2.2.5 Post Processing Parameters

Adjusting certain post processing parameters may improve the results. To do this, navigate to Parameters → Post Processing (or press ALT + P), as illustrated in Figure 2a. A new window will appear (Figure 8).

Figure 8: Post Processing Parameters window

2.2.5.1. Use only unique XLs into PPIs: Check this option to remove PPIs that contain shared cross-linked peptides.
2.2.5.2. Separate protein intra- and inter-crosslinks: Check this option to apply FDR control separately to intra- and inter-crosslinks at the CSM, Residue Pair, and PPI levels.
2.2.5.3. Group PPIs by gene: Check this option to group all protein-protein interactions by gene name.
2.2.5.4. FDR on CSM level: Specify the FDR on CSM level.
2.2.5.5. FDR on Residue Pair level: Specify the FDR on Residue Pair level.
2.2.5.6. FDR on PPI level: Specify the FDR on PPI level.
2.2.5.7. Export: Similar to 2.2.4.17.
2.2.5.8. Load: Similar to 2.2.4.18.
2.2.5.9. As default: Similar to 2.2.4.19.
2.2.5.10. Restore: Similar to 2.2.4.20.
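
To make the FDR thresholds above more concrete, the sketch below shows a generic target-decoy estimate in Python. It is not a description of how Scout computes FDR (Scout relies on machine learning and its estimator is not detailed here); the function name is hypothetical, and the simple estimator decoys / targets is an assumption chosen for illustration.

```python
def score_cutoff(matches: list[tuple[float, bool]], max_fdr: float):
    """Return the lowest score at which the estimated FDR is <= max_fdr.

    `matches` holds (score, is_decoy) pairs. Returns None if no cutoff
    satisfies the requested FDR.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least `score`.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Invented scores for illustration: True marks a decoy identification.
matches = [(9.0, False), (8.0, False), (7.0, False),
           (6.0, True), (5.0, False), (4.0, True)]
print(score_cutoff(matches, max_fdr=0.01))  # 7.0
print(score_cutoff(matches, max_fdr=0.25))  # 5.0
```

The same idea applies at each level (CSM, Residue Pair, PPI); this is also why the ‘Add decoys’ option (2.2.4.22.3) must be checked for FDR calculation.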

2.3. Results

Upon completion of the search, the results are automatically saved as a *.scout file in the same directory as the RAW files and presented in a new window with separate tabs for CSMs, Residue Pairs and PPIs, as well as for the parameters used in the search (Figure 9).

Figure 9: Results window.

Double-clicking on a row containing a CSM result opens the spectrum viewer displaying the spectrum from which it was identified. The sequence coverage (Figure 10a) and the standard deviation plot (m/z vs ppm) of all identified peaks (Figure 10b) can be visualized through this window as well as all fragment ions (Figure 10c). Double-clicking a cross-link opens a list of CSMs from which it is derived (Figure 10d). Double-clicking a PPI displays all cross-links belonging to the PPI (Figure 10e) and an additional click reveals all CSMs associated with the respective cross-link.
PS: The spectrum viewer opens if the RAW file is in the directory or the mass spectrum was saved in the *.scout file (see item 2.3.8).

Figure 10a: Sequence coverage annotation & spectrum visualization.

Figure 10b: Standard deviation plot of all identified peaks.

Figure 10c: Theoretical fragment ions.

Figure 10d: List of CSMs from a specific Residue Pair.

Figure 10e: List of Residue Pairs from a specific PPI.

2.3.1 Filter results: Results contain FDR-filtered identifications at all levels. Custom filters can also be applied in the graphical user interface:
2.3.1.1. CSM level: In this tab (Figure 9), the CSMs are displayed according to the specified filter parameters.
2.3.1.1.1 Scan: Specify the scan number to be displayed.
2.3.1.1.2 Score: Specify the score cutoff. All CSMs with a score greater than ‘Score’ will be displayed.
2.3.1.1.3 Search: Type the α and/or β peptide (separated by '-'), protein 1 and/or protein 2 (separated by '-'), or gene 1 and/or gene 2 (separated by '-') to be displayed. PS: Type at least four characters.
2.3.1.1.4 Files: Select the file(s) whose results should be displayed. If no file or ‘All files’ is selected, all results will be displayed.
2.3.1.1.5 Show inter-protein links only: Check this option to display only the CSMs that belong to inter-protein interactions.
2.3.1.1.6 Show decoys: Check this option to display decoy identifications.
2.3.1.1.7 Click the ‘Filter’ button or press Enter to apply the filter.
2.3.1.1.8 Click the ‘Reset’ button to restore the default result display.
2.3.1.1.9. Summary: In this box, the number of identified CSMs will be displayed as well as the calculated FDR.

2.3.1.2. Residue Pair level: On this tab, the residue pairs are displayed according to the specified filter parameters (Figure 11).

Figure 11: Residue Pairs tab

2.3.1.2.1 Score: Specify the score cutoff. All Residue Pairs with a score greater than ‘Score’ will be displayed.
2.3.1.2.2 Search: Type the α and/or β peptide (separated by '-'), protein 1 and/or protein 2 (separated by '-'), or gene 1 and/or gene 2 (separated by '-') to be displayed. PS: Type at least four characters.
2.3.1.2.3 Show inter-protein links only: Check this option to display only the Residue Pairs that belong to inter-protein interactions.
2.3.1.2.4 Show decoys: Check this option to display decoy identifications.
2.3.1.2.5 Click on ‘Filter’ button or press Enter to perform the filter.
2.3.1.2.6 Click on ‘Reset’ button to restore the results.
2.3.1.2.7. Summary: In this box, the number of identified Residue Pairs will be displayed as well as the calculated FDR.

2.3.1.3. PPI level: On this tab, the PPIs are displayed according to the specified filter parameters (Figure 12).

Figure 12: PPIs tab

2.3.1.3.1 Score: Specify the score cutoff. All PPIs with a score greater than ‘Score’ will be displayed.
2.3.1.3.2 Search: Type protein 1 and/or protein 2 (separated by '-'), or gene 1 and/or gene 2 (separated by '-'), to be displayed. PS: Type at least four characters.
2.3.1.3.3 Show inter-protein links only: Check this option to display only the identifications that belong to inter-protein interactions.
2.3.1.3.4 Show decoys: Check this option to display decoy identifications.
2.3.1.3.5 Group PPIs by gene: Check this option to group all protein-protein interactions by gene name.
2.3.1.3.6 Click on ‘Filter’ button or press Enter to filter the results.
2.3.1.3.7 Click on ‘Reset’ button to restore the results.
2.3.1.3.8. Summary: in this box, the number of identified PPIs will be displayed as well as the calculated FDR.

2.3.2 Parameters: Both the search and post processing parameters used in the search can be viewed on this tab (Figures 13a and 13b).

Figure 13: Search and post processing parameters can be viewed on this tab (Figure 13a and 13b, respectively).

2.3.2.1 Post processing parameters: The parameters used to perform FDR control at the CSM, Residue Pair and PPI levels can be modified to improve the results. To do so, click the 'Edit' button and change the parameters (Figure 13b; similar to 2.2.5). Afterwards, the results are filtered again.

2.3.3 Open Results: Scout result files (*.scout) can be opened. To do so, go to File → Open Results (or press CTRL + O), as can be seen in Figure 14a. PS: Multiple files can be opened together if all of them were searched with the same parameters.
⇒ Results can also be opened from the Scout starting page by clicking File → Open Results (or pressing CTRL + O).

2.3.4 Save Results: the current results can be saved to preserve them. To do so, go to File → Save → Results (or press CTRL + S), as can be seen in Figure 14a.

Figure 14a: Open and Save results as well as the parameters used in the search.

2.3.4.1 Save Results as mzIdentML file: The current results can also be saved in mzIdentML 1.2 or mzIdentML 1.3 format. To do so, go to ‘Save Results’; in the 'Save as' window that opens, change ‘Save as type’ to mzIdentML 1.2 (or 1.3) Result File (.mzid), as can be seen in Figure 14b. Type a file name and click 'Save' (or press Enter).
⇒ PS: Besides the mzIdentML file, a *-specID.ms2 file is also saved, which holds all the identified MS/MS spectra. Both files are required to proceed with the 'Complete Submission' in the PRIDE[1] system.

Figure 14b: Save the results in mzIdentML 1.2 or mzIdentML 1.3 format.

2.3.5 Save Parameters: the search and post processing parameters used in the search can be exported. To do so, go to File → Save → Parameters (or press ALT + W), as can be seen in Figure 15.

2.3.6 Report: Scout can export the displayed reports, such as CSMs (filtered results), Residue Pairs, PPIs and unfiltered CSMs, as well as the input file used by XlinkCyNET to visualize the protein-protein interaction network (Figure 15).

Figure 15: Export reports as well as the input file used on XlinkCyNET.

2.3.7 Reprocess FDR: The results can be filtered again using the current post-processing parameters (which can be modified, see item 2.3.2). To do so, go to Tools → Reprocess FDR (or press ALT + F) (Figure 16).

Figure 16: Reprocess FDR, Import spectra and Statistical analysis features accessed by Tools menu.

2.3.8 Import Spectra: If the option ‘Spectra saved in the results’ is unchecked (see item 2.2.4.22.1), the identified spectra will not be displayed unless the RAW file is present in the same directory as the results. To import the identified spectra, go to Tools → Import Spectra (or press CTRL + I) and specify where the RAW files are (Figure 16).

2.3.9 Statistics: Some statistics can be obtained from the results, such as the precursor charge distribution (Figure 17a) and the reaction site distribution (Figure 17b) of the identified cross-links. To do so, go to Tools → Statistical analysis (or press CTRL + Y) (Figure 16).

Figure 17a: Precursor charge distribution of the identified cross-links.

Figure 17b: Reaction sites distribution taking into account all identified cross-links.

2.4. Filter from the Scout starting page

The results can be filtered again with an FDR different from the one used in the first round by I) switching to the ‘Filter’ tab; II) selecting the folder that contains the identification files (*.buf); III) specifying the FASTA file; IV) modifying the post-processing parameters (see item 2.2.5); and V) clicking the ‘Filter’ button (Figure 18). When the filter is finished, a results window opens (see item 2.3).

⇒ To stop the filter, click the 'Cancel' button and confirm.

Figure 18: Filter tab window

2.5. Check for updates

Scout checks for updates at startup. Additionally, under Help → Check for updates, the user can view all releases (and their notes) and see whether Scout is up to date. If the current Scout version is out of date, users can update it from within this window (Figure 19).

Figure 19: Check for updates window.

Closing remarks

In conclusion, Scout is a powerful tool for identifying protein-protein interactions using cleavable cross-linkers in proteomic datasets. Its user-friendly interface, customizable search and post-processing parameters, and multiple filtering options make it a versatile tool for protein interaction analysis. Scout can be particularly useful for studying complex biological systems when identifying protein-protein interactions is crucial for understanding their function. Overall, Scout provides a valuable resource for researchers interested in studying protein-protein interactions at a large scale.

References

[1] J. A. Vizcaíno et al., “The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013,” Nucleic Acids Res., vol. 41, no. Database issue, pp. D1063-1069, Jan. 2013, doi: 10.1093/nar/gks1262.

[2] Y. Perez-Riverol et al., “PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets,” Mol. Cell Proteomics, vol. 15, no. 1, pp. 305–317, Jan. 2016, doi: 10.1074/mcp.O115.050229.

[3] D. B. Lima, Y. Zhu, and F. Liu, “XlinkCyNET: A Cytoscape Application for Visualization of Protein Interaction Networks Based on Cross-Linking Mass Spectrometry Identifications,” J. Proteome Res., vol. 20, no. 4, pp. 1943–1950, Apr. 2021, doi: 10.1021/acs.jproteome.0c00957.

[4] P. Shannon et al., “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks,” Genome Res., vol. 13, no. 11, pp. 2498–2504, Nov. 2003, doi: 10.1101/gr.1239303.

[5] M. A. Clasen et al., “Increasing confidence in proteomic spectral deconvolution through mass defect,” Bioinformatics, vol. 38, no. 22, pp. 5119–5120, Nov. 2022, doi: 10.1093/bioinformatics/btac638.
