HUPO-PSI / mzSpecLib

mzSpecLib: A standard format to exchange/distribute spectral libraries
https://hupo-psi.github.io/mzSpecLib/
Apache License 2.0

Naming replicate spectra in consensus libraries #51

Closed mobiusklein closed 1 year ago

mobiusklein commented 1 year ago

This question is about how to mark up a consensus spectrum so that it links back to its replicate spectra in their raw files when those raw files aren't on ProteomeXchange.

Files

Spectrum Library: https://chemdata.nist.gov/download/peptide_library/libraries/skin_hair/IARPA3_best_tissue_add_info.msp.zip

Metadata File: https://chemdata.nist.gov/download/peptide_library/libraries/skin_hair/IARPA3_all.out.zip

Metadata is a sparse table mapping consensus spectra to their contributing replicates:

Peptide Charge  Modification    Scans   Raw file    Folder  Tissue
AAAIAYGLDK  2   0   73415   "am_03_rg_t100_nlumos_2021-02-19_350-1600_100_nm_hcd30_360min_tryp_pos.raw" "hair_rg_guan"  Hair
AAAPGPCPPPPPPP  2   1(6,C,CAM)  33291;33463 "hf1_18_rg_l1_2021-08-06_380-2000_120_hcd30_255min_sp3_lysctryp_i_pos.raw"  "6donors_sp3"   Hair
            30890;32043 "hf3_17_rg_l1_2021-08-06_380-2000_120_hcd30_255min_sp3_lysctryp_i_pos.raw"  "6donors_sp3"   Hair
AAAQWVR 2   0   41910;42017 "xxx_2019_0215_rj_74_strapskin.raw" "method_development"    Skin
            7991    "20190429_009_llnl_pr_014_dda_2ug.raw"  "osu_dda"   Skin
            9486    "20190503_037_llnl_pr_29_dda_2ug.raw"   "osu_dda"   Skin
            10724   "20190508_002_llnl_pr_08_dda_2ug.raw"   "osu_dda"   Skin
            9504    "20190508_008_llnl_pr_24_dda_2ug.raw"   "osu_dda"   Skin
            10287   "20190508_044_llnl_pr_06_dda_2ug.raw"   "osu_dda"   Skin
            9556    "20190508_047_llnl_pr_13_dda_2ug.raw"   "osu_dda"   Skin
            10231   "20190508_050_llnl_pr_16_dda_2ug.raw"   "osu_dda"   Skin
            10186   "20190508_058_llnl_pr_21_dda_2ug.raw"   "osu_dda"   Skin
            9342    "20190508_061_llnl_pr_22_dda_2ug.raw"   "osu_dda"   Skin
            9280    "20190508_064_llnl_pr_23_dda_2ug.raw"   "osu_dda"   Skin
            10586   "20190508_067_llnl_pr_015_dda_2ug.raw"  "osu_dda"   Skin
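A minimal sketch of how this sparse layout could be read, carrying the peptide, charge, and modification forward onto the continuation rows. It assumes the file is tab-delimited with a header row, seven columns, and double-quoted file and folder names; `read_replicates` and the inline `SAMPLE` text are illustrative only and not part of any NIST or mzSpecLib tooling.

```python
import csv
import io


def read_replicates(handle):
    """Yield one record per table row, forward-filling the blank key columns."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    key = None
    for row in reader:
        if len(row) < 7:
            continue  # blank or malformed line
        peptide, charge, mods, scans, raw_file, folder, tissue = (
            cell.strip() for cell in row[:7]
        )
        if peptide:
            # A non-empty peptide column starts a new consensus spectrum.
            key = (peptide, int(charge), mods)
        if key is None:
            continue  # continuation row with nothing to attach to
        yield {
            "peptide": key[0],
            "charge": key[1],
            "modification": key[2],
            "scans": [int(s) for s in scans.split(";") if s],
            "raw_file": raw_file,
            "folder": folder,
            "tissue": tissue,
        }


# Two rows copied from the table above, re-typed with explicit tabs (assumed layout).
SAMPLE = (
    "Peptide\tCharge\tModification\tScans\tRaw file\tFolder\tTissue\n"
    'AAAQWVR\t2\t0\t41910;42017\t"xxx_2019_0215_rj_74_strapskin.raw"\t"method_development"\tSkin\n'
    '\t\t\t7991\t"20190429_009_llnl_pr_014_dda_2ug.raw"\t"osu_dda"\tSkin\n'
)

records = list(read_replicates(io.StringIO(SAMPLE)))
for record in records:
    print(record["peptide"], record["charge"], record["raw_file"], record["scans"])
```

Each record then holds everything needed to name one raw file's worth of replicate scans for a given consensus spectrum.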

The table maps AAAQWVR/2 to many spectra across multiple raw files. The appropriate way to express this (so far as I can tell) is to use either contributing replicate spectrum keys or contributing replicate spectrum USI. The second option makes sense since those contributing spectra aren't in the library itself. However, this project didn't publish its data on ProteomeXchange, so I cannot construct a "real" USI for it.

Options

Fake USI

This looks "okay" and preserves the available information, but feels wrong because it sets up an expectation that the USI resolves to something. If there were a way to canonically express that this is a "local" or "private" dataset in the accession field, that would make this less misleading.

<Spectrum=...>
...
MS:1003065|spectrum aggregation type=MS:1003067|consensus spectrum
MS:1003299|contributing replicate spectrum USI=mzspec:_IARPA3:20190508_067_llnl_pr_015_dda_2ug:scan:10586
MS:1003299|contributing replicate spectrum USI=mzspec:_IARPA3:20190503_037_llnl_pr_29_dda_2ug:scan:9486
...
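A rough sketch of how attribute lines like these could be generated from a raw file name and its scan numbers. The helper names are hypothetical and `_IARPA3` is only the placeholder collection used in this issue; the run name is assumed to be the raw file name without its extension.

```python
from pathlib import PurePath

USI_ATTRIBUTE = "MS:1003299|contributing replicate spectrum USI"


def make_replicate_usi(collection, raw_file, scan):
    """Build mzspec:<collection>:<run>:scan:<number> for one replicate scan."""
    run_name = PurePath(raw_file).stem  # drop the ".raw" extension
    return f"mzspec:{collection}:{run_name}:scan:{scan}"


def replicate_usi_lines(collection, raw_file, scans):
    """Return one attribute line per scan number in the given raw file."""
    return [
        f"{USI_ATTRIBUTE}={make_replicate_usi(collection, raw_file, scan)}"
        for scan in scans
    ]


lines = replicate_usi_lines(
    "_IARPA3", "xxx_2019_0215_rj_74_strapskin.raw", [41910, 42017]
)
print("\n".join(lines))
```

Keeping the collection as a parameter means the same code works unchanged if a different placeholder or a real dataset accession is chosen later.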

Attribute Groups

This reads better as written, but it isn't preferable because A) it's more verbose, B) it is not universally portable, and C) it conflicts with how scan number and constituent spectrum file are used for individual, ungrouped spectra.

<Spectrum=...>
...
MS:1003065|spectrum aggregation type=MS:1003067|consensus spectrum
[1]MS:1003203|constituent spectrum file=20190508_067_llnl_pr_015_dda_2ug.raw
[1]MS:1003057|scan number=10586
[2]MS:1003203|constituent spectrum file=20190503_037_llnl_pr_29_dda_2ug.raw
[2]MS:1003057|scan number=9486
...

edeutsch commented 1 year ago

Let's use a USI with USI000000 as the collection identifier, as suggested in the USI spec (https://github.com/HUPO-PSI/usi/blob/master/CollectionIdentifiers.md) (from call on 2023-02-24)
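
For illustration, a small sketch of rewriting an existing placeholder USI so that its collection field becomes USI000000. The function name `set_collection` is hypothetical; the only thing assumed about the USI layout is that the collection is the second colon-separated field after the `mzspec` prefix.

```python
def set_collection(usi, collection="USI000000"):
    """Return the USI with its collection field replaced."""
    # Split only on the first two colons so the rest of the USI is left untouched.
    parts = usi.split(":", 2)
    if len(parts) != 3 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    prefix, _old_collection, rest = parts
    return f"{prefix}:{collection}:{rest}"


fixed = set_collection("mzspec:_IARPA3:20190508_067_llnl_pr_015_dda_2ug:scan:10586")
print(fixed)
```

With this, the consensus entry would carry `MS:1003299|contributing replicate spectrum USI=` followed by the rewritten USI for each replicate.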