SomaLogic / SomaDataIO

The SomaDataIO package loads and exports 'SomaScan' data via the 'SomaLogic Operating Co., Inc.' proprietary data file, called an ADAT ('*.adat'). The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory.
https://somalogic.github.io/SomaDataIO/

No detailed information about SomaLogic proteomics data #12

Closed peng-gang closed 2 years ago

peng-gang commented 2 years ago

I am working on a human SomaLogic proteomics dataset, and I find that many of the proteins are not human. Some are from mouse and jellyfish. Are these proteins used for quality control?

Another question: there are multiple entries for one protein. For example, my data contain 4 entries (SeqIDs 15343-337, 19631-13, 4918-21, and 7784-1) for protein "P01042". Two of them (4918-21 and 7784-1) have the same target, "Kininogen, HMW". However, the correlation between the two is not very high (correlation coefficient = 0.75). Should I use the mean of the proteins with the same target in the downstream analysis?

In the package's example code, all of these proteins are treated the same way.

The function "getAnalyteInfo" returns many columns of annotation data. Some columns are straightforward, but some are not. For example, for "ColCheck", should I use only the proteins with "PASS" in this column?

There should be a more detailed document about the data.

wschwarzmann commented 2 years ago

Hello! I'm from SomaLogic's Global Scientific Engagement team. I've discussed your questions with some members of our Bioinformatics team, and here is our response:

If you have any further questions or need more explanation, feel free to reach out to your sales rep and we can set up a tech support call with you. We can also provide you with a file that has a breakdown of the adat format and a description of each column. We appreciate your feedback, and are working on adding column descriptions in the documentation here.

peng-gang commented 2 years ago

Thank you so much for your reply. I just got data from my collaborator. I will ask him for the sales rep's information later.

One more question: if two SOMAmer reagents have the same target, why is the correlation coefficient relatively low? Both "4918-21" and "7784-1" target "Kininogen, HMW", yet the correlation coefficient is only 0.75, as shown below. If we remove the outliers around 0, the correlation coefficient drops to only 0.2.

[Image: Kininogen]

wschwarzmann commented 2 years ago

It's possible that these particular SOMAmer Reagents are targeting different epitopes with different availability, or different constructs of the protein (fragments, isoforms, heterodimers, etc.). Availability of an epitope can be affected by several things: modifications to a protein, SNPs, or interference from a competitor. We recommend running your analysis and seeing which SOMAmer Reagent measurement is significant downstream. We can explore your data and examine this specific instance if we can create a case through your collaborator's sales rep.
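For readers who want to try the suggestion above, here is a minimal sketch in base R of comparing two reagents for the same target and testing each one separately instead of averaging them. Everything in it is an assumption for illustration: the RFU values are simulated, the column names `seq.4918.21` and `seq.7784.1` only mimic the analyte naming, and `outcome` is a hypothetical phenotype. It is not SomaLogic's recommended workflow.

```r
# Simulated RFU values only; these are NOT real SomaScan measurements.
set.seed(1)
n <- 100
shared <- rnorm(n, mean = 10, sd = 1)  # shared signal on a log2-like scale

# Two hypothetical reagents that track the same signal with independent noise.
rfu <- data.frame(
  seq.4918.21 = 2^(shared + rnorm(n, sd = 0.8)),
  seq.7784.1  = 2^(shared + rnorm(n, sd = 0.8))
)

# Compare the two reagents on the log scale (Pearson) and by rank (Spearman).
log_rfu  <- log2(rfu)
pearson  <- cor(log_rfu$seq.4918.21, log_rfu$seq.7784.1)
spearman <- cor(rfu$seq.4918.21, rfu$seq.7784.1, method = "spearman")

# Test each reagent separately against a hypothetical outcome.
outcome <- shared + rnorm(n)
p_vals  <- vapply(log_rfu, function(x) cor.test(x, outcome)$p.value, numeric(1))

print(round(c(pearson = pearson, spearman = spearman), 2))
print(p_vals)
```

Keeping the reagents as separate columns preserves the possibility that they measure different forms of the protein, which averaging would hide.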

peng-gang commented 2 years ago

I see. Thanks.

stufield commented 2 years ago

In addition to the answers above, in the next release of SomaDataIO the following table will be available in the documentation under the Col.Meta/Annotations help:

?colmeta
?annotations

Col.Meta

| Field | Description | Example |
|---|---|---|
| SeqId | SomaLogic sequence identifier | 2182-54_1 |
| SeqidVersion | Version of SOMAmer sequence | 2 |
| SomaId | Target identifier, of the form SLnnnnnn (8 characters in length) | SL000318 |
| TargetFullName | Target name curated for consistency with UniProt name | Complement C4b |
| Target | SomaLogic Target Name | C4b |
| UniProt | UniProt identifier(s) | P0C0L4 P0C0L5 |
| EntrezGeneID | Entrez Gene Identifier(s) | 720 721 |
| EntrezGeneSymbol | Entrez Gene Symbol names | C4A C4B |
| Organism | Protein Source Organism | Human |
| Units | Relative Fluorescence Units | RFU |
| Type | SOMAmer target type | Protein |
| Dilution | Dilution mix assignment | 0.01% |
| PlateScale_Reference | PlateScale reference value | 1378.85 |
| CalReference | Calibration sample reference value | 1378.85 |
| medNormRef_ReferenceRFU | Median normalization reference value | 490.342 |
| CalV4\<YY>\<SSS>\<PPP> | Calibration scale factor (for given year-study-plate) | 0.64 |
| ColCheck | QC acceptance criteria across all plates/sets | PASS |
| QcReference_\<LLLLL> | QC sample reference value (for given QC lot) | PASS |
| CalQcRatioV4\<YY>\<SSS>\<PPP> | Post calibration median QC ratio to reference (for given year-study-plate) | 1.04 |
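
As a rough illustration of how the `Organism`, `ColCheck`, and `Target` fields could be used to address the questions raised earlier in this thread, here is a small base R sketch. The annotation rows are made up: the SeqIds `0000-00` and `0000-01` are placeholders, and the `ColCheck` values, including `"FLAG"` as the non-passing value, are assumptions. In practice the table would come from `getAnalyteInfo()` on an ADAT in memory, and whether to filter on `ColCheck` at all is a judgment call this thread does not settle.

```r
# Made-up annotation rows for illustration; not taken from a real ADAT.
anno <- data.frame(
  SeqId    = c("4918-21", "7784-1", "0000-00", "0000-01"),
  Target   = c("Kininogen, HMW", "Kininogen, HMW",
               "Hypothetical control", "Hypothetical protein"),
  Organism = c("Human", "Human", "Mouse", "Human"),
  ColCheck = c("PASS", "PASS", "PASS", "FLAG"),  # illustrative values only
  stringsAsFactors = FALSE
)

# Tabulate organisms against QC flags before deciding what to keep.
print(table(anno$Organism, anno$ColCheck))

# One possible filter: human targets whose ColCheck is "PASS".
keep <- subset(anno, Organism == "Human" & ColCheck == "PASS")

# Targets measured by more than one reagent (across all rows).
dup_targets <- names(which(table(anno$Target) > 1))

print(keep$SeqId)
print(dup_targets)
```

Tabulating first makes it easy to see how many analytes a filter would drop before committing to it.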