Closed peng-gang closed 2 years ago
Hello! I'm from SomaLogic's Global Scientific Engagement team. I've discussed your questions with some members of our Bioinformatics team, and here is our response:
Some SOMAmer Reagents are our controls, and some target non-human proteins. To analyze only the 7288 analytes that target human proteins, select `Type = "Protein"` and `Organism = "Human"`.
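As a language-agnostic illustration of that filter, here is a minimal Python sketch. It assumes the annotation table has already been exported as a list of records; the field names `SeqId`, `Type` and `Organism` come from this thread, while the rows, the placeholder `0000-0x` identifiers and the helper `human_protein_seqids` are made up for the example and are not part of SomaDataIO.

```python
# Hypothetical annotation rows. Only the field names (SeqId, Type, Organism)
# come from the thread; the values below are invented for illustration.
annotations = [
    {"SeqId": "2182-54", "Type": "Protein", "Organism": "Human"},
    {"SeqId": "0000-01", "Type": "Protein", "Organism": "Mouse"},
    {"SeqId": "0000-02", "Type": "Control", "Organism": "Human"},
    {"SeqId": "4918-21", "Type": "Protein", "Organism": "Human"},
]


def human_protein_seqids(rows):
    """Return the SeqIds whose Type is "Protein" and Organism is "Human"."""
    return [
        row["SeqId"]
        for row in rows
        if row["Type"] == "Protein" and row["Organism"] == "Human"
    ]


keep = human_protein_seqids(annotations)
print(keep)
```

Both conditions have to hold at once: a control can carry `Organism = "Human"`, and a non-human analyte can carry `Type = "Protein"`.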
Some SOMAmer reagents bind to a different epitope on the same target protein than another reagent does. SOMAmer reagents with the same target may therefore show different patterns if one of those epitopes is less available.
ColCheck indicates whether the SOMAmer reagent measurements in our QC replicates pass our metrics. In your SomaScan Quality Statement file, you can find a breakdown of these under the SOMAmers in Tails section. SOMAmers in Tails is the cumulative number of SOMAmer reagents in the QC control whose ratio to the reference, on any plate, falls outside the accepted accuracy range of 0.8 - 1.2. Flagged SOMAmer reagents are typically retained for analyses: accuracy across all assay runs is a robust quality metric, but it is not a requirement for identifying meaningful biological signal.
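The check described above can be pictured roughly as follows. This is a simplified sketch and not SomaLogic's actual QC procedure: the 0.8 - 1.2 range comes from the reply, while the inclusive bounds, the `"FLAG"` label, the helper `col_check` and the ratio values are assumptions made for illustration.

```python
LOW, HIGH = 0.8, 1.2  # accepted accuracy range quoted in the reply


def col_check(ratios_by_plate, low=LOW, high=HIGH):
    """Return "PASS" if every plate's QC ratio lies in [low, high], else "FLAG".

    Treating the bounds as inclusive is an assumption of this sketch.
    """
    in_range = all(low <= ratio <= high for ratio in ratios_by_plate)
    return "PASS" if in_range else "FLAG"


# Hypothetical QC-to-reference ratios, one value per plate.
qc_ratios = {
    "reagent_A": [1.04, 0.97, 1.10],
    "reagent_B": [1.04, 1.31, 0.95],  # one plate outside the range
}

status = {name: col_check(ratios) for name, ratios in qc_ratios.items()}
print(status)
```

A single out-of-range plate is enough to flag a reagent here, which is one way to read "on any plate" in the reply.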
If you have any further questions or need more explanation, feel free to reach out to your sales rep and we can set up a tech support call with you. We can also provide you with a file that has a breakdown of the adat format and a description of each column. We appreciate your feedback, and are working on adding column descriptions in the documentation here.
Thank you so much for your reply. I just got data from my collaborator. I will ask him for the sales rep's information later.
One more question: if two SOMAmer reagents have the same target, why is the correlation coefficient relatively low? Both "4918-21" and "7784-1" target "Kininogen, HMW". However, the correlation coefficient is only 0.75, as shown below. If we remove the outliers around 0, the correlation coefficient is only 0.2.
It's possible that these particular SOMAmer Reagents are targeting different epitopes with different availability, or different constructs of the protein (fragments, isoforms, heterodimers, etc.). Availability of an epitope can be affected by several things: modifications to a protein, SNPs, or interference from a competitor. We recommend running your analysis and seeing which SOMAmer Reagent measurement is significant downstream. We can explore your data and examine this specific instance if we can create a case through your collaborator's sales rep.
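To see how a handful of near-zero points can drive a correlation coefficient, as in the 0.75 versus 0.2 comparison earlier in this thread, here is a small Python sketch. The numbers are synthetic and are not measurements for 4918-21 or 7784-1; the cutoff of 1.0 used to drop the low points is an arbitrary choice for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic values: a weakly related main cluster plus two points near zero.
x = [10, 11, 12, 13, 14, 15, 0.5, 0.6]
y = [12, 10, 13, 11, 14, 12, 0.4, 0.5]

r_all = pearson(x, y)

# Drop pairs where either value is below the (arbitrary) cutoff.
CUTOFF = 1.0
core = [(a, b) for a, b in zip(x, y) if a > CUTOFF and b > CUTOFF]
r_core = pearson([a for a, _ in core], [b for _, b in core])

print(f"all points: {r_all:.2f}, without low points: {r_core:.2f}")
```

In this toy data, the two low points stretch the range of both variables and dominate the coefficient, while the main cluster on its own is only weakly correlated.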
I see. Thanks.
In addition to the answers above, in the next release of SomaDataIO the following table will be available in the documentation under the Col.Meta/Annotations help:
?colmeta
?annotations
Field | Description | Example |
---|---|---|
SeqId | SomaLogic sequence identifier | 2182-54_1 |
SeqidVersion | Version of SOMAmer sequence | 2 |
SomaId | Target identifier, of the form SLnnnnnn (8 characters in length) | SL000318 |
TargetFullName | Target name curated for consistency with UniProt name | Complement C4b |
Target | SomaLogic Target Name | C4b |
UniProt | UniProt identifier(s) | P0C0L4 P0C0L5 |
EntrezGeneID | Entrez Gene Identifier(s) | 720 721 |
EntrezGeneSymbol | Entrez Gene Symbol names | C4A C4B |
Organism | Protein Source Organism | Human |
Units | Relative Fluorescence Units | RFU |
Type | SOMAmer target type | Protein |
Dilution | Dilution mix assignment | 0.01% |
PlateScale_Reference | PlateScale reference value | 1378.85 |
CalReference | Calibration sample reference value | 1378.85 |
medNormRef_ReferenceRFU | Median normalization reference value | 490.342 |
CalV4\<YY>\<SSS>\<PPP> | Calibration scale factor (for given year-study-plate) | 0.64 |
ColCheck | QC acceptance criteria across all plates/sets | PASS |
QcReference_\<LLLLL> | QC sample reference value (for given QC lot) | PASS |
CalQcRatioV4\ | Post calibration median QC ratio to reference (for given year-study-plate) | 1.04 |
I am working on a human SomaLogic proteomics dataset. I find that many of the proteins are not human: there are proteins from mouse and jellyfish. Are these proteins used for quality control?
Another question: there are multiple entries for a single protein. For example, my data has 4 entries (SeqId: 15343-337, 19631-13, 4918-21, and 7784-1) for protein "P01042". Two of them (4918-21 and 7784-1) have the same target, "Kininogen, HMW". However, the correlation between the two is not very high (correlation coefficient = 0.75). Should I use the mean of the entries with the same target in the downstream analysis?
In the package's example code, all of these proteins are treated the same way.
The function "getAnalyteInfo" returns many columns of annotation data. Some columns are straightforward, but some are not. For example, for "ColCheck", should I use only the proteins with "PASS" in this column?
There should be a more detailed document about the data.