brvpuyve opened this issue 3 years ago
Hi Bart, did you get any help with this?
I suspect you could use the field characteristics[pooled sample] and list in it all the samples that are pooled (SN=sample 1,sample 2, … sample 9, where "sample n" is the value of the corresponding sample in source name).
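A minimal Python sketch of the SN= listing suggested above, assuming the pooled source names are joined with commas exactly as in that example. The helpers `format_pooled_sample` and `parse_pooled_sample` are hypothetical, not part of any SDRF tool. They also reject source names that contain a comma, because those could not be split back unambiguously.

```python
def format_pooled_sample(source_names):
    """Join source names into an 'SN=name 1,name 2,...' value (syntax assumed from this thread)."""
    for name in source_names:
        if "," in name:
            raise ValueError(f"source name {name!r} contains a comma and would be split on parsing")
    return "SN=" + ",".join(source_names)


def parse_pooled_sample(value):
    """Split an 'SN=...' value back into the list of pooled source names."""
    key, sep, rest = value.partition("=")
    if key.strip() != "SN" or not sep:
        raise ValueError(f"expected a value starting with 'SN=', got {value!r}")
    # Tolerate spaces after commas and ignore empty entries.
    return [name.strip() for name in rest.split(",") if name.strip()]


pooled = format_pooled_sample(["sample 1", "sample 2", "sample 3"])
print(pooled)                       # SN=sample 1,sample 2,sample 3
print(parse_pooled_sample(pooled))  # ['sample 1', 'sample 2', 'sample 3']
```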
For the relative quantities I am not sure; others may have better ideas. Maybe you could use the key QY= to indicate relative quantity (as in characteristics[spiked compound]), but I am not sure how to make the sample names correspond to the quantities.
Also, I don't know what to do if one of the pooled samples is not analysed alone (i.e. there is no .raw file associated with one of the sample names).
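The case of a pooled sample that was never run on its own can at least be detected mechanically. This hypothetical helper assumes each SDRF row is a dict keyed by column header and that non-pooled rows carry a value such as `not pooled` instead of an `SN=` list. It reports the pooled names that never appear as the source name of a row of their own.

```python
def pooled_names_without_own_run(rows):
    """Return pooled source names that never appear as a 'source name' in their own row.

    rows: list of dicts keyed by SDRF column header (an assumption for this sketch).
    """
    own_names = {row["source name"] for row in rows}
    missing = []
    for row in rows:
        value = row.get("characteristics[pooled sample]", "")
        if not value.startswith("SN="):
            continue  # not a pooled row (e.g. 'not pooled' or empty)
        for name in value[len("SN="):].split(","):
            name = name.strip()
            if name and name not in own_names and name not in missing:
                missing.append(name)
    return missing


rows = [
    {"source name": "sample 1", "characteristics[pooled sample]": "not pooled", "comment[data file]": "s1.raw"},
    {"source name": "sample 2", "characteristics[pooled sample]": "not pooled", "comment[data file]": "s2.raw"},
    {"source name": "pool", "characteristics[pooled sample]": "SN=sample 1,sample 2,sample 3",
     "comment[data file]": "pool.raw"},
]
print(pooled_names_without_own_run(rows))  # ['sample 3']
```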
Hi @mlocardpaulet @brvpuyve :
First, my apologies for the late reply; I was off for a couple of weeks. A few weeks ago I was discussing with @anjaf how to represent multiplexed samples in an experiment.
We have two options here:
1- Represent each sample as an independent sample, adding a characteristic called characteristics[concentration of] to the sample, and link each sample to the same data file. The characteristics[organism] will be different for each sample. This is actually a clean representation because each sample has its own row and can be described with more characteristics. It differs from the pooled approach mentioned by @mlocardpaulet, in which samples are used multiple times: once in their own channel and again in the pool.
It will be something like:
source name | characteristics[organism] | characteristics[organism part] | characteristics[biological replicate] | characteristics[concentration of] | assay name | comment[technical replicate] | comment[fraction identifier] | comment[label] | comment[data file] | characteristics[concentration of] |
---|---|---|---|---|---|---|---|---|---|---|
Sample-1 | homo sapiens | heart | 1 | 70% | ms_run 1 | 1 | 1 | label free sample | file1.raw | 70% |
Sample-2 | e coli | liver | 1 | 60% | ms_run 1 | 1 | 1 | label free sample | file1.raw | 60% |
As you can see, the assay name is the same for both rows, meaning that the file and the label conditions are the same.
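A sketch of how option 1 could be generated with Python's `csv` module: one row per sample, with all rows sharing the same assay name and data file. The column list mirrors the table above minus its trailing repeated column, and the fixed replicate, fraction and label values are copied from that example. `mixture_rows` and `to_tsv` are hypothetical helpers.

```python
import csv
import io

# Columns as in the option-1 table (the trailing repeated column is left out here).
COLUMNS = [
    "source name", "characteristics[organism]", "characteristics[organism part]",
    "characteristics[biological replicate]", "characteristics[concentration of]",
    "assay name", "comment[technical replicate]", "comment[fraction identifier]",
    "comment[label]", "comment[data file]",
]


def mixture_rows(samples, assay_name, data_file):
    """One row per (source name, organism, organism part, concentration) tuple.

    Every row gets the same assay name and data file, which is what marks them as one mixture.
    Replicate, fraction and label values are fixed, as in the example table.
    """
    return [
        [source_name, organism, part, "1", concentration,
         assay_name, "1", "1", "label free sample", data_file]
        for source_name, organism, part, concentration in samples
    ]


def to_tsv(rows):
    """Serialise the header plus rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = mixture_rows(
    [("Sample-1", "homo sapiens", "heart", "70%"), ("Sample-2", "e coli", "liver", "60%")],
    assay_name="ms_run 1",
    data_file="file1.raw",
)
print(to_tsv(rows))
```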
2- @anjaf previously suggested setting characteristics[organism] to "mixed"; we could then list all the species in the sample, with their concentrations, as key-value pairs in characteristics[pooled sample].
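For comparison, option 2 packs the whole mixture into a single cell. The key-value syntax was not settled in this thread, so the `species=percent%` pairs separated by `;` below are a hypothetical encoding, and the species and percentages are illustrative only.

```python
def encode_mixed(species_to_percent):
    """Encode species and concentrations as 'species=N%' pairs joined by ';' (hypothetical syntax)."""
    return ";".join(f"{species}={percent}%" for species, percent in species_to_percent.items())


def decode_mixed(value):
    """Parse the hypothetical 'species=N%;...' encoding back into {species: percent}."""
    result = {}
    for pair in value.split(";"):
        species, sep, percent = pair.partition("=")
        percent = percent.strip()
        if not sep or not percent.endswith("%"):
            raise ValueError(f"malformed pair {pair!r}")
        result[species.strip()] = float(percent[:-1])
    return result


row = {
    "characteristics[organism]": "mixed",
    "characteristics[pooled sample]": encode_mixed({"homo sapiens": 70, "escherichia coli": 30}),
}
print(row["characteristics[pooled sample]"])                # homo sapiens=70%;escherichia coli=30%
print(decode_mixed(row["characteristics[pooled sample]"]))  # {'homo sapiens': 70.0, 'escherichia coli': 30.0}
```

The trade-off against option 1 shows up here: any per-species information beyond the concentration (organism part, replicate, and so on) has nowhere to go in a single cell.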
It would be great to have your opinion @anjaf @jgriss @mvaudel @mlocardpaulet @all @bigbio/collaborators
Hi @ypriverol, thanks a lot. I like option 1 very much. So to be clear: there will be duplicated file names?
Option 1 is maybe the best approach, although it will be some work for me to add the extra lines :-) Let me know what is decided and I will create the SDRFs.
Thanks for the comments!
> Hi @ypriverol thanks a lot. I like option 1- very much. So to be clear: there will be duplicated file names?
Yes. We have the same case when multiple samples are multiplexed in the same RAW file.
I guess option 1 is fine if the Python client can identify such a case? I.e., a raw file that is not unique is a mixture?

Hi all,
We already have this case covered in some form for isobarically labelled experiments (see PXD017799 as an example). There, too, we have mixtures of multiple independent samples in one raw file.
I therefore strongly suggest staying consistent with the design approach chosen there, which is essentially what @ypriverol described as option 1.
In the case of isobarically labelled experiments, this could even be extended to having multiple rows reference the same channel in the raw file.
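On whether a client can recognise such mixtures: under option 1 a sketch only needs to group rows by data file. Grouping by (data file, label) rather than by data file alone is an assumption made here, so that ordinary isobaric channels sharing one raw file are not flagged while several rows on the same channel are. The optional concentration check follows the rule proposed in the thread (optional, but adding up to 100% when given); treating a partially annotated mixture as invalid is this sketch's own choice. Both function names are hypothetical.

```python
from collections import defaultdict


def find_mixtures(rows):
    """Return {(data file, label): [source names]} for channels shared by more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["comment[data file]"], row["comment[label]"])
        groups[key].append(row["source name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


def concentrations_add_up(rows, tolerance=0.01):
    """Optional check on one mixture's rows: if all give a percentage, they must sum to 100%."""
    values = [row.get("characteristics[concentration of]", "").strip() for row in rows]
    provided = [v for v in values if v]
    if not provided:
        return True  # concentrations are optional
    if len(provided) != len(values) or not all(v.endswith("%") for v in provided):
        return False  # partially annotated, or not expressed as a percentage
    total = sum(float(v[:-1]) for v in provided)
    return abs(total - 100.0) <= tolerance


rows = [
    {"source name": "Sample-1", "comment[data file]": "file1.raw", "comment[label]": "label free sample",
     "characteristics[concentration of]": "70%"},
    {"source name": "Sample-2", "comment[data file]": "file1.raw", "comment[label]": "label free sample",
     "characteristics[concentration of]": "30%"},
    {"source name": "Sample-3", "comment[data file]": "file2.raw", "comment[label]": "label free sample",
     "characteristics[concentration of]": ""},
]
print(find_mixtures(rows))              # {('file1.raw', 'label free sample'): ['Sample-1', 'Sample-2']}
print(concentrations_add_up(rows[:2]))  # True
```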
@enryH: characteristics[concentration of] should be optional, but if provided it must add up to 100% to be valid.

… raw file multiple times, indicating that it's a mixture. But we might not always have (or need) the individual sample concentrations, for example; just something to keep in mind as well.

Hello again, sorry it took me so long to come back to this.
I am looking at the headers that have been used in the SDRFs generated to date, and I see that characteristics[concentration of] is used to define the concentration of the compounds given in characteristics[compound]. So if we go with option 1 (if I understood correctly: one row per sample in the pool, with the respective quantities annotated in characteristics[concentration of]), can you distinguish the two usages of characteristics[concentration of]?
Could this be an issue?
Hmm. If there are both characteristics[organism] and characteristics[compound] columns, then I guess the columns have to be ordered, but I am not 100% sure about this:

characteristics[organism] | characteristics[concentration of] | characteristics[compound] | characteristics[concentration of] |
---|---|---|---|
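If the ordering convention above were adopted, a reader would have to work positionally: Python's `csv.DictReader` maps headers to values through a dict, so only the last of two identically named columns survives. This sketch reads the header with `csv.reader` instead and pairs each `characteristics[concentration of]` cell with the column immediately to its left. The helper name and the example compound `drug A` are hypothetical.

```python
import csv
import io


def paired_concentrations(tsv_text):
    """For each data row, map the column left of each 'concentration of' column to (value, concentration)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    results = []
    for cells in reader:
        pairs = {}
        for i, column in enumerate(header):
            if column == "characteristics[concentration of]" and i > 0:
                # Ordering convention: the concentration refers to the preceding column.
                pairs[header[i - 1]] = (cells[i - 1], cells[i])
        results.append(pairs)
    return results


header = [
    "characteristics[organism]", "characteristics[concentration of]",
    "characteristics[compound]", "characteristics[concentration of]",
]
tsv = "\t".join(header) + "\n" + "\t".join(["homo sapiens", "70%", "drug A", "10 mM"]) + "\n"
print(paired_concentrations(tsv))
```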
Could you explain the type of experiment where this would be an issue?
But I agree that it could be a problem if it leads to ambiguous interpretations.
Hello,
I guess you are right; I cannot think of an example where it would be used.
Hi everyone,
I generated an updated LFQbenchmark dataset, similar to the one from Navarro et al. (https://pubmed.ncbi.nlm.nih.gov/27701404/). I was wondering how best to annotate the mixtures (as pooled samples). Can I list more than one organism in the characteristics[organism] column? Additionally, would it be useful to add an extra comment column to define the ratios of the three proteomes?
Looking forward to your suggestions!
Best,
Bart Van Puyvelde