Open m-pauper opened 4 months ago
I have been looking at this and have a few more questions:
~2. While reading the code, I got confused in the read_peptide_tsv_DIANN_comb function. It seems that at the beginning you expect the general DIA-NN report.tsv, which contains the "File.Name" column. However, further down, after only the numeric columns are kept, rowSums is used, which would only make sense if the input were the report.pr_matrix.tsv from DIA-NN, not the general report.tsv. Could you please confirm that is the case, or am I missing something?~ Never mind, I guess it is written so that either type of file can be used.
When comb_method is "Average", the intensities are divided by the sample count:
sample_count <- length(unique(peptide_import$File.Name))
if (comb_method == "Average") {
  peptides$Intensity <- peptides$Intensity / sample_count
}
a) sample_count does not take into account that sample_pattern might have excluded some samples.
b) This simply divides each peptide's intensity rather than their sum, so it does not seem to give the true average for each peptide across samples.
Sorry for flooding you with questions, the package's functionality is awesome, I really want to explore my data with it.
Kind regards!
Thanks for asking about these! I will take a look at this in more detail tomorrow. Can you attach your DIANN output file that you want to use (either the whole thing or a truncated version) so I can make sure I'm looking at the same thing as you?
You are correct, we added DIANN format later and we didn't do as much testing on that format. I will update the Zenodo once I get a chance to look at this in detail this week.
I think you're right about the bug with sample_count. It looks like it is being counted from the imported file rather than the filtered file for the DIANN import, and I will fix that. The fix should just mean counting the unique File.Name values in the filtered peptide dataframe, rather than in the peptide_import dataframe.
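A minimal sketch of that fix, not the package's actual code. It assumes the filtered data frame is the one called peptides in the quoted snippet, that sample_pattern is a regular expression matched against File.Name, and that the toy columns Sequence and Intensity stand in for the real ones:

```r
# Hypothetical stand-in for an imported report.tsv in long format.
peptide_import <- data.frame(
  File.Name = c("ctrl_1.raw", "ctrl_2.raw", "treat_1.raw", "treat_1.raw"),
  Sequence  = c("PEPTIDEA", "PEPTIDEA", "PEPTIDEA", "PEPTIDEB"),
  Intensity = c(100, 300, 50, 70),
  stringsAsFactors = FALSE
)

# Assumption: sample_pattern is a regex applied to File.Name.
sample_pattern <- "ctrl"

# Filter first, then count the samples that survived the filter.
peptides <- peptide_import[grepl(sample_pattern, peptide_import$File.Name), ]

sample_count_import <- length(unique(peptide_import$File.Name))  # counts excluded samples too
sample_count        <- length(unique(peptides$File.Name))        # counts only retained samples
```

With this toy input the count from the imported frame is 3 while the count from the filtered frame is 2, so dividing by the former would underestimate the average.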
The averaging is not very obvious, but I do believe that it is being done correctly (once sample count is fixed). In the report.tsv format, the rows with the same peptide are summed within the plotting machinery before they are shown and the final CSVs are exported. In the pr_matrix format, where the individual injections are in the other dimension (columns), the intensities are summed into one vector before they are divided by the sample count.
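To illustrate why the order of operations should not matter, here is a small sketch with made-up data. The column names Sequence, s1 and s2 are hypothetical, and treating a peptide missing from a sample as contributing zero (na.rm = TRUE) is an assumption about the intended behaviour, not something taken from the package:

```r
# Long format (report.tsv-like): one row per peptide per sample.
peptides <- data.frame(
  File.Name = c("s1", "s2", "s1"),
  Sequence  = c("PEPTIDEA", "PEPTIDEA", "PEPTIDEB"),
  Intensity = c(100, 300, 50),
  stringsAsFactors = FALSE
)
sample_count <- length(unique(peptides$File.Name))

# Order A: divide every row by the sample count, then sum rows per peptide.
divided <- peptides
divided$Intensity <- divided$Intensity / sample_count
divide_then_sum <- tapply(divided$Intensity, divided$Sequence, sum)

# Order B: sum rows per peptide, then divide by the sample count.
sum_then_divide <- tapply(peptides$Intensity, peptides$Sequence, sum) / sample_count

# Wide format (pr_matrix-like): one column per injection.
wide <- data.frame(
  Sequence = c("PEPTIDEA", "PEPTIDEB"),
  s1 = c(100, 50),
  s2 = c(300, NA)
)
wide_avg <- rowSums(wide[, c("s1", "s2")], na.rm = TRUE) / sample_count

same_result <- isTRUE(all.equal(divide_then_sum, sum_then_divide))
```

Both orders give the same per-peptide value because division distributes over the sum, and the wide-format calculation matches them under the same zero-for-missing assumption.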
But again, I will take a closer look into the details of both sample count and the averaging of those calculations tomorrow and make sure. If I can make it more obvious I will, and I will add some comments to clarify.
Updated Zenodo
Hello, nice tool, thank you for publishing it.
It would be nice to be able to input DIA-NN's report.tsv, as I am not 100% sure on how to convert that to the generic csv.