DIA-NN input - GitHub issues

m-pauper commented 4 months ago

Hello, nice tool, thank you for publishing it.

It would be nice to be able to input DIA-NN's report.tsv directly, as I am not 100% sure how to convert it to the generic CSV format.

m-pauper commented 4 months ago

I have been looking at this and have a few more questions:

1. I noticed that DIA-NN is one of the possible file types in the Shiny app that you provide. However, if I download the code from Zenodo and run the Shiny app locally, DIA-NN is not an option. I imagine the option was added later on and the Zenodo version does not provide the latest version. Could you please confirm?

~2. While reading the code, I got confused by the read_peptide_tsv_DIANN_comb function. At the beginning it seems to expect the general DIA-NN report.tsv, which contains the "File.Name" column. However, further down, after keeping only the numeric columns, rowSums is used, which would only make sense if the input were the report.pr_matrix.tsv from DIA-NN, not the general report.tsv. Could you please confirm that is the case, or am I missing something?~ Never mind, I guess it is written so that either type of file can be used.
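Handling either file type comes down to telling the two DIA-NN layouts apart: report.tsv is long (one row per precursor per run, identified by File.Name), while report.pr_matrix.tsv is wide (one numeric column per run). A minimal sketch, assuming the presence of a File.Name column is enough to discriminate; `detect_diann_format` and the toy tables are hypothetical and not taken from the PrIntMap-R code:

```r
# Hypothetical helper (not part of PrIntMap-R): classify a DIA-NN table as the
# long report.tsv (has File.Name) or the wide report.pr_matrix.tsv (does not).
detect_diann_format <- function(df) {
  if ("File.Name" %in% colnames(df)) "report" else "pr_matrix"
}

# Toy long-format table: runs stacked as rows
long_df <- data.frame(
  File.Name = c("run1.raw", "run1.raw", "run2.raw"),
  Stripped.Sequence = c("PEPTIDEK", "SAMPLER", "PEPTIDEK"),
  Precursor.Quantity = c(100, 50, 300)
)

# Toy wide-format table: runs spread across numeric columns
wide_df <- data.frame(
  Stripped.Sequence = c("PEPTIDEK", "SAMPLER"),
  run1.raw = c(100, 50),
  run2.raw = c(300, NA)
)

detect_diann_format(long_df)  # "report"
detect_diann_format(wide_df)  # "pr_matrix"
```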

3. I am struggling to understand the logic when comb_method is "Average":
```
sample_count <- length(unique(peptide_import$File.Name))
if (comb_method=="Average"){
    peptides$Intensity <- peptides$Intensity / sample_count
  }
```
a) sample_count does not take into account that sample_pattern may have excluded some samples.

b) This simply divides each peptide's intensity rather than their sum, so it does not truly give the average for each peptide across samples.
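Both points concern the per-peptide average over the runs that survive filtering. Written out, with $R$ the set of runs kept by sample_pattern, $n = |R|$, and $I_{p,r}$ the intensity of peptide $p$ in run $r$ (notation introduced here for illustration only):

$$
\bar{I}_p = \frac{1}{n} \sum_{r \in R} I_{p,r}
$$

Point a) is about $n$, which should count only the runs in $R$; point b) is about whether the division by $n$ ends up applied to the per-peptide sum over $R$.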

Sorry for flooding you with questions; the package's functionality is awesome and I really want to explore my data with it.

Kind regards!

weaversd commented 4 months ago

Thanks for asking about these! I will take a look at this in more detail tomorrow. Can you attach your DIANN output file that you want to use (either the whole thing or a truncated version) so I can make sure I'm looking at the same thing as you?

You are correct: we added the DIANN format later and didn't test it as thoroughly as the others. I will update the Zenodo record once I get a chance to look at this in detail this week.

I think you're right about the sample_count bug: for the DIANN import it is counted from the imported file rather than the filtered one. I will fix that. The fix should simply be to count the unique File.Name values in the filtered peptides dataframe rather than in the peptide_import dataframe.
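The difference between the two counts can be seen on toy data. A minimal sketch, assuming that sample_pattern filtering works by matching File.Name with `grepl` (the real filtering step may differ) and using an `Intensity` column as a stand-in for whichever quantity column the import produces; the table and values are hypothetical:

```r
# Toy long-format DIA-NN report with two real samples and one blank run
peptide_import <- data.frame(
  File.Name = c("sampleA.raw", "sampleA.raw", "sampleB.raw", "blank.raw"),
  Stripped.Sequence = c("PEPTIDEK", "SAMPLER", "PEPTIDEK", "PEPTIDEK"),
  Intensity = c(100, 50, 300, 5)
)

# Assumed filtering step: keep runs whose File.Name matches sample_pattern
sample_pattern <- "sample"
peptides <- peptide_import[grepl(sample_pattern, peptide_import$File.Name), ]

# Current behaviour: counts every run in the imported file, blank included
sample_count_old <- length(unique(peptide_import$File.Name))
# Proposed fix: count only the runs that survived the filter
sample_count <- length(unique(peptides$File.Name))

# PEPTIDEK summed over the two kept samples (100 + 300)
pep_sum <- sum(peptides$Intensity[peptides$Stripped.Sequence == "PEPTIDEK"])
pep_sum / sample_count_old  # 133.33..., blank run wrongly in the divisor
pep_sum / sample_count      # 200, the mean over sampleA and sampleB
```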

The averaging is not very obvious, but I do believe it is being done correctly (once the sample count is fixed). In the report.tsv format, rows with the same peptide are summed within the plotting machinery before they are shown and before the final CSVs are exported. Because dividing by a constant distributes over a sum, dividing each row first and summing per peptide afterwards gives the same result as summing first and then dividing. In the pr_matrix format, where the individual injections are in the other dimension (columns), the intensities are summed into one vector before they are divided by the sample count.
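The two code paths described in the reply can be checked against each other on toy data. A minimal sketch, not the package's actual implementation: it assumes a peptide missing from a run contributes zero (no row in the long table, `NA` dropped via `na.rm = TRUE` in the wide one), and the tables and column names are hypothetical:

```r
# Toy long-format (report.tsv-like) data: one row per peptide per kept run
long_df <- data.frame(
  File.Name = c("sampleA.raw", "sampleA.raw", "sampleB.raw"),
  Stripped.Sequence = c("PEPTIDEK", "SAMPLER", "PEPTIDEK"),
  Intensity = c(100, 50, 300)
)
sample_count <- length(unique(long_df$File.Name))  # 2 kept runs

# report.tsv path: divide every row first, then sum rows of the same peptide
long_df$Intensity <- long_df$Intensity / sample_count
avg_long <- aggregate(Intensity ~ Stripped.Sequence, data = long_df, FUN = sum)

# Same data in pr_matrix-like wide form: one numeric column per run
wide_df <- data.frame(
  Stripped.Sequence = c("PEPTIDEK", "SAMPLER"),
  sampleA.raw = c(100, 50),
  sampleB.raw = c(300, NA)  # SAMPLER not quantified in sampleB
)

# pr_matrix path: sum each row across run columns, then divide once
run_cols <- sapply(wide_df, is.numeric)
avg_wide <- rowSums(wide_df[, run_cols, drop = FALSE], na.rm = TRUE) / sample_count
names(avg_wide) <- wide_df$Stripped.Sequence

avg_long  # PEPTIDEK 200, SAMPLER 25
avg_wide  # PEPTIDEK 200, SAMPLER 25
```

Both paths give the same per-peptide averages as long as `sample_count` is taken from the filtered runs.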

But again, I will take a closer look at the details of both the sample count and the averaging tomorrow to make sure. If I can make the logic more obvious I will, and I will add some comments to clarify it.

weaversd commented 3 months ago

Updated Zenodo

Champion-Lab / PrIntMap-R

DIA-NN input #22