fgcz / prolfqua

Differential Expression Analysis tool box R lang package for omics data
https://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.2c00441
MIT License

DiaNN read report function #53

Closed clawless-inoviv closed 1 year ago

clawless-inoviv commented 1 year ago

Hi, thanks for developing an excellent framework for LFQ. I just wanted to highlight something that I've found.

The function "diann_read_output" does not return the data frame expected from the parameters supplied.

There are two separate issues:

  1. Supplying the nrPeptides parameter does not filter the data frame during import, i.e. setting nrPeptides=2 has no effect.

In the section of the R code:

filter_PG <- function(PG, nrPeptides = 2, Q.Value = 0.01){
    PG <- PG |> dplyr::filter(nrPeptides >= nrPeptides)
    PG <- PG |> dplyr::filter(.data$Lib.PG.Q.Value < Q.Value)
    PG <- PG |> dplyr::filter(.data$PG.Q.Value < Q.Value)
  }

The nrPeptides argument set in the function call is shadowed by the nrPeptides column of the data frame, so the filter compares the column against itself and every row passes. Would it be possible to change the parameter name to nrPeptides_min or something other than nrPeptides?
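To illustrate the name clash described above, here is a minimal sketch on a toy table. The toy data and the helper names `filter_masked`, `filter_env`, and `filter_renamed` are made up for illustration and are not part of prolfqua; the `.data` and `.env` pronouns are the standard dplyr/rlang way to disambiguate a column from a function argument.

```r
library(dplyr)

# Toy protein-group table (illustrative values only).
PG <- data.frame(
  Protein.Group = c("P1", "P2", "P3"),
  nrPeptides    = c(1, 2, 5)
)

# Reproduces the problem: inside filter(), `nrPeptides` resolves to the
# column on both sides, so the condition is TRUE for every row.
filter_masked <- function(PG, nrPeptides = 2) {
  PG |> dplyr::filter(nrPeptides >= nrPeptides)
}

# Option A: keep the argument name and disambiguate with pronouns.
# .data$ refers to the column, .env$ to the function argument.
filter_env <- function(PG, nrPeptides = 2) {
  PG |> dplyr::filter(.data$nrPeptides >= .env$nrPeptides)
}

# Option B: rename the argument (nrPeptides_min is the name suggested above).
filter_renamed <- function(PG, nrPeptides_min = 2) {
  PG |> dplyr::filter(.data$nrPeptides >= nrPeptides_min)
}

n_masked  <- nrow(filter_masked(PG, nrPeptides = 2))
n_env     <- nrow(filter_env(PG, nrPeptides = 2))
n_renamed <- nrow(filter_renamed(PG, nrPeptides_min = 2))
```

With this toy input, `filter_masked` keeps all three rows, while the other two variants drop the protein group that has a single peptide.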

  2. The filtering by Q.value is performed after the nrPeptides column is calculated, so the resulting data frame can still contain proteins quantified by a single peptide.

The above section of code ideally needs to be separated, whereby the peptides are filtered by Q.value before the nrPeptides column is calculated. The data frame can be filtered by the minimum peptide threshold afterwards.
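The difference between the two orderings can be sketched as follows. This is only an illustration of the proposal above, not prolfqua's implementation: the toy report, the helper names `count_then_filter` and `filter_then_count`, and the arguments `nrPeptides_min` and `q_max` are hypothetical, and the example assumes a long-format report with `Protein.Group`, `Stripped.Sequence`, `PG.Q.Value`, and `Lib.PG.Q.Value` columns.

```r
library(dplyr)

# Toy long-format report (illustrative values only).
report <- data.frame(
  Protein.Group     = c("P1", "P1", "P1", "P2", "P2"),
  Stripped.Sequence = c("AAK", "CCK", "DDK", "EEK", "FFK"),
  PG.Q.Value        = c(0.001, 0.001, 0.2, 0.001, 0.5),
  Lib.PG.Q.Value    = c(0.001, 0.001, 0.001, 0.001, 0.001)
)

# Count peptides per protein group on whatever rows are passed in.
add_nr_peptides <- function(df) {
  df |>
    dplyr::group_by(Protein.Group) |>
    dplyr::mutate(nrPeptides = dplyr::n_distinct(Stripped.Sequence)) |>
    dplyr::ungroup()
}

# Ordering A: count first, then apply both filters.
count_then_filter <- function(report, nrPeptides_min = 2, q_max = 0.01) {
  report |>
    add_nr_peptides() |>
    dplyr::filter(.data$nrPeptides >= nrPeptides_min) |>
    dplyr::filter(.data$Lib.PG.Q.Value < q_max, .data$PG.Q.Value < q_max)
}

# Ordering B (the proposal above): Q-value filter, then count, then threshold.
filter_then_count <- function(report, nrPeptides_min = 2, q_max = 0.01) {
  report |>
    dplyr::filter(.data$Lib.PG.Q.Value < q_max, .data$PG.Q.Value < q_max) |>
    add_nr_peptides() |>
    dplyr::filter(.data$nrPeptides >= nrPeptides_min)
}

res_a <- count_then_filter(report)
res_b <- filter_then_count(report)
```

In this toy example, ordering A keeps P2 even though only one of its peptides survives the Q-value filter, whereas ordering B keeps only P1.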

I hope this makes sense.

Best,

Craig

wolski commented 1 year ago

Dear Craig,

Thank you for using prolfqua and reporting the issue with the filtering. The commit should fix it: https://github.com/fgcz/prolfqua/commit/926e0412e274869187bddc48ec4545a2870b2678

Regarding your point 2: See also this issue on the DiaNN GitHub repository: https://github.com/vdemichev/DiaNN/issues/398

I agree with Vadim's opinion: "But I don't think this journal request makes sense for most analyses, so to formally comply with it can also calculate the number of stripped sequences matching the protein before the filtering."

Therefore, I first filter by the number of peptides and then by Q-values.

regards W

wolski commented 1 year ago

I am closing the issue. If you have any more questions, please add them here, and we can reopen the issue.

clawless-inoviv commented 1 year ago

Thank you for this and the link to the explanation.