cpanse / protViz

Visualizing and Analyzing Mass Spectrometry Related Data in Proteomics
https://CRAN.R-project.org/package=protViz
GNU General Public License v3.0
11 stars 3 forks source link

discrepancy between R and awk fdr code snippets #14

Closed cpanse closed 3 years ago

cpanse commented 3 years ago

https://github.com/cpanse/protViz/blob/66e5281b771fbcabd42f84d8aadbc27c275c3672/R/comet.R#L9

contradiction to the following bash code;

#!/bin/bash

# scp r28:/scratch/cpanse/downsizingDBs/comet/all2/20201022_002_S223359_474-B4.comet.txt .
COMET="20201022_002_S223359_474-B4.comet.txt"

cat ${COMET} \
  | awk 'NF>16{print}' \
  | awk '$2==1{print}' \
  | sort -n -r -k10 \
  | awk 'BEGIN{dc=0}$16~/^REV_/{dc++}{fdr=dc/(NR-dc);print dc"\t"NR"\t"fdr}'  | less
cpanse commented 3 years ago

@ mariaderrico

cat ${COMET} \
  | awk -F"\t" 'NF==19{print $9"\t"$16}' \
  | cut -d"," -f1 \
  | sort -gr -k1 -S256M \
  | awk 'BEGIN{dc=0}$2~/^REV_/{dc++}{fdr=dc/(NR-dc);print dc"\t"NR"\t"fdr}'  \
  | awk '$NF<0.05{print}' \
  | tail

produces

251     5316    0.0495558
252     5317    0.0497532
252     5318    0.0497434
252     5319    0.0497336
253     5320    0.0499309
253     5321    0.0499211
253     5322    0.0499112
253     5323    0.0499014
253     5324    0.0498915
253     5325    0.0498817

and

library(protViz)
S <- lapply(F <- list.files(pattern="*.comet.txt"),
  function(f) read.table(f, header = TRUE, fill = TRUE, skip=1))  
rv <- do.call('rbind', lapply(S, protViz::summary.cometdecoy))

rv$mgf <- F  
knitr::kable(rv)
results in nPSM psmFdrCutoff nDecoyPSM nConfidentPSM nDecoyPeptide nConfidentPeptide nDecoyProteins nConfidentProteins fdrPSM fdrPeptide fdrProtein mgf
52158 0.05 254 5327 199 3352 199 1986 0.0476816 0.0593675 0.1002014 20201022_002_S223359_474-B4.comet.txt

5327 = 5325 + 2 Q.E.D.