Biocomputing-Research-Group / MetaLP


Guidance on usage with Sipros Ensemble #1

Open ohickl opened 1 year ago

ohickl commented 1 year ago

Hi,

I am trying out MetaLP but am unsure how to create the input files correctly. I built a protein-to-taxonomy mapping file that looks like this:

CFCPBJGN_213170 Lachnospiraceae
CFCPBJGN_213169 Lachnospiraceae
CFCPBJGN_375941 Oscillospirales
CFCPBJGN_538975 Sutterella
CFCPBJGN_337440 Unknown
CFCPBJGN_254306 bin.9_Bariatricus_comes
CFCPBJGN_254314 bin.9_Bariatricus_comes
CFCPBJGN_254305 bin.9_Bariatricus_comes
CFCPBJGN_19817  Unknown
CFCPBJGN_652540 Unknown
CFCPBJGN_145008 Bacteroides
...
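
A minimal sketch for sanity-checking a mapping file like the one above. It assumes two whitespace-separated columns (protein ID, then taxon label) with one protein per line; whether MetaLP expects exactly this layout is not confirmed here. The function name `load_protein_taxa` is hypothetical, and the inline example reuses a few rows from the excerpt.

```python
from collections import Counter
import io


def load_protein_taxa(handle):
    """Read 'protein_id<whitespace>taxon' lines into a dict (assumed layout)."""
    mapping = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(parts)}")
        protein_id, taxon = parts
        if protein_id in mapping:
            raise ValueError(f"line {line_no}: duplicate protein ID {protein_id}")
        mapping[protein_id] = taxon.strip()
    return mapping


# A few rows copied from the excerpt above, including one with a double space.
example = io.StringIO(
    "CFCPBJGN_213170 Lachnospiraceae\n"
    "CFCPBJGN_213169 Lachnospiraceae\n"
    "CFCPBJGN_337440 Unknown\n"
    "CFCPBJGN_19817  Unknown\n"
    "CFCPBJGN_145008 Bacteroides\n"
)
protein_taxa = load_protein_taxa(example)

# Count proteins per taxon to see how many taxa there are and how large "Unknown" is.
taxon_counts = Counter(protein_taxa.values())
print(taxon_counts.most_common())
```

Counting proteins per taxon this way makes it easy to compare how many taxa a genus-level versus a species-level mapping would produce.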

I used the GTDB-Tk taxonomy of bins of sufficient quality as well as the genus-level taxonomy of unbinned, annotated contigs. I chose the genus level to avoid having too many taxa; I am not sure whether species-level taxonomy would make more sense.

The probabilities output from MRC.out is called null and starts like this:

.       0
..      0
14-2.fasta      8.48019e-06
1XD42-69.fasta  2.73021e-05
1XD8-76.fasta   1.1261e-06
28-YEA-48.fasta 3.44723e-07
51-20.fasta     1.44784e-06
AC2028.fasta    1.37889e-07
AF33-28.fasta   3.81494e-06
AM51-8.fasta    2.59232e-05
Absicoccus.fasta        1.33293e-06
Acerihabitans.fasta     2.29815e-07
Acetanaerobacterium.fasta       8.9628e-07
...
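
A minimal sketch for loading a probability file like the one above. It assumes two whitespace-separated columns (name, then probability). The `.` and `..` rows look like directory entries rather than genomes, so the sketch skips them; that reading is an assumption, not something confirmed by the tool. The function name `load_taxon_probabilities` is hypothetical.

```python
import io


def load_taxon_probabilities(handle, skip=(".", "..")):
    """Parse 'name<whitespace>probability' lines, dropping the names in `skip`."""
    probs = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or elided lines such as '...'
        name, value = parts
        if name in skip:
            continue  # assumed to be directory entries, not genomes
        probs[name] = float(value)
    return probs


# A few rows copied from the excerpt above.
example = io.StringIO(
    ".       0\n"
    "..      0\n"
    "14-2.fasta      8.48019e-06\n"
    "1XD42-69.fasta  2.73021e-05\n"
    "Absicoccus.fasta        1.33293e-06\n"
)
probs = load_taxon_probabilities(example)
print(len(probs), "entries, sum =", sum(probs.values()))
```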

I generated it using MRC.out -i mg.reads.sorted.sam -c taxa_contigs/ -t 30

How do I generate the PeptideProphet input from the SiprosEnsemble output? I ran sipros_psm_tabulating.py with -x, which produced the mvh, wdp and xcorr pep.xml files. I currently have PeptideProphetParser from TPP 5.0, but it will not run, e.g.:

 (Sipros Ensemble mvh)
error: engine Sipros Ensemble mvh not recognized

Is that version simply too old, so that I have to compile a more recent one from TPP 6.*? What is the proper way of running PeptideProphet on the SiprosEnsemble output to get the correct MetaLP input?

Best

Oskar

ohickl commented 1 year ago

I used the PeptideProphet in the latest Philosopher version, which worked but warned that the Sipros models are experimental. I ran it on the three pep.xml files and combined the results using iProphet. I did manage to run MetaLP, and got substantially larger numbers of protein groups than with SiprosEnsemble or, e.g., FragPipe:

...
number of target protein groups within FDR 0.01: 17958
number of target proteins within FDR 0.01: 37260
...
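
For reference, a minimal sketch of how a generic target-decoy FDR cutoff translates into a score threshold. This is the standard decoys/targets estimate, not necessarily how MetaLP computes the numbers above; the function name `score_threshold_at_fdr` and the toy scores are made up for illustration.

```python
def score_threshold_at_fdr(scored_groups, max_fdr=0.01):
    """Return (lowest passing score, number of targets) at the given FDR.

    scored_groups: iterable of (score, is_decoy) pairs; higher score = better.
    FDR is estimated as decoys / targets among all entries at or above a score.
    Returns (None, 0) if no cutoff satisfies max_fdr.
    """
    ranked = sorted(scored_groups, key=lambda g: g[0], reverse=True)
    targets = decoys = 0
    best = (None, 0)
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cutoff at which the estimated FDR is still acceptable.
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Toy data (hypothetical): four targets and one decoy.
toy = [(0.99, False), (0.98, False), (0.97, False), (0.50, True), (0.40, False)]
strict = score_threshold_at_fdr(toy, max_fdr=0.01)
loose = score_threshold_at_fdr(toy, max_fdr=0.30)
print(strict, loose)
```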

How do I properly interpret the MetaLP output? For example, which score threshold should I apply, if any? Do you advise any other post-processing?