PyProphet / pyprophet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.
http://www.openswath.org
BSD 3-Clause "New" or "Revised" License

Feature/merged subsampling #82

Closed (singjc closed 4 years ago)

singjc commented 4 years ago

[FEATURE] Merging individual scored files and sub-sampling a merged file

I have added some features for merging and sub-sampling.

This can be split into two aspects:

  1. Merging individual scored osw files
  2. Sub-sampling a merged.osw file

1. Merging individual scored osw files

I have extended the merge function to allow merging of individual post-scored files. I had several osw files from the same experiment that had each been scored separately, but I needed a single merged osw file to run another external analysis script. Currently, merge would remove the extra score tables or call the oswr reduced merge function.

I have added an option to the merge function to specify whether post-scored runs are being merged (--merge_post_scored_runs). When it is set, merge calls merge_oswps to merge all runs and retain all tables. (See: levels_contexts.py#L824-L1002)
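To illustrate the general idea of merging scored files while retaining every table, here is a minimal sketch using only the standard sqlite3 module. It is not the merge_oswps implementation from this PR: the helper name merge_all_tables is hypothetical, and the sketch assumes that a table with the same name has the same columns in every input file. Rows that are identical across files (for example shared precursor definitions) are written only once.

```python
import sqlite3


def merge_all_tables(infiles, outfile):
    """Copy every table from each input SQLite file into outfile.

    Hypothetical helper for illustration only; not PyProphet's merge_oswps.
    """
    out = sqlite3.connect(outfile)
    try:
        for path in infiles:
            out.execute("ATTACH DATABASE ? AS src", (path,))
            # List the user tables of the attached file with their CREATE statements.
            tables = out.execute(
                "SELECT name, sql FROM src.sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            ).fetchall()
            for name, create_sql in tables:
                exists = out.execute(
                    "SELECT 1 FROM main.sqlite_master "
                    "WHERE type = 'table' AND name = ?",
                    (name,),
                ).fetchall()
                if not exists:
                    # The stored CREATE statement is unqualified, so it
                    # creates the table in the output (main) database.
                    out.execute(create_sql)
                # Append rows, skipping rows already present in the output.
                out.execute(
                    f'INSERT INTO main."{name}" '
                    f'SELECT * FROM src."{name}" '
                    f'EXCEPT SELECT * FROM main."{name}"'
                )
            out.commit()
            out.execute("DETACH DATABASE src")
    finally:
        out.close()
```

A call such as merge_all_tables(["run_1.osw", "run_2.osw"], "merged.osw") (file names are placeholders) would produce one file containing the union of all tables, score tables included. Note that EXCEPT also collapses exact duplicate rows within a single input table, which may not be desirable for every table.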

2. Sub-sampling a merged.osw file

This was requested by @Matthias313. He wanted to create a sub-sampled osw from a merged.osw file, instead of sub-sampling from individual runs. I think he was worried about the comparability of sub-sampling the individual runs alone, but maybe he can comment further on his concerns.

I have added a check to see whether the input file contains more than one run in the RUN table. (See: levels_contexts.py#L290-L294) If you sub-sample a merged.osw file, the PRECURSOR, TRANSITION and TRANSITION_PRECURSOR_MAPPING tables are no longer present, meaning you would have to call the merge function again with a template to append those tables. To avoid this unnecessary step, if the file supplied for sub-sampling contains multiple runs, I append the tables needed for scoring. (See: levels_contexts.py#L433-L481)
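The multi-run check described above can be sketched as follows. This is a minimal illustration, not the code in levels_contexts.py: the function names has_multiple_runs and tables_to_append are hypothetical, and only the RUN table and the three table names are taken from the description.

```python
import sqlite3

# Tables named in the description as needed for scoring.
SCORING_TABLES = ["PRECURSOR", "TRANSITION", "TRANSITION_PRECURSOR_MAPPING"]


def has_multiple_runs(path):
    """Return True if the RUN table of the file holds more than one row."""
    con = sqlite3.connect(path)
    try:
        (n_runs,) = con.execute("SELECT COUNT(*) FROM RUN").fetchone()
    finally:
        con.close()
    return n_runs > 1


def tables_to_append(path):
    """List the tables to carry over when sub-sampling (hypothetical helper)."""
    # Only a merged file (several runs) needs the extra tables appended.
    return list(SCORING_TABLES) if has_multiple_runs(path) else []
```

With this shape, the sub-sampling step can decide up front whether the extra tables have to be copied, so a merged input does not need a second merge call with a template.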

I have performed tests on individual sub-sampling and merged sub-sampling, and based on my results they seem comparable.

Pyprophet report of the merged individually sub-sampled runs (model.osw)

merged_individual_subsampled_runs_model.pdf

Pyprophet report of applying weights to run 1

run_1_applied_weights.pdf

Pyprophet report of applying weights to run 2

run_2_applied_weights.pdf

Pyprophet report of sub-sampling a merged file (model.osw)

merged_subsampled_model.pdf

Pyprophet report of applying weights to merged file

merged_subsampled_applied_weights.pdf

If there is anything else I need to add or do, or if you have any comments, please let me know.

Warm Regards,

Justin

grosenberger commented 4 years ago

Excellent, thanks for the PR!