SlavovLab / DART-ID

DART-ID: retention time alignment and peptide identification confidence updates
https://dart-id.slavovlab.net
MIT License

Planned Features #1

Open atc3 opened 6 years ago

atc3 commented 6 years ago

Will update this as things change.

To accept the future behavior, pass 'sort=True'.

To retain the current behavior and silence the warning, pass sort=False

sort=sort)
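
The fragment above looks like the tail of the column-sorting warning that `pandas.concat` emitted in some pandas versions when the input frames had different column layouts; that attribution is an assumption, since the start of the message is missing. A minimal sketch of how the `sort` argument changes the column order of the result, using made-up toy frames (the names `first`, `second`, `unsorted`, and `resorted` are hypothetical, not DART-ID code):

```python
import pandas as pd

# Hypothetical toy frames whose columns are neither identical nor in the same order.
first = pd.DataFrame({"b": [1], "a": [2]})
second = pd.DataFrame({"a": [3], "c": [4]})

# sort=False keeps columns in order of first appearance across the inputs.
unsorted = pd.concat([first, second], sort=False, ignore_index=True)

# sort=True orders the non-concatenation axis (here, the columns) lexically.
resorted = pd.concat([first, second], sort=True, ignore_index=True)

print(list(unsorted.columns))
print(list(resorted.columns))
```

Passing `sort` explicitly, whichever value is wanted, avoids relying on the version-dependent default.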


- [ ] fix RI columns being wiped when concatenating
- [ ] quick-and-dirty pairwise correlation between experiments, to spot outliers and warn the user that they should be filtered out
- [x] check the max(PEP) of each raw file, and warn the user if they input a raw file with PEPs that are too low (nothing to boost)
- [x] retention length filtering - raw file specific
- [ ] rename output columns
- [x] migrate to config file instead of command-line options
- [x] improve input file-type converting
    - [x] file-type determines column names
    - [x] move filtering blocks into separate functions; file-type determines which functions are run
- [x] pip installable
- [x] violin plot of residual density by RT (RT on x-axis)
- [ ] pairwise correlation of RTs - heatmap
- [x] diagnostic figures for the update portion
    - [x] PEP vs PEP.new scatterplot
    - [x] fold change increase in IDs as function of PEP threshold
- [x] validation figures
    - [x] multiple peptides of the same protein - should have the same intensity (measure the CV)
- [x] generate HTML file to view figures
- [ ] add and start throwing exceptions
- [x] create entire output directory including all subfolders
- [x] parameter for defining column headers - additional option instead of specifying the file type
- [x] fix experiment exclusion
- [x] optional save alignment parameters
- [x] split up outputs in same way the inputs are split up
    - [x] then remove input_id column
- [x] remove id column as well?
- [x] verbose levels and actually enforce them in code
- [x] additional parameters to select which columns to have
    - [x] default should just be pep_new. maybe have a "diagnostic" flag that includes the other columns?
- [x] logging -> logger
- [x] default retention length filter - (max_retention_time) / 60
- [x] optimize experiment updating
- [x] filter_decoys/contaminants -> include_decoys/contaminants
- [x] add PEP_updated column
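
The "fold change increase in IDs as function of PEP threshold" diagnostic listed above can be sketched as follows. This is only an illustration under assumptions: the function name `fold_change_in_ids`, the strict `<` cutoff, and the toy PEP values are all hypothetical and not taken from the DART-ID code.

```python
from typing import Dict, Optional, Sequence


def fold_change_in_ids(
    pep: Sequence[float],
    pep_new: Sequence[float],
    thresholds: Sequence[float],
) -> Dict[float, Optional[float]]:
    """For each threshold, return (# IDs after update) / (# IDs before update).

    An "ID" here is any entry whose PEP is strictly below the threshold
    (an assumed convention). The ratio is None when nothing passes before.
    """
    result: Dict[float, Optional[float]] = {}
    for t in thresholds:
        before = sum(1 for p in pep if p < t)
        after = sum(1 for p in pep_new if p < t)
        result[t] = after / before if before else None
    return result


# Made-up values purely for illustration.
pep = [0.001, 0.02, 0.2, 0.5, 0.9]
pep_new = [0.0005, 0.004, 0.008, 0.3, 0.95]

fold_changes = fold_change_in_ids(pep, pep_new, thresholds=[0.01, 0.05])
print(fold_changes)  # {0.01: 3.0, 0.05: 1.5}
```

Plotting the returned ratios against the thresholds would give the curve described in the checklist item.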

## FUTURE VERSION
- [ ] move off of Stan
- [ ] optimize data selection by RT bin, experiment, and peptide. remove as much as possible but retain the same amount of coverage.