A list of minor/low-priority issues labeled as "enhancement". We need to re-visit this list.
Documentation
[X] We need to brush up the documentation. Maybe it would be a good idea in the long term to switch to RST.
I'm working on this now.
Installation
[X] Create a separate folder for the models for each version of Python: there seem to be some issues unpickling Stan models across different versions of Python. So it would make sense to make the model path something like:
$BASE/rpbp_models/python-<version>/...
Moved to CmdStanPy: there is no pickling anymore, and models are installed/compiled under the conda environment by default.
[X] Add setup option to force recompilation of Stan models: by default, if the pickled Stan models already exist, they are not recompiled. This can sometimes cause problems when the PyStan version changes, due to backwards-compatibility issues.
This is not entirely resolved; see #133.
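For reference, the versioned path proposed above could be built roughly as follows. This is only a sketch: `versioned_model_dir` is a hypothetical helper, and using major.minor as `<version>` is an assumption, since the original proposal does not say how fine-grained the version should be.

```python
import sys
from pathlib import Path


def versioned_model_dir(base):
    """Return <base>/rpbp_models/python-<major>.<minor> for the running interpreter.

    Hypothetical helper; the major.minor granularity is an assumption.
    """
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return Path(base) / "rpbp_models" / f"python-{version}"
```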
Visualisation
Reporting/downstream analyses done via Dash.
[X] ORF visualization: add additional genome browser tracks such as:
Adding the BAM files to IGV is not so helpful because they include the entire reads and are not shifted to account for P-site offsets. Brief online searching suggests the best approach is probably to first convert the P-site BED object to wiggle, then the wiggle to bigWig.
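A minimal sketch of the first step (P-site positions to wiggle), assuming the P-sites are available as `(chrom, zero_based_position, count)` tuples. That input layout and the function name `psites_to_wiggle` are assumptions, not the actual P-site BED object used in the pipeline, and strand is ignored here. Wiggle positions are 1-based, so each BED-style 0-based position is shifted by one.

```python
from collections import defaultdict


def psites_to_wiggle(psites):
    """Render P-site counts as variableStep wiggle text.

    psites: iterable of (chrom, zero_based_position, count) tuples
    (assumed layout). Counts at the same position are summed.
    """
    by_chrom = defaultdict(lambda: defaultdict(int))
    for chrom, pos, count in psites:
        by_chrom[chrom][pos] += count

    lines = []
    for chrom in sorted(by_chrom):
        # One declaration line per chromosome, single-base resolution.
        lines.append(f"variableStep chrom={chrom} span=1")
        for pos in sorted(by_chrom[chrom]):
            # Wiggle coordinates are 1-based; BED starts are 0-based.
            lines.append(f"{pos + 1} {by_chrom[chrom][pos]}")
    return "\n".join(lines) + "\n"
```

The resulting text could then be written to a file and passed, together with a chromosome sizes file, to the UCSC `wigToBigWig` utility for the second step.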
[X] Replicate correlation plots: add correlation plots of RPMs (or some other normalised value) between replicates after corrected assignment on codon and maybe on nucleotide level (see replicate ORF profiles).
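As a rough illustration of the values behind such a plot, the sketch below normalises raw per-ORF (or per-codon) counts to RPM and computes the Pearson correlation between two replicates. The function names and the choice of Pearson correlation are assumptions; the item above does not fix a particular statistic.

```python
import math


def rpm(counts):
    """Scale raw counts to reads per million of the replicate total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

The two RPM vectors would be the x and y values of the scatter plot, with the correlation shown as an annotation.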
[X] Handle all levels of "sample" specification in get-all-orf-peptide-matches:
The script is hard-coded to work with "cell-types" from the config file. It would be nice if it also handled samples ("riboseq_samples" key) and conditions ("riboseq_biological_replicates" key).
[ ] Add a command line option to the script to specify the level
[ ] Add a function to ribo_utils.py which returns a list of the appropriate names
[ ] Use this function rather than the call to ribo_utils.get_riboseq_cell_type_samples
[ ] Add a function to ribo_utils.py which returns the appropriate "peptide_analysis" dictionary
[ ] Use that in the loop
This will also entail finding the correct filename based on the level (e.g., "sample" filenames include lengths and offsets, while the others do not; the locations are different).
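The first two sub-tasks could look roughly like this. It is only a sketch: `get_names_for_level`, `build_parser`, the `--level` option name and the level labels are hypothetical, and the config key used for cell types (`riboseq_cell_type_samples`) is an assumption, since only the other two keys are named above.

```python
import argparse

# Maps each level to the config key holding its entries.
LEVEL_KEYS = {
    "sample": "riboseq_samples",
    "condition": "riboseq_biological_replicates",
    "cell-type": "riboseq_cell_type_samples",  # assumed key name
}


def get_names_for_level(config, level):
    """Return the sorted names defined in the config for the given level."""
    if level not in LEVEL_KEYS:
        raise ValueError(f"unknown level: {level!r}")
    # A missing key is treated as "no entries at this level".
    return sorted(config.get(LEVEL_KEYS[level], {}))


def build_parser():
    """Command line parser with a hypothetical --level option."""
    parser = argparse.ArgumentParser(description="ORF/peptide matching")
    parser.add_argument("--level", choices=sorted(LEVEL_KEYS),
                        default="cell-type",
                        help="level at which names are looked up")
    return parser
```

Defaulting to "cell-type" keeps the current behaviour unless a level is requested explicitly. Filename lookup per level is not covered here.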
[X] Create proteomics results plots: add notebooks and plots to the peptide report which show the proteomics results.
[X] Venn diagram of detected peptide sequences for a given PEP threshold
[ ] Add detected peptides overlap to proteomics-report
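A small sketch of the overlap computation that would feed the Venn diagram or the report, assuming each search result is available as `(peptide_sequence, pep)` pairs and that a peptide counts as detected when its PEP is at or below the threshold. The input layout, the function names and the default threshold of 0.1 are assumptions.

```python
def detected_peptides(records, pep_threshold=0.1):
    """Return the set of sequences whose PEP is at or below the threshold.

    records: iterable of (peptide_sequence, pep) pairs (assumed layout).
    """
    return {seq for seq, pep in records if pep <= pep_threshold}


def overlap_regions(a, b):
    """Split two peptide sets into the three regions of a two-set Venn diagram."""
    return {"only_a": a - b, "both": a & b, "only_b": b - a}
```

The sizes of the three regions are what a two-set Venn diagram would display.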