dieterich-lab / rp-bp

Rp-Bp is a Bayesian approach to predict, at base-pair resolution, ribosome occupancy and translation.
MIT License

QC and downstream analysis #87

Closed eboileau closed 6 years ago

eboileau commented 6 years ago

I am opening this thread to follow up on minor issues with the analysis scripts when running the example dataset (see also #76).

1. When running create-rpbp-preprocessing-report, the call to pdflatex initially fails. I suggest including the graphics paths without file extensions, so that the graphics package looks for a supported graphics format automatically.
2. Bars are missing in the read length distribution bar plots.
3. With show-read-length-bfs, compilation fails.

  1. For rep 1, we don't have enough reads to visualize (read length periodicity), unless we set --min-visualization-count very low, say 10. The next release should probably include the full working example with suggested values for parameters/options.
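
For point 1 (extension-free graphics paths), a minimal sketch of what I have in mind. The helper name `includegraphics_command` and its `width` default are hypothetical, not existing Rp-Bp code; it only shows the extension being stripped before the path is written into the LaTeX source.

```python
import os


def includegraphics_command(image_path, width="0.9\\textwidth"):
    """Return an \\includegraphics command for image_path without its extension.

    Hypothetical helper for illustration only, not an Rp-Bp function.
    """
    # Drop the final extension, e.g. "plots/rep1-lengths.pdf" -> "plots/rep1-lengths",
    # so the LaTeX graphics package can look for a supported format itself.
    stem, _ext = os.path.splitext(image_path)
    return "\\includegraphics[width={}]{{{}}}".format(width, stem)


print(includegraphics_command("plots/rep1-length-distribution.pdf"))
```

The report scripts would then write this command instead of one with a hard-coded extension.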

Not tested yet with option --create-fastqc-reports.

FIX bio_utils.plotting.plot_read_length_distribution, create_rpbp_preprocessing_report and visualize_metagene_profile_bayes_factor (minor changes to plotting options, typos fixed, LaTeX commands modified, redundant lines removed, etc.). The report should now be created no matter which option is selected. This will be added to the next release.

eboileau commented 6 years ago

When running create-rpbp-predictions-report:

  1. The report contains misformatted figures (axes, etc.), in particular the ORF bar plots and length distributions.
  2. With the --show-chisq option, it seems that fraction and reweighting_iterations are missing when calling get_riboseq_predicted_orfs. As a result, files (with names lacking the frac-smoothing_fraction.rw-smoothing_reweighting_iterations part) are reported as missing.

eboileau commented 6 years ago

Re point 2 above: estimate-orf-bayes-factors returns everything as a BED12+ file with frac-smoothing_fraction.rw-smoothing_reweighting_iterations in the file name, whether or not we only want the chi-square value (it is included by default). However, none of the file names account for the difference between is_chisq_values = [True, False] when selecting the final prediction sets. As a result, all file names contain the string frac-smoothing_fraction.rw-smoothing_reweighting_iterations. For the QC/analysis, and in particular in create_rpbp_predictions_report, this is problematic with --show-chisq, since it results in a mismatch in the file names. This is only a matter of naming convention, but for consistency it should be changed throughout the code.
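
To make the mismatch concrete, here is a rough sketch. `orf_filename` and its arguments are hypothetical stand-ins, not the actual Rp-Bp filename helpers. The `frac-`/`rw-` note follows the pattern quoted above, while the base name, the `chisq` tag, the `.bed.gz` suffix and the values 0.2 and 0 are made up for illustration.

```python
def orf_filename(base, fraction=None, reweighting_iterations=None, is_chisq=False):
    """Build an illustrative predictions file name (hypothetical, not the Rp-Bp API)."""
    parts = [base]
    # The smoothing note is only added when the parameters are passed along.
    if fraction is not None:
        parts.append("frac-{}".format(fraction))
    if reweighting_iterations is not None:
        parts.append("rw-{}".format(reweighting_iterations))
    if is_chisq:
        parts.append("chisq")  # hypothetical tag to tell the two prediction sets apart
    return ".".join(parts) + ".bed.gz"


# Name as written by the pipeline: smoothing parameters are always included.
written = orf_filename("example.predicted-orfs", fraction=0.2, reweighting_iterations=0)

# Name as looked up by a report that does not pass the smoothing parameters.
looked_up = orf_filename("example.predicted-orfs")

print(written)
print(looked_up)
print("match:", written == looked_up)
```

Whatever convention is chosen, the writer and the report have to build the name from the same arguments, otherwise the file is reported as missing even though it exists.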

bmmalone commented 6 years ago

Hi Etienne,

Just to clarify a bit here...

eboileau commented 6 years ago

Hi Brandon,

In the same order:

As for the test example, indeed something should be done along the lines of #76. I could probably use TestRpBp.py as a starting point, though I remember you mentioned something (not sure if this is what you were referring to?). I could then include the example-specific parameters in there, or else, as you mention, use the config file (and update the relevant scripts).

eboileau commented 6 years ago

OK, first, the chi-square output is now restricted to cases where the option chi_square_only is given, so we no longer have all these files generated by default. As for the post-processing analysis (reports and plots), the reports are now generated without any issues, but further testing/fine-tuning will be necessary. On this matter, I am also updating the docs for the QC/analysis, so I will close this issue for now.