dieterich-lab / rp-bp

Rp-Bp is a Bayesian approach to predict, at base-pair resolution, ribosome occupancy and translation.
MIT License

Predictions dashboard with test data not working #139

Closed dokempf closed 1 year ago

dokempf commented 1 year ago

With the c-elegans test data, the rpbp_predictions_dashboard app does not execute. The reason is that it looks for an output file WBcel235.chrI.fa.fai.txt, where only WBcel235.chrI.fa.txt exists. The triggering line of code is https://github.com/dieterich-lab/rp-bp/blob/dev-ssciwr/src/rpbp/analysis/rpbp_predictions/dashboard/rpbp_predictions_dashboard.py#L323

dokempf commented 1 year ago

There are follow-up errors to this one - so the question might as well be: What do I need to do to run the predictions app on the c-elegans test data?

eboileau commented 1 year ago

Indeed, the index does not exist for the test data, although in general it will (or should). I can add the index to the test data, but in the meantime, under the conda environment, run

samtools faidx /scratch/global_tmp/pytest-of-eboileau/pytest-82/data0/c-elegans-chrI-example/input/WBcel235.chrI.fa

or whatever the path is to your data. You should then have at that location a file named WBcel235.chrI.fa.fai. Then you need to re-run the prep script (summarize-rpbp-predictions with --overwrite), before launching the app.
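The manual step above could also be scripted. This is only a minimal sketch, not part of Rp-Bp: the helper name `ensure_fasta_index` and its `run` parameter are made up for illustration, and it assumes `samtools` is on the PATH (e.g. inside the conda environment).

```python
import subprocess
from pathlib import Path


def ensure_fasta_index(fasta, run=subprocess.run):
    """Create <fasta>.fai with `samtools faidx` if it does not exist yet.

    `run` is injectable (hypothetical parameter) so the logic can be tried
    without samtools installed. Returns the path of the index file.
    """
    fasta = Path(fasta)
    index = Path(str(fasta) + ".fai")
    if not index.exists():
        # Same command as in the comment above; needs samtools on PATH.
        run(["samtools", "faidx", str(fasta)], check=True)
    return index
```

After the index exists, the prep script still has to be re-run with --overwrite, as described above.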

eboileau commented 1 year ago

I must improve error/missing data handling in the app, as this particular case results, as far as I can see, from summarize-rpbp-predictions completing without being able to generate the complete data (actually, if you check the logs, there should be a statement such as "Continuing, but file is missing!").
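One possible shape for such a check, as a rough sketch only: the function names `find_missing` and `require_files` are hypothetical and do not exist in Rp-Bp, and the list of required files would have to come from the actual config.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def find_missing(required_files):
    """Return the paths in `required_files` that are not regular files on disk."""
    return [str(p) for p in map(Path, required_files) if not p.is_file()]


def require_files(required_files):
    """Fail early with one error that lists every missing input file."""
    missing = find_missing(required_files)
    if missing:
        for path in missing:
            logger.error("Required file is missing: %s", path)
        raise FileNotFoundError(
            "Missing input(s): " + ", ".join(missing)
            + ". Create them, then re-run the prep script with --overwrite."
        )
```

Calling something like `require_files([...])` before the app builds its layout would turn a late failure inside the dashboard into an immediate, readable message.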

dokempf commented 1 year ago

There still seems to be some step missing or some hardcoded behaviour is not suited for the test dataset:

Traceback (most recent call last):
  File "/home/dkempf/miniconda3/envs/rpbp-dev/bin/rpbp_predictions_dashboard", line 33, in <module>
    sys.exit(load_entry_point('rpbp', 'console_scripts', 'rpbp_predictions_dashboard')())
  File "/home/dkempf/miniconda3/envs/rpbp-dev/bin/rpbp_predictions_dashboard", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/dkempf/miniconda3/envs/rpbp-dev/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/home/dkempf/miniconda3/envs/rpbp-dev/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/dkempf/rp-bp/src/rpbp/analysis/rpbp_predictions/dashboard/rpbp_predictions_dashboard.py", line 590, in <module>
    "data": circos_graph_data[f"histogram_{orf_type_default}"],
KeyError: 'histogram_CDS'

eboileau commented 1 year ago

Did you follow https://github.com/dieterich-lab/rp-bp/issues/136#issuecomment-1378426789 ?

For the test data, the prep script must be called with specific arguments to work:

summarize-rpbp-predictions /scratch/global_tmp/pytest-of-eboileau/pytest-82/data0/c-elegans-chrI-example/c-elegans-test.yaml --circos-bin-width 100000 --circos-show-chroms I

If called more than once without --overwrite, new files will NOT be generated. Logging should report this. If the data is not properly generated, the app will not work.
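The skip-unless-overwrite behaviour described above follows a common pattern, sketched here under assumptions: `write_if_needed` is a hypothetical helper, not the code used by summarize-rpbp-predictions.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def write_if_needed(path, make_content, overwrite=False):
    """Write `make_content()` to `path` unless it exists and `overwrite` is False.

    Returns True if the file was (re)written, False if it was left untouched.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        # Stale or incomplete output survives here, hence the warning.
        logger.warning("%s exists, skipping (pass overwrite=True to regenerate)", path)
        return False
    path.write_text(make_content())
    return True
```

The consequence is that an incomplete file from an earlier run is kept until the script is called again with the overwrite option.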

I agree, as I said above, this should be integrated with the test data or at least documented (WIP)... and error/missing data handling must be improved.
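For the KeyError in the traceback above, a guarded lookup might look like the following. This is a sketch only: `circos_graph_data` and the `histogram_<orf type>` key come from the traceback, while the helper `get_histogram` is hypothetical.

```python
def get_histogram(circos_graph_data, orf_type):
    """Look up the histogram for `orf_type`, with an explanatory error if absent."""
    key = f"histogram_{orf_type}"
    try:
        return circos_graph_data[key]
    except KeyError:
        # List what is there so the user can see what the prep script produced.
        available = sorted(circos_graph_data) or ["<none>"]
        raise KeyError(
            f"{key!r} not found (available: {', '.join(available)}). "
            "The prep script may not have generated the Circos data."
        ) from None
```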

Please let me know if this solves the problem...

dokempf commented 1 year ago

There seem to be some missing parameters for the manual samtools invocation you mentioned above. The generated index file is only 20 bytes with the above invocation. I have a potential solution to #137, but cannot properly test it currently.

eboileau commented 1 year ago

Yes, the index file for the c-elegans test data will be very small (20B is what I have), as the test data has only one chromosome, so this is fine. But I will add it to the test data anyway.
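To see why the size is expected: a .fai index has one line per sequence, with five tab-separated fields (name, length, byte offset of the first base, bases per line, bytes per line). The sketch below is a simplified illustration on a toy FASTA, not samtools' implementation; the function name `fai_lines` is made up, and it assumes well-formed input with uniform line lengths.

```python
def fai_lines(fasta_path):
    """Return one .fai-style line per sequence in a (well-formed) FASTA file."""
    fields = ("name", "length", "offset", "linebases", "linewidth")
    records, current, position = [], None, 0
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                # The sequence name is the header text up to the first whitespace.
                name = raw[1:].split()[0].decode()
                current = {"name": name, "length": 0,
                           "offset": position + len(raw),
                           "linebases": 0, "linewidth": 0}
                records.append(current)
            elif current is not None and raw.strip():
                bases = len(raw.rstrip(b"\r\n"))
                if current["linebases"] == 0:
                    # Line geometry is taken from the first sequence line.
                    current["linebases"], current["linewidth"] = bases, len(raw)
                current["length"] += bases
            position += len(raw)
    return ["\t".join(str(r[f]) for f in fields) for r in records]
```

With a single chromosome the index is therefore a single short line, regardless of how large the FASTA file itself is.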

We need a larger dataset for #137. I will send you a link.

eboileau commented 1 year ago

I have prepared a larger test dataset for #137.

Activate your environment.

wget https://data.dieterichlab.org/s/d9XNab4i4MFbw3S/download -O bigger-test-data.zip && unzip bigger-test-data.zip && cd bigger-test-data && chmod +x setup; ./setup

Pre-computed analyses are under riboseq-analysis/riboseq-results/analysis.

To run the app (this one works):

rpbp_profile_construction_dashboard riboseq-analysis/config/rpbp-pipeline.yaml -d

but this one keeps loading forever, mostly because of IGV:

rpbp_predictions_dashboard riboseq-analysis/config/rpbp-pipeline.yaml -d

These files should be sufficient to address this issue. The large "genome" files are under riboseq-analysis/genome.

eboileau commented 1 year ago

Resolved, but see #142.