Closed dokempf closed 1 year ago
There are follow-up errors to this one, so the question might as well be: what do I need to do to run the predictions app on the c-elegans test data?
Indeed, the index does not exist for the test data, although in general it should. I may add the index to the test data, but in the meantime, with the conda environment activated, run
samtools faidx /scratch/global_tmp/pytest-of-eboileau/pytest-82/data0/c-elegans-chrI-example/input/WBcel235.chrI.fa
or whatever the path is to your data. You should then have at that location a file named WBcel235.chrI.fa.fai.
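For anyone who wants to script this step, here is a minimal sketch of the same check in Python. It assumes `samtools` is on the PATH of the active environment; the function name `ensure_fasta_index` and the `run` parameter are hypothetical and not part of rp-bp.

```python
import subprocess
from pathlib import Path


def ensure_fasta_index(fasta, run=subprocess.run):
    """Return the path to the .fai index, creating it with samtools if absent.

    Hypothetical helper, not part of rp-bp. `run` is injectable so the
    function can be exercised without samtools installed.
    """
    fasta = Path(fasta)
    # samtools faidx writes the index next to the FASTA file as <name>.fai
    index = fasta.with_name(fasta.name + ".fai")
    if not index.exists():
        # Assumes samtools is available on PATH (e.g. from the conda env).
        run(["samtools", "faidx", str(fasta)], check=True)
    if not index.exists():
        raise FileNotFoundError(f"Expected index was not created: {index}")
    return index
```

Calling `ensure_fasta_index("/path/to/WBcel235.chrI.fa")` should leave `WBcel235.chrI.fa.fai` next to the FASTA file, and does nothing if the index is already there.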
Then you need to re-run the prep script (summarize-rpbp-predictions with --overwrite) before launching the app.
I must improve the handling of errors and missing data in the app. As far as I can see, this particular case arises because summarize-rpbp-predictions completes without being able to generate the complete data (if you check the logs, there should be a statement such as "Continuing, but file is missing!").
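As a rough illustration of what stricter handling could look like, the sketch below collects every missing or empty file and fails once with a readable message, rather than continuing. The helper names `find_missing` and `require_files` are hypothetical, and which files the app actually needs is an assumption left to the caller.

```python
from pathlib import Path


def find_missing(paths):
    """Return the paths that do not exist as files or are empty."""
    missing = []
    for p in map(Path, paths):
        if not p.is_file() or p.stat().st_size == 0:
            missing.append(p)
    return missing


def require_files(paths):
    """Raise one error listing every missing input (hypothetical helper)."""
    missing = find_missing(paths)
    if missing:
        listing = "\n".join(f"  - {p}" for p in missing)
        raise FileNotFoundError(
            "Missing or empty files; re-run summarize-rpbp-predictions "
            f"with --overwrite:\n{listing}"
        )
```

Failing before the app starts would point at the prep step directly instead of surfacing later as an unrelated error.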
There still seems to be a step missing, or some hardcoded behaviour that is not suited to the test dataset:
Traceback (most recent call last):
  File "/home/dkempf/miniconda3/envs/rpbp-dev/bin/rpbp_predictions_dashboard", line 33, in <module>
    sys.exit(load_entry_point('rpbp', 'console_scripts', 'rpbp_predictions_dashboard')())
  File "/home/dkempf/miniconda3/envs/rpbp-dev/bin/rpbp_predictions_dashboard", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/dkempf/miniconda3/envs/rpbp-dev/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/home/dkempf/miniconda3/envs/rpbp-dev/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/dkempf/rp-bp/src/rpbp/analysis/rpbp_predictions/dashboard/rpbp_predictions_dashboard.py", line 590, in <module>
    "data": circos_graph_data[f"histogram_{orf_type_default}"],
KeyError: 'histogram_CDS'
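The bare KeyError above does not say which histogram keys were actually generated. A minimal sketch of a more descriptive lookup follows, assuming `circos_graph_data` is a plain dict keyed by `histogram_<orf type>` as the traceback suggests. The function `get_histogram` is hypothetical and not the dashboard's actual code.

```python
def get_histogram(circos_graph_data, orf_type):
    """Look up the histogram for an ORF type, with a descriptive error.

    Hypothetical helper: assumes keys follow the "histogram_<orf_type>"
    pattern seen in the traceback.
    """
    key = f"histogram_{orf_type}"
    try:
        return circos_graph_data[key]
    except KeyError:
        # List what is available so the user can see what was generated.
        available = sorted(
            k for k in circos_graph_data if k.startswith("histogram_")
        )
        raise KeyError(
            f"{key!r} not found; available histogram keys: "
            f"{available or 'none'}. "
            "The prep script may not have generated the Circos data."
        ) from None
```

With an error like this, a missing `histogram_CDS` would immediately show whether any histograms exist at all, which narrows the problem to the prep script arguments.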
Did you follow https://github.com/dieterich-lab/rp-bp/issues/136#issuecomment-1378426789 ?
For the test data, the prep script must be called with specific arguments to work:
summarize-rpbp-predictions /scratch/global_tmp/pytest-of-eboileau/pytest-82/data0/c-elegans-chrI-example/c-elegans-test.yaml --circos-bin-width 100000 --circos-show-chroms I
If it is called more than once without --overwrite, new files will NOT be generated. Logging should report this. If the data is not properly generated, the app will not work.
I agree, as I said above, this should be integrated with the test data or at least documented (WIP)... and error/missing data handling must be improved.
Please let me know if this solves the problem...
There seem to be some missing parameters for the manual samtools invocation you mentioned above. The generated index file is only 20 bytes with the above invocation. I have a potential solution to #137, but cannot properly test it currently.
Yes, the index file for the c-elegans test data will be very small (20 B is what I have), as the test data has only one chromosome, so this is fine. But I will add it to the test data anyway.
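To see why such a small index is plausible: a .fai file holds one tab-separated line per sequence (name, length, byte offset, bases per line, bytes per line), so a single chromosome yields a single short line. The sketch below parses one such line. The example values are illustrative only and were not read from the actual test file, and `FaiRecord` and `parse_fai_line` are hypothetical names.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str        # sequence name
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    linebases: int   # bases per line
    linewidth: int   # bytes per line, including the newline


def parse_fai_line(line):
    """Parse the first five columns of one .fai line."""
    fields = line.rstrip("\n").split("\t")[:5]
    name, length, offset, linebases, linewidth = fields
    return FaiRecord(
        name, int(length), int(offset), int(linebases), int(linewidth)
    )


# Illustrative values only, NOT taken from the real WBcel235.chrI.fa.fai.
example = "I\t15072434\t55\t60\t61\n"
record = parse_fai_line(example)
print(record.name, record.length, len(example.encode()))
```

A line of this shape is about 20 bytes long, so the file size alone does not indicate a broken index.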
We need a larger dataset for #137. I will send you a link.
I have prepared a larger test dataset for #137.
Activate your environment, then run:
wget https://data.dieterichlab.org/s/d9XNab4i4MFbw3S/download -O bigger-test-data.zip && unzip bigger-test-data.zip && cd bigger-test-data && chmod +x setup; ./setup
Pre-computed analyses are under riboseq-analysis/riboseq-results/analysis.
To run the app (this one works)
rpbp_profile_construction_dashboard riboseq-analysis/config/rpbp-pipeline.yaml -d
but this one keeps loading forever mostly because of IGV...
rpbp_predictions_dashboard riboseq-analysis/config/rpbp-pipeline.yaml -d
These files should be sufficient to address this issue. The large "genome" files are under riboseq-analysis/genome.
Resolved but see #142
With the c-elegans test data, the rpbp_predictions_dashboard app does not execute. The reason is that it looks for an output file WBcel235.chrI.fa.fai.txt, where only WBcel235.chrI.fa.txt exists. The triggering line of code is https://github.com/dieterich-lab/rp-bp/blob/dev-ssciwr/src/rpbp/analysis/rpbp_predictions/dashboard/rpbp_predictions_dashboard.py#L323