Hi @jinhys, when you call the visualization script, you need to isolate a single call. Can you try creating a new breakpoint file with just the entry you'd like to plot, excluding the header?
Let me know if that doesn't work. Thanks again for reporting this issue.
Hi @zeeev,
Thanks for checking this issue.
I did try re-running the visualization script with a shorter version of the BED file, as you suggested (i.e., I included 4 entry lines from the full breakpoint BED output, without the header). Unfortunately, I'm getting different errors this time. Can you advise on this front? (The error messages are attached below.)
Traceback (most recent call last):
  File "~/tool/miniconda3/envs/pacbio/lib/python3.7/site-packages/matplotlib/style/core.py", line 114, in use
    rc = rc_params_from_file(style, use_default_template=False)
  File "~/tool/miniconda3/envs/pacbio/lib/python3.7/site-packages/matplotlib/__init__.py", line 984, in rc_params_from_file
    config_from_file = _rc_params_in_file(fname, fail_on_error)
  File "~/tool/miniconda3/envs/pacbio/lib/python3.7/site-packages/matplotlib/__init__.py", line 914, in _rc_params_in_file
    with _open_file_or_url(fname) as fd:
  File "~/tool/miniconda3/envs/pacbio/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "~/tool/miniconda3/envs/pacbio/lib/python3.7/site-packages/matplotlib/__init__.py", line 900, in _open_file_or_url
    with open(fname, encoding=encoding) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'clean'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/tool/pbfusion/scripts/visualize_fusion.py", line 238, in <module>
    main(args)
  File "~/tool/pbfusion/scripts/visualize_fusion.py", line 234, in main
    plot_fusion(ref_genes, alignments, breakpoints, args.output)
  File "~/tool/pbfusion/scripts/visualize_fusion.py", line 143, in plot_fusion
    plt.style.use('clean')
  File "~/tool/miniconda3/envs/pacbio/lib/python3.7/site-packages/matplotlib/style/core.py", line 120, in use
    "available styles".format(style))
OSError: 'clean' not found in the style library and input is not a valid URL or path; see `style.available` for list of available styles
Please let me know if you need any further details. Thanks!
Hi @jinhys, you'll need to copy the mplstyle file to your matplotlib config dir. Something like this should work:
mkdir -p ~/.config/matplotlib/stylelib && cp clean.mplstyle ~/.config/matplotlib/stylelib && cp ~/.config/matplotlib/stylelib/clean.mplstyle ~/.config/matplotlib/stylelib/clean
I try to have both clean.mplstyle and clean in the stylelib because sometimes mpl can be picky.
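For anyone who prefers to do the copy from Python, here is a minimal sketch of the same two-copy step. The helper name `install_style` is hypothetical, and the default directory `~/.config/matplotlib` is simply the path used in the shell command above; it may differ on your system.

```python
import shutil
from pathlib import Path


def install_style(style_file, config_dir=None):
    """Copy a .mplstyle file into <config_dir>/stylelib under two names.

    config_dir defaults to ~/.config/matplotlib (an assumption taken from
    the shell command in this thread; your matplotlib may use another path).
    """
    style_file = Path(style_file)
    if config_dir is None:
        config_dir = Path.home() / ".config" / "matplotlib"
    stylelib = Path(config_dir) / "stylelib"
    stylelib.mkdir(parents=True, exist_ok=True)  # equivalent of mkdir -p

    with_ext = stylelib / style_file.name     # e.g. clean.mplstyle
    without_ext = stylelib / style_file.stem  # e.g. clean
    shutil.copyfile(style_file, with_ext)
    shutil.copyfile(style_file, without_ext)
    return with_ext, without_ext
```

If matplotlib is importable in the same environment, `matplotlib.get_configdir()` reports the config directory it actually uses, and `plt.style.available` lists the style names it has found, so both are useful checks after copying.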
Another note: the visualization script needs the input BED file to have only a single entry instead of 4. Having the header lines is fine, since those get skipped.
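Cutting the breakpoint file down to one entry can also be scripted. The sketch below is only an illustration under stated assumptions: header lines start with `#`, columns are whitespace-separated, and the breakpoint ID (e.g. `BP665`) is in the 7th column. The helper name `single_entry_lines` is hypothetical and not part of pbfusion.

```python
def single_entry_lines(lines, breakpoint_id, id_column=6):
    """Return header lines plus the one entry whose ID column matches.

    Assumptions (not confirmed by the tool's documentation): headers start
    with '#', columns are whitespace-separated, and the ID is at index 6.
    """
    kept = []
    matches = 0
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # headers are harmless; the script skips them
            continue
        fields = line.split()
        if len(fields) > id_column and fields[id_column] == breakpoint_id:
            kept.append(line)
            matches += 1
    if matches != 1:
        raise ValueError(
            f"expected exactly one entry with ID {breakpoint_id!r}, found {matches}"
        )
    return kept
```

Read the full breakpoint file into a list of lines, pass it through this function, and write the returned lines to a new file to use as input for the visualization script.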
Hi @velociroger-pb,
Thanks a lot for your comment. I've fixed all those errors and finally got the PNG files!
A few suggestions for further development of the visualization: it would be great if we could also add annotations to the output, such as isoform IDs and types (e.g., NIC, NNC, FSM, ISM, etc.), and save the figures in PDF format.
Thanks again for your help!
@jinhys,
Glad we could resolve the issue for you. Please feel free to re-open the ticket if needed. Hopefully, you found an interesting fusion.
Best,
Zev
Hi,
I did manage to get the PNG output files previously. Unfortunately, when I tried to generate the same type of output for another fusion with a different combination of chromosomes, I got this ValueError again:
Traceback (most recent call last):
  File "~/tool/pbfusion/scripts/visualize_fusion.py", line 266, in <module>
    main(args)
  File "~/tool/pbfusion/scripts/visualize_fusion.py", line 262, in main
    plot_fusion(ref_genes, alignments, breakpoints, args.output)
  File "~/tool/pbfusion/scripts/visualize_fusion.py", line 138, in plot_fusion
    gene1_start = min([x[0][1][0][0] for x in list(alignments.values()) if x[0][0] == breakpoints[0][0]])
ValueError: min() arg is an empty sequence
(FYI, I've used the most recent version of the scripts published on GitHub.)
Can you advise on this error? Thanks in advance!
To me it looks like you aren't left with any alignments after filtering. Can you post the contents of the bed file you're trying to visualize? This script will only work with fusions that involve two genes, but you can modify the bed file to force it to only look at the two primary genes.
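To make the "nothing left after filtering" case easier to spot, the lookup from the traceback could be wrapped in a small guard that reports what it did find. This is a minimal sketch, not the script's actual code: `leftmost_start` is a hypothetical helper, and the layout `{read: [(chrom, [(start, end), ...]), ...]}` is only inferred from the indexing in the failing expression.

```python
def leftmost_start(alignments, chrom):
    """Leftmost block start among reads whose first alignment is on chrom.

    Mirrors the expression in the traceback; the data layout is an
    assumption inferred from that expression's indexing.
    """
    starts = [aln[0][1][0][0] for aln in alignments.values() if aln[0][0] == chrom]
    if not starts:
        # Report which chromosomes the surviving reads actually start on.
        seen = sorted({aln[0][0] for aln in alignments.values()})
        raise ValueError(
            f"no reads start on {chrom!r}; {len(alignments)} reads survived "
            f"filtering, first alignments on: {seen}"
        )
    return min(starts)
```

With this guard, the error says whether the dict is empty or whether the reads simply start on a different chromosome than the first breakpoint.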
Hi @velociroger-pb,
I've attached the 1-line content of the two individual BEDPE files for the visualization:
an example which causes the ValueError mentioned above
chr17 81996845 81996846 chrX 49028728 49028729 BP665 MEDIUM + - READ_COUNT=43;BP_COUNT=43;SA=43,0;BMD=0.465116,0,0.930233;GENE_NAMES=ASPSCR1,TFE3;GENE_NAME_COUNTS=287,251;GENE_IDS=ENSG00000169696.16,ENSG00000068323.17;GENE_ID_COUNTS=287,251;EXONIC=43,43;INTRONIC=0,0;SRCF=15:43;SINKF=15:43;RT=FUSION:43 READS=m64011_221002_221225/753/ccs,m64011_221002_221225/9700219/ccs,m64011_221002_221225/13370638/ccs,m64011_221002_221225/30279250/ccs,m64011_221002_221225/33489545/ccs,m64011_221002_221225/44762815/ccs,m64011_221002_221225/58655980/ccs,m64011_221002_221225/65014418/ccs,m64011_221002_221225/68355721/ccs,m64011_221002_221225/73336608/ccs,m64011_221002_221225/79561655/ccs,m64011_221002_221225/84871992/ccs,m64011_221002_221225/100139028/ccs,m64011_221002_221225/115475948/ccs,m64011_221002_221225/121505328/ccs,m64011_221002_221225/128714267/ccs,m64011_221002_221225/129763260/ccs,m64011_221002_221225/136380825/ccs,m64011_221002_221225/136577787/ccs,m64011_221002_221225/144245732/ccs,m64011_221002_221225/149424083/ccs,m64011_221002_221225/155847037/ccs,m64011_221002_221225/177407407/ccs,m64011_221002_221225/11796742/ccs,m64011_221002_221225/31129683/ccs,m64011_221002_221225/52756930/ccs,m64011_221002_221225/70124624/ccs,m64011_221002_221225/84346910/ccs,m64011_221002_221225/85524942/ccs,m64011_221002_221225/143329543/ccs,m64011_221002_221225/24774818/ccs,m64011_221002_221225/83755593/ccs,m64011_221002_221225/100862570/ccs,m64011_221002_221225/108397667/ccs,m64011_221002_221225/120127760/ccs,m64011_221002_221225/164693472/ccs,m64011_221002_221225/176816615/ccs,m64011_221002_221225/177146357/ccs,m64011_221002_221225/40764636/ccs,m64011_221002_221225/96799349/ccs,m64011_221002_221225/126485532/ccs,m64011_221002_221225/75498126/ccs,m64011_221002_221225/153092816/ccs
an example which generates an empty visualization output (i.e., this BEDPE file does generate a PNG file but the content of the PNG file shows no visualized reads.)
chr18 26090609 26090610 chr5 176346066 176346067 BP718 MEDIUM - - READ_COUNT=2;BP_COUNT=2;SA=0,2;BMD=0,0,0;GENE_NAMES=SS18,KIAA1191;GENE_NAME_COUNTS=22,2;GENE_IDS=ENSG00000141380.14,ENSG00000122203.15;GENE_ID_COUNTS=22,2;EXONIC=2,2;INTRONIC=0,0;SRCF=11:2;SINKF=11:2;RT=FUSION:2 READS=m54086U_221001_224915/64619292/ccs,m54086U_221001_224915/79364686/ccs
Thanks!
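In case it helps with checking entries like the two above, here is a rough sketch that splits one breakpoint line into its coordinates, INFO keys, and read names, so the gene list and read count can be inspected quickly. The column layout is inferred from the posted lines and is an assumption, and `parse_breakpoint_line` is a hypothetical helper, not part of pbfusion.

```python
def parse_breakpoint_line(line):
    """Split one breakpoint entry into coordinates, INFO keys, and reads.

    Assumed layout (inferred from the lines posted in this thread):
    10 leading columns, a ';'-separated KEY=VALUE field, then an
    optional READS=... field.
    """
    fields = line.split()
    entry = {
        "chrom1": fields[0], "start1": int(fields[1]), "end1": int(fields[2]),
        "chrom2": fields[3], "start2": int(fields[4]), "end2": int(fields[5]),
        "id": fields[6],
        "score": fields[7],  # holds MEDIUM in the posted lines
        "strand1": fields[8], "strand2": fields[9],
    }
    info = {}
    for item in fields[10].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    entry["info"] = info
    entry["genes"] = info["GENE_NAMES"].split(",") if "GENE_NAMES" in info else []
    reads = []
    for extra in fields[11:]:
        if extra.startswith("READS="):
            reads = extra[len("READS="):].split(",")
    entry["reads"] = reads
    return entry
```

For example, `len(entry["genes"]) == 2` is a quick check that an entry involves exactly two genes, and `len(entry["reads"])` can be compared against `entry["info"]["READ_COUNT"]`.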
I would print out the alignment dict if you know how. To me it looks like everything is getting filtered, which is why you would end up with nothing when looking for the leftmost position in the line that it errors on.
Hi @velociroger-pb,
Thanks for your reply. I understand your point on the filtering, but I wonder whether we can adjust the filtering criteria to be less stringent when running the visualize_fusion.py script. (Or do I have to adjust the parameters and re-run the pbfusion script?)
Also, can you advise on printing out the alignment dict that you mentioned?
Many thanks!
So on line 132 you should currently have:
alignments = {k: sorted(v) for (k, v) in alignments.items() if len(v) == 2}
At the same indentation level, you'll want to have:
print(alignments)
alignments = {k: sorted(v) for (k, v) in alignments.items() if len(v) == 2}
print(alignments)
The filtering at this step is to ensure that each read only has two alignments (primary and supplementary), which will get plotted. All of the pbfusion filtering options are more about refining which breakpoints are reported and less about cleaning individual reads/alignments for visualization.
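If printing the whole dict is too noisy, a tally of how many alignments each read has can show at a glance why reads are dropped by the two-alignment filter. A minimal sketch on toy data follows; the function names are hypothetical and the `(chrom, [(start, end), ...])` tuple layout is an assumption inferred from how the script indexes the dict.

```python
from collections import Counter


def alignment_count_summary(alignments):
    """Count how many reads have 1, 2, 3, ... alignments."""
    return Counter(len(v) for v in alignments.values())


def keep_two_alignment_reads(alignments):
    """Same filter as the line quoted above: keep reads with exactly two alignments."""
    return {k: sorted(v) for (k, v) in alignments.items() if len(v) == 2}


if __name__ == "__main__":
    # Toy data only; real read names and coordinates will differ.
    toy = {
        "read_a": [("chrX", [(5, 9)]), ("chr17", [(1, 4)])],
        "read_b": [("chr17", [(1, 4)])],
        "read_c": [("chr17", [(1, 2)]), ("chrX", [(3, 4)]), ("chr1", [(5, 6)])],
    }
    print(alignment_count_summary(toy))
    print(keep_two_alignment_reads(toy))
```

If the summary shows no reads with exactly two alignments, the filtered dict is empty and the later `min()` call has nothing to work with.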
Hi @velociroger-pb,
Thanks for the code lines! I will check my output files again with the print step and reach out again if I need further help.
Thanks a lot :)
Hi @jinhys! Were you able to fix your problem? I'm having the same issue at the moment...
Hi @meryyr - Sorry for the belated reply.
As far as I remember, it was not a technical error and it had more to do with the embedded read/alignment filtering criteria as mentioned above (which I included as a quote below). I think it would be more helpful for you to reach out to @velociroger-pb / @zeeev or other PacBio Team members directly.
The filtering at this step is to ensure that each read only has two alignments (primary and supplementary), which will get plotted. All of the pbfusion filtering options are more about refining which breakpoints are reported and less about cleaning individual reads/alignments for visualization.