PombertLab / SYNY

The SYNY pipeline investigates synteny between species by reconstructing protein clusters from gene pairs.
MIT License

combined barplot missing #5

Closed marade closed 2 months ago

marade commented 2 months ago

Greetings. The SYNY home page shows a combined barplot output:

https://github.com/PombertLab/SYNY?tab=readme-ov-file#barplots

But when I run SYNY and look in the BARPLOTS/PNG or BARPLOTS/SVG directories, I only see individual comparisons and no combined image. Is there a hidden flag to activate that?

Pombert-JF commented 2 months ago

I'm not sure I understand your issue correctly. The barplots shown are for eukaryotes. SYNY expects that all contigs from a given genome are concatenated together into a single GBFF file (this is the norm in GenBank), i.e. genome_1.gbff, genome_2.gbff and so forth. If your eukaryote genome is in individual GBFF files, you can probably just concatenate them into a single file with something like cat *.gbff > genome_1.gbff.

Are you trying to compare bacterial genomes? If so, then yes, you'll only get one (or a few) bars in the barplots: one for the bacterial chromosome and one for each plasmid (if any).

If you provide me with your command lines and an example of your input files (or a link to them), I'll have a better idea of the issue you are describing.

marade commented 2 months ago

Hi. These are small regions of bacterial genomes. They look fairly good in the Circos combined plots, but I wish I could get the same thing with barplots. Is there any way to compare all of them in a similar fashion?

Pombert-JF commented 2 months ago

Not at the moment. Concatenation into a single barplot would likely require a rewrite of the backend unless we "cheat" by concatenating the PAF files first. Then paf_to_barplot.py should work without a fuss on the concatenated PAF files. Might be the easiest way forward. Color coding could be problematic however. I can implement a color coding by clusters option (like for the Circos plots) but I'm not sure how useful that would be.

I'll look into it tomorrow. You can try it yourself by concatenating the PAF files with the same reference together, e.g. cat *_vs_ref.mmap.paf > ref_concatenated.paf, then running paf_to_barplot.py --paf ref_concatenated.paf --fasta *.fasta --outdir TEST (or something like that) on the concatenated PAF file.

UPDATE: just tested the above and it chokes due to a regex match. Will need to update paf_to_barplot.py as well.

Otherwise, you could always concatenate the barplots you want from the SVG outputs with Adobe Illustrator or Inkscape (https://inkscape.org/).
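For reference, the PAF concatenation step suggested earlier in this comment (cat *_vs_ref.mmap.paf > ref_concatenated.paf) can be sketched in Python as follows. This is only an illustration of the concatenation itself, not part of SYNY: the function name concatenate_pafs and its arguments are hypothetical, and the *_vs_<reference>.mmap.paf naming pattern is taken from the example command above. As noted in the update, the paf_to_barplot.py of that time did not accept the concatenated file.

```python
from pathlib import Path


def concatenate_pafs(paf_dir, reference, outfile):
    """Concatenate every *_vs_<reference>.mmap.paf file found in paf_dir.

    Hypothetical helper (not part of SYNY); mirrors
    `cat *_vs_ref.mmap.paf > ref_concatenated.paf`.
    Returns the list of input files that were merged.
    """
    paf_dir = Path(paf_dir)
    outfile = Path(outfile)

    # Sort for a reproducible order; never read the output file as an input
    inputs = sorted(
        paf for paf in paf_dir.glob(f"*_vs_{reference}.mmap.paf")
        if paf.resolve() != outfile.resolve()
    )

    with open(outfile, "w") as out:
        for paf in inputs:
            text = paf.read_text()
            # Make sure the last record of one file is not fused
            # with the first record of the next one
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)

    return inputs
```

A call such as concatenate_pafs("PAF_DIR", "ref", "ref_concatenated.paf") would then produce the merged file; PAF_DIR is a placeholder for wherever the pairwise PAF files are located.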

marade commented 2 months ago

Thanks, that's helpful. I'll let you do the updates and test again, but it sounds like I'll be able to make it work one way or another.

Pombert-JF commented 2 months ago

As discussed, I pushed a new version that can produce concatenated barplots. You just need to invoke --bpmode cat (or --bpmode all). The default is set to --bpmode pair since most people will want the pairwise plots.

Example of a concatenated barplot produced from small eukaryote genomes (4 queries x 1 reference): [image: all_vs_Ec50602 gap_0 barplot 19 2x10 8 Spectral]

I also implemented a --bclusters option to color clusters by alternating colors in the barplots. The colors are not related within or between contigs; they are just used to highlight collinear chunks. If you want to use that, you'll likely want to use a qualitative palette instead of the default sequential one (possible palettes). The image below was generated with --bclusters and --palette Paired:

[image: Ec50602_vs_OpFIF10 mmap barplot 19 2x10 8 clus]
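The idea of alternating colors between consecutive clusters can be illustrated with a minimal Python sketch. This is not SYNY's implementation: the function alternate_cluster_colors, the cluster labels, and the two hex colors are all assumptions chosen for illustration. It only shows the principle described above, where the color changes each time a new collinear chunk starts and carries no other meaning.

```python
def alternate_cluster_colors(cluster_ids, palette):
    """Assign a color to each block, switching to the next palette
    entry whenever the cluster label changes (hypothetical helper)."""
    colors = []
    index = -1
    previous = object()  # sentinel that never equals a real cluster label

    for cluster_id in cluster_ids:
        if cluster_id != previous:
            # New collinear chunk: move to the next color, wrapping around
            index = (index + 1) % len(palette)
            previous = cluster_id
        colors.append(palette[index])

    return colors


# Blocks along one contig, labeled by the cluster they belong to
blocks = ["c1", "c1", "c2", "c3", "c3"]
two_colors = ["#a6cee3", "#1f78b4"]  # arbitrary pair of hex colors
print(alternate_cluster_colors(blocks, two_colors))
```

With two colors, neighboring clusters always differ, while clusters that are not adjacent (here c1 and c3) can share a color, which is why the colors should not be read as relating clusters to one another.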

Hope these additions help. Will close this issue as resolved but let me know if you encounter any issue.

marade commented 2 months ago

Thanks, this works well, very much as anticipated. One issue I am noticing is that for sequences that are 100% identical, there is a small gap at the end of the plots. Are you seeing that too?

[image: all_vs_Xj1 gap_0 barplot 19 2x10 8 blue]

Pombert-JF commented 2 months ago

No, could very well be a bug. Can you send me those files so that I can debug it?

marade commented 2 months ago

Sure, sent via e-mail...

Pombert-JF commented 2 months ago

Yup, bug. Forgot to add a line.strip() to remove newlines, so it was adding them as extra length. Pushed the fix to GitHub.
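The effect of a missing line.strip() can be reproduced with a small Python sketch. The code below is not taken from SYNY; it assumes the lengths are computed by summing the lengths of sequence lines in a FASTA file, and the function sequence_lengths and its strip_newlines switch are hypothetical names used only to contrast the two behaviors.

```python
import io


def sequence_lengths(handle, strip_newlines=True):
    """Return {sequence name: length} for a FASTA file handle.

    With strip_newlines=False, each trailing newline is counted as
    one extra character, which inflates the reported lengths.
    """
    lengths = {}
    name = None

    for line in handle:
        if strip_newlines:
            line = line.strip()
        if line.startswith(">"):
            # Keep only the first word of the header as the name
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)

    return lengths


fasta = ">contig_1\nACGT\nACGT\nAC\n"
print(sequence_lengths(io.StringIO(fasta)))                        # {'contig_1': 10}
print(sequence_lengths(io.StringIO(fasta), strip_newlines=False))  # {'contig_1': 13}
```

In this toy case the unstripped version reports three extra bases, one per sequence line, which would be consistent with a small unmatched gap appearing at the end of a contig in the plot.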

marade commented 2 months ago

Yes, looks good. Thanks again.