ArthurSilver closed this issue 1 year ago.
Hello!
If you let me know the goals of your study, I may be able to make better suggestions to help you get there.
Is your question related to this paper? Simakov, Oleg, et al. "Deeply conserved synteny resolves early events in vertebrate evolution." Nature Ecology & Evolution 4.6 (2020): 820-830. The chordate LGs have already been characterized in this manuscript, and there are 17 of them.
If you're trying to recreate these results, look at the file odp_nway_rbh/step2-groupby/*.rbh.groupby. I expect there will be around 17 LGs, but the methods that odp uses to determine ALGs are not the same as the methods used in Simakov et al. 2020, so I do not know the exact number. The file odp_nway_rbh/step3-unwrap/*.filt.unwrapped.rbh contains the linkage groups that have an FDR <= 0.05, unwrapped so that each line represents one 3-way reciprocal best hit between the species (think of it like an orthogroup).
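Because each line of the unwrapped file is one 3-way reciprocal best hit, counting lines per chromosome combination gives a quick view of which combinations are large (roughly what the step2-groupby file summarizes). A minimal sketch, assuming the file is tab-separated with a header row and one chromosome column per species named like Al_scaf; that column naming is an assumption, so check the header of your own file first. The input below is toy data.

```python
import csv
import io
from collections import Counter

def count_chromosome_combinations(handle, species):
    """Count rows (3-way reciprocal best hits) per chromosome combination.

    Assumes tab-separated input with a header row and one chromosome column
    per species named '<species>_scaf' (an assumed naming scheme).
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        combo = tuple(row[f"{sp}_scaf"] for sp in species)
        counts[combo] += 1
    return counts

# Toy input standing in for odp_nway_rbh/step3-unwrap/*.filt.unwrapped.rbh
example = io.StringIO(
    "rbh\tAl_scaf\tGg_scaf\tLo_scaf\n"
    "og1\tAl5\tGg1\tLG7\n"
    "og2\tAl5\tGg1\tLG7\n"
    "og3\tAl2\tGg4\tLG3\n"
)
counts = count_chromosome_combinations(example, ["Al", "Gg", "Lo"])
for combo, n in counts.most_common():
    print(" / ".join(combo), n)
```

A large count alone does not make a combination an ALG; odp also asks whether the count is larger than expected by chance.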
Thanks for the reply. I did try to reproduce the results in Simakov et al.'s paper, albeit using a new B. floridae assembly instead (Three amphioxus reference genomes reveal gene and chromosome evolution of chordates). Does every row in the file odp_nway_rbh/step2-groupby/*.rbh.filt.groupby correspond to a linkage group? There are 86 rows in the groupby file; if so, the result is substantially more than 17 LGs.
I am also interested in creating a colored Oxford dotplot similar to Simakov et al. 2020, as well as a ribbon plot like the one presented in Simakov et al. 2022. Would you be able to provide guidance on how to achieve these visualizations?
Looking forward to your reply.
Thanks for clarifying. Each of the CLGs identified in Simakov et al. 2020 was defined based on the identity of specific B. floridae chromosomes compared with the other species you mentioned. From Supplementary Data Table 6 of that paper, this is how the CLGs were defined:
Supplementary Table 6. Assignment of amphioxus chromosomes to inferred ancestral chordate linkage groups (and base pair positions of the assignments)

| CLG | Amphioxus chromosome(s) |
|---|---|
| CLGA | BFL_1 |
| CLGB | BFL_10 + BFL_16 + BFL_18 |
| CLGC | BFL_2.1 + BFL_3.1 |
| CLGD | BFL_6 |
| CLGE | BFL_5 |
| CLGF | BFL_7 |
| CLGG | BFL_11 |
| CLGH | BFL_13 |
| CLGI | BFL_4.2 |
| CLGJ | BFL_2.2 + BFL_17 |
| CLGK | BFL_9 |
| CLGL | BFL_15 |
| CLGM | BFL_8 |
| CLGN | BFL_12 |
| CLGO | BFL_4.1 + BFL_19 |
| CLGP | BFL_14 |
| CLGQ | BFL_3.2 |

where the segments of amphioxus chromosomes 2, 3, and 4 are defined as:

| Segment | Coordinates |
|---|---|
| BFL_2.1 | BFL_2(1-10,906,430) + BFL_2(15,371,497-23,880,387) + BFL_2(25,960,386-end) |
| BFL_2.2 | BFL_2(10,906,431-15,371,496) + BFL_2(23,880,388-25,960,385) |
| BFL_3.1 | BFL_3(1-16,935,122) |
| BFL_3.2 | BFL_3(16,935,123-end) |
| BFL_4.1 | BFL_4(1-11,474,496) |
| BFL_4.2 | BFL_4(11,474,497-end) |
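To apply this table to individual genes, the segment boundaries can be encoded as a lookup from (chromosome, position) to CLG. A minimal sketch with two caveats: the coordinates refer to the B. floridae assembly used in Simakov et al. 2020, so they will not transfer directly to the newer assembly mentioned in this thread, and the chromosome names (BFL_1, BFL_2, ...) are copied from the table and may not match the sequence names in a given FASTA file. The helper name assign_clg is hypothetical.

```python
# Segment boundaries transcribed from Supplementary Table 6 (1-based, inclusive).
# An end of None means "to the end of the chromosome".
CLG_TO_SEGMENTS = {
    "CLGA": [("BFL_1", 1, None)],
    "CLGB": [("BFL_10", 1, None), ("BFL_16", 1, None), ("BFL_18", 1, None)],
    "CLGC": [("BFL_2", 1, 10_906_430), ("BFL_2", 15_371_497, 23_880_387),
             ("BFL_2", 25_960_386, None), ("BFL_3", 1, 16_935_122)],
    "CLGD": [("BFL_6", 1, None)],
    "CLGE": [("BFL_5", 1, None)],
    "CLGF": [("BFL_7", 1, None)],
    "CLGG": [("BFL_11", 1, None)],
    "CLGH": [("BFL_13", 1, None)],
    "CLGI": [("BFL_4", 11_474_497, None)],
    "CLGJ": [("BFL_2", 10_906_431, 15_371_496), ("BFL_2", 23_880_388, 25_960_385),
             ("BFL_17", 1, None)],
    "CLGK": [("BFL_9", 1, None)],
    "CLGL": [("BFL_15", 1, None)],
    "CLGM": [("BFL_8", 1, None)],
    "CLGN": [("BFL_12", 1, None)],
    "CLGO": [("BFL_4", 1, 11_474_496), ("BFL_19", 1, None)],
    "CLGP": [("BFL_14", 1, None)],
    "CLGQ": [("BFL_3", 16_935_123, None)],
}

def assign_clg(chrom, pos):
    """Return the CLG containing position pos on chromosome chrom, or None."""
    for clg, segments in CLG_TO_SEGMENTS.items():
        for seg_chrom, start, end in segments:
            if chrom == seg_chrom and pos >= start and (end is None or pos <= end):
                return clg
    return None

print(assign_clg("BFL_2", 12_000_000))  # a position inside segment BFL_2.2
```

The segments do not overlap, so the iteration order of the dictionary does not affect the result.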
The way that odp_nway_rbh currently determines linkage groups is by looking for combinations of chromosomes that appear more often than expected by random chance (given the number of genes and chromosomes in the three species being compared). The count column in odp_nway_rbh/step2-groupby/*.rbh.filt.groupby is the number of 3-way orthologs shared by that specific chromosome combination, and the alpha column is the false discovery rate of finding a linkage group that large.
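The exact statistics odp uses are not spelled out in this thread, so the following is a swapped-in illustration of what "more often than expected by random chance" can mean: a simple independence model, where an ortholog falls into a given chromosome combination with probability equal to the product of each species' chromosome share, plus a binomial upper-tail test. The function names and the toy numbers are hypothetical.

```python
import math

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def combination_enrichment(observed, n_total, genes_on_chrom):
    """Expected count and upper-tail p-value for one chromosome combination.

    observed: 3-way orthologs shared by this chromosome combination
    n_total: total number of 3-way orthologs in the comparison
    genes_on_chrom: for each species, how many of the n_total orthologs sit
        on that species' chromosome in this combination
    Under independence, an ortholog lands in this combination with
    probability prod(count / n_total).
    """
    p = 1.0
    for count in genes_on_chrom:
        p *= count / n_total
    expected = n_total * p
    return expected, binomial_upper_tail(observed, n_total, p)

# Toy numbers, not from any real dataset.
expected, pval = combination_enrichment(observed=40, n_total=2000,
                                        genes_on_chrom=[150, 120, 100])
print(f"expected={expected:.2f}, p-value={pval:.3g}")
```

These are raw per-combination p-values; to be comparable with an FDR such as the alpha column, they would still need a multiple-testing correction across all combinations (for example Benjamini-Hochberg).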
If you look at the largest 30 or so rows in this spreadsheet, you will see that they likely correspond to the CLGs described in Simakov et al. 2020. The ALGs defined in the odp spreadsheet will, by definition, contain fewer orthologs than the Simakov et al. 2020 set because of odp's strict method of finding linkage groups described above.
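Pulling out those largest rows can be scripted. A minimal sketch, assuming the groupby file is tab-separated with a header that includes the count and alpha columns described above (the tab separator is an assumption); it keeps rows with alpha <= 0.05 and returns the largest ones by count. The helper name top_linkage_groups and the toy rows are hypothetical.

```python
import csv
import io

def top_linkage_groups(handle, max_alpha=0.05, n=30):
    """Return the n largest rows (by 'count') whose 'alpha' is <= max_alpha."""
    rows = [
        row for row in csv.DictReader(handle, delimiter="\t")
        if float(row["alpha"]) <= max_alpha
    ]
    rows.sort(key=lambda row: int(row["count"]), reverse=True)
    return rows[:n]

# Toy input standing in for odp_nway_rbh/step2-groupby/*.rbh.filt.groupby
example = io.StringIO(
    "Al_scaf\tGg_scaf\tLo_scaf\tcount\talpha\n"
    "Al5\tGg1\tLG7\t120\t0.0\n"
    "Al2\tGg4\tLG3\t15\t0.2\n"
    "Al1\tGg2\tLG1\t80\t0.001\n"
)
for row in top_linkage_groups(example, n=2):
    print(row["Al_scaf"], row["Gg_scaf"], row["Lo_scaf"], row["count"], row["alpha"])
```

Comparing the B. floridae chromosome in each top row with the table of CLG assignments is one way to match odp's groups to the Simakov et al. 2020 CLGs.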
If you wish to make Oxford dot plots, you can follow these instructions. Right now, the only curated ALG sets that will automatically color your plots are a stricter subset of the BCnS ALGs from Simakov et al. 2022 and the pre-metazoan LGs from Schultz et al. 2023. I recommend looking at the plots generated by this pipeline, which will be located in odp/step2-figures/synteny_coloredby_*, and correlating the LGs there with your question.
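For readers curious what an Oxford dot plot encodes: each ortholog pair becomes one point whose x value is its genome-wide position in one species and whose y value is its position in the other, with chromosomes laid end to end in a chosen order. A minimal sketch of that coordinate step, independent of odp's own plotting code; the function names, chromosome names, and sizes are hypothetical toy values.

```python
def cumulative_offsets(chrom_order, chrom_sizes):
    """Map each chromosome to the genome-wide coordinate where it starts."""
    offsets, running = {}, 0
    for chrom in chrom_order:
        offsets[chrom] = running
        running += chrom_sizes[chrom]
    return offsets

def dotplot_points(orthologs, order_x, sizes_x, order_y, sizes_y):
    """Convert ortholog pairs to (x, y, color) points for an Oxford dot plot.

    orthologs: iterable of (chrom_x, pos_x, chrom_y, pos_y, color) tuples.
    Pairs on chromosomes missing from either order (for example, scaffolds
    removed by a size filter) are skipped.
    """
    off_x = cumulative_offsets(order_x, sizes_x)
    off_y = cumulative_offsets(order_y, sizes_y)
    points = []
    for chrom_x, pos_x, chrom_y, pos_y, color in orthologs:
        if chrom_x in off_x and chrom_y in off_y:
            points.append((off_x[chrom_x] + pos_x, off_y[chrom_y] + pos_y, color))
    return points

sizes_al = {"Al1": 1000, "Al2": 500}
sizes_lo = {"LG1": 800, "LG2": 700}
orthologs = [
    ("Al1", 100, "LG2", 50, "red"),
    ("Al2", 20, "LG1", 30, "blue"),
    ("Al9", 5, "LG1", 5, "gray"),  # Al9 is not in the plotted order, so it is dropped
]
points = dotplot_points(orthologs, ["Al1", "Al2"], sizes_al, ["LG1", "LG2"], sizes_lo)
print(points)
```

The resulting points can be drawn with any scatter-plot routine, with grid lines at the chromosome offsets; coloring each point by its ALG gives the colored style discussed above.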
If you would like to make ribbon diagrams, the only way currently supported in odp is the rbh_to_subway script, which takes a list of results from the normal odp pipeline mentioned in the paragraph above, along with the species order for the figure.
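The internals of the ribbon script are not described here, but the idea behind a ribbon diagram can be sketched: species are stacked in the given order, and each pair of adjacent species is connected by ribbons whose width reflects how many orthologs link a chromosome in the upper species to a chromosome in the lower one. A minimal sketch under those assumptions; ribbon_widths and the toy orthologs are hypothetical.

```python
from collections import Counter

def ribbon_widths(orthologs, species_order):
    """Count orthologs linking each chromosome pair between adjacent species.

    orthologs: list of dicts mapping species name -> chromosome name
        (a species may be absent if it has no ortholog in that group).
    Returns {(upper_sp, lower_sp): Counter({(upper_chrom, lower_chrom): n})}.
    """
    widths = {}
    for upper, lower in zip(species_order, species_order[1:]):
        pair_counts = Counter()
        for og in orthologs:
            if upper in og and lower in og:
                pair_counts[(og[upper], og[lower])] += 1
        widths[(upper, lower)] = pair_counts
    return widths

orthologs = [
    {"Al": "Al5", "Gg": "Gg1", "Lo": "LG7"},
    {"Al": "Al5", "Gg": "Gg1", "Lo": "LG7"},
    {"Al": "Al2", "Lo": "LG3"},
]
widths = ribbon_widths(orthologs, ["Al", "Gg", "Lo"])
for pair, counts in widths.items():
    print(pair, dict(counts))
```

Only adjacent species are connected, which is why the species order changes the appearance of the figure.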
If you would like to define your own set of LGs as you would find in the LG_db directory, that is currently undocumented. If you would like there to be a release of the Simakov et al 2020 CLGs that can be used for plotting, let us know.
Hello, so in the last few pushes I have included an implementation of the chordate linkage groups (warning: it will take a long time to make and to run). I have also included instructions on how to make a ribbon diagram.
- First, I recommend doing a git pull to update the repo.
- Set up the CLGs to work with odp
- Run odp
- Make a ribbon diagram
I think this addresses the issues you brought up. I'll leave this issue open for about two weeks, then I will close it. Let me know in this thread if you run into other related problems.
Thanks a lot and I'll give it a try.
Hello!
I was able to successfully run odp with the CLGs, but running odp_rbh_to_ribbon failed.
Here's the error message:
Error in rule make_plot:
jobid: 0
input: /public/home/project/amphioxus/synteny/odp/odp/step2-figures/synteny_coloredby_CLG_v1.0/Al_Lo_xy_reciprocal_best_hits.coloredby_CLG_v1.0.plotted.rbh, sp_to_chr_to_size.tsv
output: output.pdf
RuleException:
NameError in file /public/home/biosoft/odp/scripts/odp_rbh_to_ribbon, line 376:
name 'sns' is not defined
File "/public/home/biosoft/odp/scripts/odp_rbh_to_ribbon", line 651, in __rule_make_plot
File "/public/home/biosoft/odp/scripts/odp_rbh_to_ribbon", line 376, in ribbon_plot
File "/public/home/miniconda3/envs/odp/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
And here is my config file:
# There are several options for how to sort the chromosomes.
# More information is available in the config file.
chr_sort_order: optimal-chr-or
# Tells the program whether to plot the non-significant interactions.
plot_all: True
# Specifies which species will be plotted from top to bottom.
species_order:
- Al
- Lo
rbh_directory: /public/home/project/amphioxus/synteny/odp/odp/step2-figures/synteny_coloredby_CLG_v1.0/
# Only two species are shown here for brevity,
# but please include the species information for all the species you wish to plot.
species:
  Al:
    proteins: /public/home/project/amphioxus/synteny/odp/prot/Alref.prot.fa
    chrom: /public/home/project/amphioxus/synteny/odp/chrom/Alref.chrom
    genome: /public/home/project/amphioxus/Refer/al_canu_ref_patch_HIC.tidy.masked.rename.fasta
    minscafsize: 20000000 # Only plots scaffolds that are 20 Mbp or longer
  Lo:
    proteins: /public/home/project/amphioxus/synteny/odp/prot/Lo.prot.fa
    chrom: /public/home/project/amphioxus/synteny/odp/chrom/Lo.chrom
    genome: /public/home/project/amphioxus/synteny/odp/genome/Lepisosteus_oculatus.LepOcu1.dna_sm.toplevel.fa
    minscafsize: 1000000 # Only plots scaffolds that are 1 Mbp or longer
I don't think there's any issue with my configuration or input files, so I'm not sure how to resolve the problem.
There is also a small problem in odp's output: if the tables are too long, the legend overlaps with them.
Looking forward to your reply!
If you go to the install directory for odp and run the command git pull, that should update the code. I made a fix, so it should work now if you run odp_rbh_to_ribbon again.
Yes, the problem is that the PDF has a fixed size, but sometimes the table is too long. This is an open issue (https://github.com/conchoecia/odp/issues/14) if you have code edits to fix it. In the meantime, I recommend opening the PDF in Adobe Illustrator or another vector image editor (such as Inkscape) to extract the table if you need it.
Thanks again, it finally worked! I really appreciate your help.
Glad it worked for you and thanks for posting about the issues! I will work on making the experience more streamlined. Stay tuned for updates.
Hi,
odp is a valuable tool for synteny analysis, but I have some questions about finding ancestral linkage groups. Specifically, I am trying to identify ALGs in three species (lancelet, chicken, and spotted gar). I have successfully run the odp_nway_rbh scripts and obtained the rbh file in the step3-unwrap folder. However, I am unsure about the subsequent steps required to determine the number of ALGs. Does a fixed combination of chromosomes across the three species represent an ALG (for example, Al Chr5, Gg NC_006092.5, Lo LG7)? And how can I get the Oxford dot plot and ribbon plot from this rbh file?
Looking forward to your reply!