lesserof2weevils closed this issue 1 month ago.
Hi James,
Inferring ancestral genomes with only one chromosome is possible, but it is usually a sign of either an extreme rearrangement history (where syngraph can't infer anything useful) or a mistake in the input files.
The fact that only 1 LMS (linked marker set) is generated for each triplet (and no rearrangements are inferred) suggests that there could be a problem with the input files. For example, if you gave all sequences the same name within each tsv file, this is the kind of result I would expect.
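One quick way to check for this failure mode is to count how many distinct values appear in the second (sequence) column of each tsv. Below is a minimal Python sketch, assuming the four-column, tab-separated layout discussed in this thread (marker ID, sequence, start, end). The function name `count_sequences` is hypothetical, and the two example rows are copied from the file posted further down in this thread.

```python
import io
from collections import Counter


def count_sequences(handle):
    """Count how many markers sit on each sequence name (column 2).

    Assumes tab-separated rows of: marker ID, sequence, start, end.
    Blank or malformed lines with fewer than two fields are skipped.
    """
    counts = Counter()
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        counts[fields[1]] += 1
    return counts


# Two rows from the file posted in this thread: the taxon name is in column 2.
example = io.StringIO(
    "7at33392\tDpon\t14899570\t14932685\n"
    "123at33392\tDpon\t6609335\t6621773\n"
)
counts = count_sequences(example)
if len(counts) == 1:
    print("Only one sequence name found:", next(iter(counts)))
```

On a real file, open it with `open("genus_species.tsv", newline="")` and pass the handle in. If a genome with 11 to 39 chromosomes reports a single sequence name, the input is almost certainly the problem rather than the rearrangement history.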
Could you share an example tsv file, or even just the first and last ten lines of one?
Alex
Hi Alex,
Below are the first and last ten lines of one of the .tsv files. All of the files are tab-delimited, were taken from the BUSCO full_table.tsv, and all sequences have different names. From a synteny analysis, we identified extreme rearrangements in these species, with chromosome counts tripling.
Thanks, James
7at33392 Dpon 14899570 14932685
123at33392 Dpon 6609335 6621773
263at33392 Dpon 50257718 50284632
416at33392 Dpon 11714338 11720402
693at33392 Dpon 20000715 20012903
695at33392 Dpon 8381999 8405013
727at33392 Dpon 14550117 14564478
734at33392 Dpon 5485439 5496643
764at33392 Dpon 14207125 14224420
786at33392 Dpon 12709360 12724385
133879at33392 Dpon 457075 457521
133986at33392 Dpon 3685174 3685705
134030at33392 Dpon 19179151 19180241
134292at33392 Dpon 4850650 4851797
134501at33392 Dpon 34527726 34545461
134899at33392 Dpon 4451849 4453009
135025at33392 Dpon 19585697 19588931
135312at33392 Dpon 1735116 1735841
135764at33392 Dpon 5213116 5213812
135985at33392 Dpon 8767213 8767708
137542at33392 Dpon 11543731 11544226
Hi,
The second column in the tsv file should contain the sequence name (e.g. the chromosome or scaffold), not the taxon name (which is instead specified by the file name, e.g. genus_species.tsv). So your tsv file should look something more like:
7at33392 chromosome_1 14899570 14932685
123at33392 chromosome_1 6609335 6621773
...
135985at33392 chromosome_20 8767213 8767708
137542at33392 chromosome_20 11543731 11544226
I would suggest grepping these four fields (marker ID, sequence, start and end) directly from the full_table.tsv file generated by BUSCO.
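A minimal Python sketch of that extraction step is shown below. It assumes the BUSCO v5 `full_table.tsv` column order (Busco id, Status, Sequence, Gene Start, Gene End, ...), so check the header of your own file before relying on these column indices. Keeping only markers with status `Complete` is also an assumption made here, not something stated in this thread. The function name `busco_to_syngraph` is hypothetical, and the IDs in the example rows are made up for illustration.

```python
import io


def busco_to_syngraph(full_table, out):
    """Write 'marker<TAB>sequence<TAB>start<TAB>end' rows from a BUSCO full_table.tsv.

    Assumes BUSCO v5 column order: Busco id, Status, Sequence, Gene Start, Gene End, ...
    Comment lines ('#') and blank lines are skipped. Only 'Complete' markers are
    kept (an assumption); Duplicated, Fragmented and Missing rows are dropped.
    Returns the number of rows written.
    """
    kept = 0
    for line in full_table:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5 or fields[1] != "Complete":
            continue  # Missing rows have no coordinates; other statuses are filtered out
        busco_id, _status, sequence, start, end = fields[:5]
        out.write(f"{busco_id}\t{sequence}\t{start}\t{end}\n")
        kept += 1
    return kept


# Made-up rows in the assumed BUSCO v5 layout.
example = io.StringIO(
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\n"
    "7at33392\tComplete\tchromosome_1\t14899570\t14932685\t+\n"
    "9at33392\tMissing\n"
)
out = io.StringIO()
busco_to_syngraph(example, out)
print(out.getvalue(), end="")
```

For a real run, pass in an open `full_table.tsv` and an output file named after the taxon (e.g. genus_species.tsv), because syngraph takes the taxon name from the file name rather than from a column.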
I would expect this to resolve your issue with the infer module, but I'll leave the issue open and you can let me know how it goes.
Cheers,
Alex
Hi Alex,
Thanks for this. Yes, that was indeed the problem! It's resolved now.
Hi,
Thanks for putting together this cool piece of software. I'm having an issue with the infer step: it runs without errors, but no rearrangements are identified. I'm using BUSCOs as the input markers, along with an inferred tree. It seems to find only a median genome with 1 chromosome, even though the assemblies I'm testing have chromosome numbers ranging from 11 to 39. Any advice would be great!
Dataset I'm testing: https://www.biorxiv.org/content/10.1101/2024.06.25.600716v1.full
Thanks, James