Open RvV1979 opened 3 years ago
Currently, if you are assembling closely related taxa, the SSC orientation will usually be conserved for output 1 or 2. In other cases, you can either
1) use Mauve to align and visualize a set of samples, and mark the ones that are not in the same order; or
2) use MAFFT to align both options (1 and 2) against another genome with the desired order and simply calculate the p-distance from each alignment; the option with the smaller distance is the one in the desired orientation.
The latter solution is easy to script, though it is neither ideal nor robust (e.g. when there are other rearrangements). Hope this helps.
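As a rough illustration of the p-distance option, here is a minimal Python sketch. It assumes each option has already been aligned pairwise against the reference (for example with MAFFT) and that the aligned sequences are available as plain strings of equal length. The function names `p_distance` and `pick_closest` are made up for this example and are not part of GetOrganelle or MAFFT.

```python
def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return mismatches / compared


def pick_closest(alignments):
    """Return the option name with the smallest p-distance.

    `alignments` maps an option name to a pair
    (aligned_reference, aligned_option) from its own pairwise alignment.
    """
    return min(alignments, key=lambda name: p_distance(*alignments[name]))
```

With the SSC in the wrong orientation, that region aligns poorly and inflates the distance, so the smaller value should point to the matching option. As noted above, other rearrangements can break this assumption.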
Considering all cases (rearrangements etc.), a safe solution that meets all needs requires a better design. I understand that SSC reordering may be a pain for some. I will leave this issue open and put the design work on the schedule, but it may not happen very quickly.
Thanks for the prompt reply and advice. I am doing assemblies of closely related plants (same species) but orientation is not always conserved for 1 and 2.
I can see how it may be challenging to design a robust solution that works well for all possible rearrangements and reorientations. But, reading your FAQ, perhaps you could consider selecting the ORF-richest or ORF-poorest strand of the SSC as 1 and 2? Just an idea...
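To make the ORF idea concrete, here is a minimal Python sketch of one way to compare the two strands. It is not how GetOrganelle works. It assumes the SSC sequence has already been extracted as a string, treats an ORF as an in-frame ATG followed by the first stop codon, and uses an arbitrary `min_codons` threshold. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_total_length(seq, min_codons=30):
    """Sum the lengths (nt, stop included) of ORFs in the three forward frames."""
    seq = seq.upper()
    total = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                length = i + 3 - start
                if length // 3 >= min_codons:
                    total += length
                start = None
    return total


def orf_richer_strand(seq, min_codons=30):
    """Report which strand carries more ORF sequence: forward, reverse, or tie."""
    fwd = orf_total_length(seq, min_codons)
    rev = orf_total_length(reverse_complement(seq), min_codons)
    if fwd == rev:
        return "tie"
    return "forward" if fwd > rev else "reverse"
```

Summing ORF length rather than counting ORFs is a design choice made here so that a few long genes outweigh many short spurious ORFs. Either measure could be used.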
For now, I have opted to use a simple grep command to select the output file having ndhF in forward orientation and that does seem to work for me.
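For anyone wanting something similar without grep, the same idea can be sketched in Python. This is only an illustration under simplifying assumptions: the marker is a short fragment you supply yourself (for example a conserved stretch of ndhF from a reference), matching is exact, and matches spanning the point where the circular genome was linearized are ignored. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def marker_orientation(genome, marker):
    """Report how an exact marker fragment occurs in a genome string.

    Returns "forward", "reverse", "both", or "absent".
    """
    genome = genome.upper()
    marker = marker.upper()
    forward = marker in genome
    reverse = reverse_complement(marker) in genome
    if forward and reverse:
        return "both"
    if forward:
        return "forward"
    if reverse:
        return "reverse"
    return "absent"
```

Exact matching only works when the fragment is identical across samples. For more divergent data an alignment-based search would be needed instead.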
Using ORFs is definitely doable. I thought I had done that until I checked the code and found that I had not implemented it. Now I know why I said it should be "conserved" when it actually is not. I will find time to fix it after these busy days. Thanks for the feedback.
In the wiki you write: "You can pick the configuration of which the SSC region is in the same direction with most of your data or outgroup, which makes downstream annotation/analysis easier".
Is there a way to ensure that complete*.1.path_sequence.fasta output files always have the same orientation? Or, alternatively, to programmatically detect which output file has the desired orientation?
At the moment I only see the SSC orientation after post-hoc gene annotation, and manually checking all assemblies and inverting the SSC where necessary is rather laborious...
Thanks