Kinggerm / GetOrganelle

Organelle Genome Assembly Toolkit (Chloroplast/Mitochondrial/ITS)
GNU General Public License v3.0

Is it possible to standardize SSC orientation? #79

Open RvV1979 opened 3 years ago

RvV1979 commented 3 years ago

In the wiki you write: "You can pick the configuration of which the SSC region is in the same direction with most of your data or outgroup, which makes downstream annotation/analysis easier".

Is there a way to ensure that complete*.1.path_sequence.fasta output files always have the same orientation? Or, alternatively, to programmatically detect which output file has the desired orientation?

At the moment I only see the SSC orientation after post-hoc gene annotation, and manually checking all assemblies and inverting the SSC where necessary is rather laborious...

Thanks

Kinggerm commented 3 years ago

Currently, if you are assembling closely related taxa, the SSC orientation will usually be conserved for output 1 or 2. In other cases, you can either

1) use Mauve to align and visualize a bunch of samples, and mark the ones that do not fit the same order; or
2) use MAFFT to align both options (1 and 2) against another genome with the desired order and simply calculate the p-distance from each alignment.

The latter solution is easy to script, though it is not ideal or robust (e.g. when there are other rearrangements). Hope this helps.
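The p-distance comparison suggested above could be scripted roughly as follows. This is a minimal sketch, not GetOrganelle code. It assumes each candidate has already been aligned against the reference (for example with MAFFT) and that the two aligned rows have been read into plain strings. The names `p_distance` and `pick_orientation` are hypothetical, and skipping gap and `N` columns is only one possible convention.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of differing sites among columns where both rows have a base.

    Columns containing a gap ('-') or 'N' in either row are skipped
    (an assumption of this sketch, not a fixed definition).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return mismatches / compared


def pick_orientation(alignments: dict) -> str:
    """Return the label whose alignment to the reference has the lowest p-distance.

    `alignments` maps a label (e.g. "1" or "2") to a tuple
    (aligned_candidate, aligned_reference).
    """
    return min(alignments, key=lambda label: p_distance(*alignments[label]))


if __name__ == "__main__":
    # Toy alignments for illustration only.
    toy = {
        "1": ("ACGTACGT", "ACGTACGT"),
        "2": ("ACGTTTTT", "ACGTACGT"),
    }
    print(pick_orientation(toy))
```

Because an inverted SSC may show up in the alignment as gaps rather than mismatches, it may be worth inspecting the gap fraction as well before trusting the result.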

Considering all cases (rearrangements etc.), a safe solution that covers every need requires a better design. I understand that SSC reordering may be a pain for some. I will leave this issue open and put the design work on the schedule, but it may not happen very quickly.

RvV1979 commented 3 years ago

Thanks for the prompt reply and advice. I am doing assemblies of closely related plants (same species) but orientation is not always conserved for 1 and 2.

I can see how it may be challenging to design a robust solution that works well for all possible rearrangements and reorientations. But, reading your FAQ, perhaps you could consider designating the ORF-richest and ORF-poorest strands of the SSC as 1 and 2? Just an idea...

For now, I have opted to use a simple grep command to select the output file having ndhF in forward orientation and that does seem to work for me.
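A marker-based check of this kind might look like the sketch below in Python. The exact grep command is not given in the thread, so everything here is an assumption: `marker` stands for a short stretch of ndhF taken from a trusted reference, and `marker_orientation` is a hypothetical helper name. The search wraps around the end of the sequence because the assembled genome is circular and the marker could span the linearization point.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def marker_orientation(genome: str, marker: str):
    """Report whether `marker` occurs in `genome` as given or reverse-complemented.

    Returns "forward", "reverse", or None if the marker is not found exactly.
    The genome is treated as circular.
    """
    genome = genome.upper()
    marker = marker.upper()
    # Append the first len(marker) - 1 bases so matches across the break are found.
    circular = genome + genome[: len(marker) - 1]
    if marker in circular:
        return "forward"
    if reverse_complement(marker) in circular:
        return "reverse"
    return None


if __name__ == "__main__":
    # Toy sequences for illustration only.
    toy_genome = "ATGGCCTTAAGCGT"
    print(marker_orientation(toy_genome, "GCCTTA"))
    print(marker_orientation(toy_genome, "TAAGGC"))
```

Exact matching only works when the marker is identical in every sample, which is plausible within one species but is not guaranteed for more divergent taxa.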

Kinggerm commented 3 years ago

Using ORFs is definitely doable. I thought I had done that, until I checked the code and found that I had not implemented it. Now I know why I said it should be "conserved" when it actually is not. I will find time to fix it after these busy days. Thanks for the feedback.
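For illustration, the ORF-based idea discussed in this thread could be sketched as follows. This is not the GetOrganelle implementation, which per the comment above did not exist at the time. It assumes the SSC region has already been extracted as a plain string, it uses a simple ATG-to-stop ORF definition with a hypothetical `min_codons` threshold, and the names `count_orfs` and `orf_richer_strand` are made up for this example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_orfs(seq: str, min_codons: int = 30) -> int:
    """Count ATG-to-stop ORFs in the three frames of the given strand.

    An ORF is counted when it has at least `min_codons` codons before
    the stop codon (the start codon included).
    """
    seq = seq.upper()
    count = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in _STOPS:
                if (i - start) // 3 >= min_codons:
                    count += 1
                start = None
    return count


def orf_richer_strand(ssc: str, min_codons: int = 30) -> str:
    """Return "forward", "reverse", or "tie" depending on which strand has more ORFs."""
    ssc = ssc.upper()
    forward = count_orfs(ssc, min_codons)
    reverse = count_orfs(reverse_complement(ssc), min_codons)
    if forward > reverse:
        return "forward"
    if reverse > forward:
        return "reverse"
    return "tie"


if __name__ == "__main__":
    # Toy sequence for illustration only: one short ORF on the forward strand.
    toy_ssc = "ATG" + "GCC" * 5 + "TAA"
    print(orf_richer_strand(toy_ssc, min_codons=3))
```

A tie is reported explicitly so that ambiguous samples can be checked by hand rather than silently assigned an orientation.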