chloroExtractorTeam / fastg-parser

Parses FastG files and extracts target assemblies
MIT License

Arrange LSC/IR/SSC in a default order #22

Open · greatfireball opened this issue 6 years ago

greatfireball commented 6 years ago

A default orientation for the three sequence parts (LSC, IR and SSC) would help ensure reproducible results.

greatfireball commented 6 years ago

@PfaffS, can you link or post the default order here in the issue once more?

PfaffS commented 6 years ago

Sure, here we go. The default order should be:

<-------------LSC-------------><-------IRB-------><--------SSC--------><-------IRA------->
<psbA(-)--------------rpl22(-)><------rrn23(+)---><ndhF(-)---ndhD(-)--><------rrn23(-)--->
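The layout above can be sketched in Python as a plain concatenation. This is a minimal illustration, not code from fastg-parser or Fast-Plast: it assumes the three regions have already been extracted as DNA strings and oriented, and that `ir` is supplied in IRB orientation, so IRA is its reverse complement (consistent with rrn23 on "+" in IRB and on "-" in IRA). The names `revcomp` and `assemble_default_order` are hypothetical.

```python
# Hypothetical sketch: assemble regions in the default order LSC, IRB, SSC, IRA.
# Assumes inputs are already oriented and `ir` is given in IRB orientation.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_default_order(lsc: str, ir: str, ssc: str):
    """Concatenate regions as LSC, IRB, SSC, IRA.

    IRA is taken as the reverse complement of IRB (the two copies of the
    inverted repeat). Returns the full sequence plus 1-based, inclusive
    start/end coordinates for each region.
    """
    parts = [("LSC", lsc), ("IRB", ir), ("SSC", ssc), ("IRA", revcomp(ir))]
    coords = {}
    start = 1
    for name, seq in parts:
        end = start + len(seq) - 1
        coords[name] = (start, end)
        start = end + 1
    return "".join(seq for _, seq in parts), coords


if __name__ == "__main__":
    genome, coords = assemble_default_order("AAAA", "GGC", "TT")
    print(genome)  # AAAAGGCTTGCC
    print(coords)  # {'LSC': (1, 4), 'IRB': (5, 7), 'SSC': (8, 9), 'IRA': (10, 12)}
```

Returning coordinates alongside the sequence makes it easy to check that each region lands where the default layout expects it.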

greatfireball commented 6 years ago

Please name and/or link the source @PfaffS

PfaffS commented 6 years ago

Sorry, my bad. Michael R. McKain uses this order in Fast-Plast (https://github.com/mrmckain/Fast-Plast), whose description states that it "identifies regions from the quadripartite structure of the chloroplast genome, assigns identity, and orders them according to standard convention". Here is the link: https://github.com/mrmckain/Fast-Plast/issues/22

greatfireball commented 6 years ago

thx!

greatfireball commented 6 years ago

Let me quote a little more:

Orientation is determined by looking at the relative orientation of the rpl and rps genes in the LSC, all genes in the SSC, and the rrn rRNAs in the IR. The code orientates the LSC so there are more "-" strand rps and rpl genes than "+", more "-" strand than "+" strand genes in the SSC, and with rrn genes on the "-" strand for the IRA. This works for most lineages (that we know of) in angiosperms. In reality, the SSC is probably in both directions across copies of the plastome in a plant. This is more for convention than anything else.
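The quoted rule amounts to a majority vote per region. The sketch below illustrates it under stated assumptions and is not Fast-Plast's actual code: genes are reduced to a name and a strand via a hypothetical `Gene` type, gene families are recognised by name prefix (`rpl`, `rps`, `rrn`), and a tie leaves the region unchanged, since the quote does not say how ties are handled. Flipping a region reverse-complements its sequence, which also reverses gene order and swaps every strand.

```python
# Hypothetical sketch of the majority-vote orientation rule quoted above.
from dataclasses import dataclass
from typing import Callable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Gene:
    name: str    # e.g. "rpl2", "ndhF", "rrn23"
    strand: str  # "+" or "-"


def flip(seq: str, genes: List[Gene]) -> Tuple[str, List[Gene]]:
    """Reverse-complement a region: gene order reverses and strands swap."""
    swapped = [Gene(g.name, "-" if g.strand == "+" else "+") for g in reversed(genes)]
    return revcomp(seq), swapped


def orient(seq: str, genes: List[Gene], counts: Callable[[str], bool]):
    """Flip the region if the counted genes sit mostly on "+".

    Ties keep the region as-is (an assumption, not from the quoted rule).
    """
    plus = sum(1 for g in genes if counts(g.name) and g.strand == "+")
    minus = sum(1 for g in genes if counts(g.name) and g.strand == "-")
    return flip(seq, genes) if plus > minus else (seq, list(genes))


def orient_lsc(seq, genes):
    # LSC: only rpl/rps genes vote; the goal is more "-" than "+".
    return orient(seq, genes, lambda n: n.startswith(("rpl", "rps")))


def orient_ssc(seq, genes):
    # SSC: every gene votes.
    return orient(seq, genes, lambda n: True)


def orient_ira(seq, genes):
    # IRA: rrn genes should end up on "-"; IRB is then revcomp(IRA).
    return orient(seq, genes, lambda n: n.startswith("rrn"))


if __name__ == "__main__":
    lsc, lsc_genes = orient_lsc(
        "AACG", [Gene("rps12", "+"), Gene("rpl2", "+"), Gene("psbA", "-")]
    )
    print(lsc)        # CGTT (two "+" rps/rpl genes vs none on "-", so flipped)
    print(lsc_genes)
```

Once IRA is oriented this way, IRB follows as its reverse complement, so only one copy of the inverted repeat needs to be voted on.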