chloroExtractorTeam / chloroExtractor

MIT License

What does 'cds-nr98-core.fa' mean? #142

Open lychen83 opened 4 years ago

lychen83 commented 4 years ago

Dear all,

In the default config file ptx.cfg there is one line: --ref-cluster %bin%/../data/cds-nr98-core.fa. Does this mean that cds-nr98-core.fa is used as the reference for read extraction or for cp assembly? It looks like the file cds-nr98-core.fa includes some cp gene sequences.

I have extracted cp reads from total RNA-seq reads. I just want to assemble the cp genes and use them for phylogenetic analysis. Can chloroExtractor generate oriented contigs from these extracted cp reads?

I appreciate any help.

Best regards, Lingyun

iimog commented 4 years ago

Hi Lingyun,

the file cds-nr98-core.fa does indeed contain 24 chloroplast genes from various species: accD, atpA, atpB, atpI, cemA, matK, ndhB, ndhK, petA, petB, psaA, psaB, psbA, psbB, psbC, psbD, rbcL, rpl2, rpoA, rpoB, rpoC1, rps12, rps2, rps4. These are used in the scale_reads step to stop screening reads once a certain amount of chloroplast reads has been found. They are also used to detect chloroplast contigs in the final assembly (if the assembly is not already a single circular chloroplast genome). They are not used as references in the sense of a reference-guided assembly.

For your specific task: it might be possible to use chloroExtractor to get contigs for the separate genes, but this is clearly outside its scope (the scope is de novo assembly of the chloroplast genome from genomic reads) and I would not expect good results. Feel free to try. Additionally, you can use any part of chloroExtractor that you think might be useful, including cds-nr98-core.fa. Maybe @greatfireball has additional ideas?

Best regards, Markus
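
For anyone who wants to check what a reference file such as cds-nr98-core.fa contains, the following is a minimal sketch that counts the records in a FASTA file and lists their header lines. It is not part of chloroExtractor; the function names are hypothetical, and it assumes nothing about the header format beyond the standard leading ">".

```python
import sys


def read_fasta_headers(path):
    """Return the header lines (without the leading '>') of a FASTA file."""
    headers = []
    with open(path) as handle:
        for line in handle:
            # In FASTA, every record starts with a line beginning with '>'.
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


def summarize_fasta(path, preview=5):
    """Return (number of records, first `preview` headers) for a FASTA file."""
    headers = read_fasta_headers(path)
    return len(headers), headers[:preview]


if __name__ == "__main__":
    # The path is supplied by the user, e.g. the file referenced by --ref-cluster.
    count, first_headers = summarize_fasta(sys.argv[1])
    print(f"{count} sequences")
    for header in first_headers:
        print(header)
```

Saved under a name of your choice, the script takes the path to the FASTA file as its only argument and prints the record count followed by the first few headers.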