chloroExtractorTeam / chloroExtractor

MIT License

Reference Genome #140

Open nsmt89 opened 5 years ago

nsmt89 commented 5 years ago

How can I run chloroExtractor while providing a reference genome for the assembly? I tried running it with Illumina paired-end data, and it gave a FASTA file with more than one contig. The contigs are quite short compared with the expected chloroplast length (about 20k).

greatfireball commented 5 years ago

Dear nsmt89,

Unfortunately, chloroExtractor does not support reference-based assembly because of its underlying assembler, spades. In general, it would be possible to use another or an additional reference during the filter step, but the final assembly is limited by spades. Nevertheless, using an individual reference while filtering would avoid throwing away important read pairs. If you want to try that, we will provide you with a how-to. Just let us know.

nsmt89 commented 4 years ago

Hi, sorry for taking some time to answer your offer. Yes, I would like to use an individual reference while filtering. Can you guide me on how to do that?

Thank you

iimog commented 4 years ago

Hi @nsmt89, the way to go is to create your own config file, e.g. with --create-config, and then edit that file, in particular the --ref-cluster entry in the scale_reads.pl call. There you can provide your own reference. Note that it is only used for read scaling, not for a reference-guided assembly or scaffolding, as @greatfireball already noted. It might still improve your result. Another point where you can inject your own reference is the --blastdb option of find_cyclic_graph.pl, but this is only used to filter the assembled contigs. Remember to pass the adjusted config file to chloroExtractor via the --config option.
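The two chloroExtractor calls around the manual edit could look roughly like the sketch below. It only builds and prints the command lines; nothing is executed. The executable name `ptx`, the `-1`/`-2` read options, the use of `--config` to name the file being created, and all file names are assumptions for illustration. Only `--create-config` and `--config` are taken from this thread.

```python
import shlex


def build_commands(config_path, reads_1, reads_2, executable="ptx"):
    """Return (create, run) command lines as argument lists; nothing is run.

    Assumptions: the executable name, the -1/-2 read options, and passing
    --config together with --create-config to name the new config file.
    """
    # Step 1: write a default config file that can be edited by hand.
    create = [executable, "--create-config", "--config", config_path]
    # Step 2 happens in an editor: point the --ref-cluster entry of the
    # scale_reads.pl call (and optionally --blastdb of find_cyclic_graph.pl)
    # at your own reference.
    # Step 3: run the pipeline with the adjusted config file.
    run = [executable, "--config", config_path, "-1", reads_1, "-2", reads_2]
    return create, run


create_cmd, run_cmd = build_commands("my_ptx.cfg", "reads_R1.fq", "reads_R2.fq")
print(shlex.join(create_cmd))
print(shlex.join(run_cmd))
```

Check the tool's help output for the exact option names before running anything built this way.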

Two more notes mostly directed at @greatfireball:

  1. It might be possible to provide a reference to spades as --[un]trusted-contigs
  2. We can consider accepting a reference for scaffolding e.g. with RaGOO
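For note 1, a standalone spades call with a reference might be assembled as in the sketch below. `--trusted-contigs` and `--untrusted-contigs` are existing options of `spades.py`, as are `-1`, `-2` and `-o`. How chloroExtractor invokes spades internally is not shown in this thread, so the sketch is independent of the pipeline, and the helper name and file names are hypothetical.

```python
def spades_command(reads_1, reads_2, reference_fasta, out_dir, trusted=False):
    """Build a spades.py argument list that passes a reference as contigs.

    Hypothetical helper for illustration; it does not run spades.
    """
    # A reference from a related species is usually not an exact match,
    # so the untrusted variant is the more cautious default here.
    flag = "--trusted-contigs" if trusted else "--untrusted-contigs"
    return [
        "spades.py",
        "-1", reads_1,
        "-2", reads_2,
        flag, reference_fasta,
        "-o", out_dir,
    ]


cmd = spades_command("reads_R1.fq", "reads_R2.fq", "reference_cp.fa", "spades_out")
print(" ".join(cmd))
```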