mmelendrez closed this issue 8 years ago.
It may be easier for now to just run the steps manually:
1) Setup: set these variables (you need to specify absolute paths to the reference and input FASTA files):
inputfastapath=/path/to/contig.fasta
refpath=/path/to/reference.fasta
outputdir=outputdir
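Absolute paths matter here because `ln -s` stores its target path literally, so a relative target would be resolved relative to `${outputdir}/input/` rather than the directory you ran the commands from, leaving a broken link. A minimal pre-flight check, assuming bash; `check_abs_path` is a hypothetical helper, not part of ngs_mapper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ngs_mapper): verify that a setup
# variable holds an absolute path to an existing regular file.
check_abs_path() {
    local name="$1" value="$2"
    case "$value" in
        /*) ;;  # absolute path: OK
        *)  echo "error: $name must be an absolute path (got '$value')" >&2
            return 1 ;;
    esac
    if [ ! -f "$value" ]; then
        echo "error: $name does not point to an existing file: $value" >&2
        return 1
    fi
}
```

Usage, before starting step 2: `check_abs_path inputfastapath "$inputfastapath" && check_abs_path refpath "$refpath"`.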
2) Run: copy and paste the rest:
mkdir -p "${outputdir}/input"
ln -s "$inputfastapath" "${outputdir}/input/"
inputfastapath="${outputdir}/input/$(basename "$inputfastapath")"
cp "$refpath" "$outputdir"
refpath="${outputdir}/$(basename "$refpath")"
bwa index "$refpath"
bwa mem "$refpath" "$inputfastapath" | samtools view -Su - | samtools sort -o - - > "${outputdir}/out.bam"
samtools index "${outputdir}/out.bam"
base_caller "${outputdir}/out.bam" "$refpath" "${outputdir}/out.bam.vcf"
graphsample "${outputdir}/out.bam" -od "$outputdir"
vcf_consensus "${outputdir}/out.bam.vcf" -i contig -o "${outputdir}/out.bam.consensus.fasta"
How this will affect the stages:
mark_lq: may be adjusted to look only at depth and not quality, since FASTA input carries no base qualities.
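One way a depth-only check could look: a minimal sketch, assuming bash plus awk and input in the three-column format printed by `samtools depth -a` (reference, 1-based position, depth; `-a` includes zero-depth positions). `flag_low_depth` and its default threshold of 10 are hypothetical and this is not the actual mark_lq implementation:

```shell
#!/usr/bin/env bash
# Hypothetical depth-only low-coverage flagger (not the real mark_lq).
# Reads "ref<TAB>pos<TAB>depth" lines and prints each contiguous run of
# positions with depth < min_depth as "ref<TAB>start<TAB>end" (inclusive).
flag_low_depth() {
    local min_depth="${1:-10}"
    awk -v min="$min_depth" '
        BEGIN { OFS = "\t" }
        {
            low = ($3 < min)
            if (low && inrun && $1 == ref && $2 == end + 1) {
                end = $2                       # extend the current run
            } else {
                if (inrun) print ref, start, end   # close previous run
                inrun = low
                if (low) { ref = $1; start = $2; end = $2 }
            }
        }
        END { if (inrun) print ref, start, end }
    '
}
```

Usage: `samtools depth -a "${outputdir}/out.bam" | flag_low_depth 10`. Quality is ignored entirely, so the result is the same whether the BAM came from reads or from contigs.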
So, in my continuing efforts to make your life difficult, I'd like to take the contigs obtained from de novo assembly (the pathogen discovery pipeline, or really any de novo assembler) and map them (and unassembled reads of interest) to a reference.
Problem: they are in FASTA format, and I don't think ngs_mapper can map plain FASTA?
I can toss all the reads into a directory for ngs_mapper, but it won't accept FASTA format for mapping.
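If the only obstacle is a FASTQ-only input check, one possible stopgap is to wrap each FASTA record as a FASTQ record with a constant dummy quality. This is a sketch under assumptions: `fasta_to_fastq` is a hypothetical helper, the default quality character `I` (Phred 40 in Phred+33 encoding) is an arbitrary choice, and because the qualities are fake, any quality-based stage would report meaningless values:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert (possibly multi-line) FASTA on stdin to
# FASTQ on stdout, giving every base the same dummy quality character.
fasta_to_fastq() {
    local qual_char="${1:-I}"   # 'I' = Phred 40 in Phred+33 encoding
    awk -v q="$qual_char" '
        function flush() {
            if (name != "") {
                qual = seq
                gsub(/./, q, qual)   # one quality char per base
                print "@" name
                print seq
                print "+"
                print qual
            }
        }
        /^>/ { flush(); name = substr($0, 2); seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }   # join wrapped sequence lines
        END { flush() }
    '
}
```

Usage (file names are placeholders): `fasta_to_fastq < contigs.fasta > contigs.fastq`.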
I remember we chatted about this when I was attempting to map Rickettsia to the 88 draft contigs from GenBank.
I think this would be a useful feature: scientists often want to take de novo generated contigs and map them back to references to see how much of the genome of their organism of interest the contigs cover.
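Genomic coverage in this sense is usually reported as breadth of coverage: the fraction of reference positions covered by at least one aligned contig. A minimal sketch, assuming bash plus awk and a samtools version whose `depth` command supports `-a` (so zero-depth positions are listed); `breadth_of_coverage` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-reference breadth of coverage from
# "ref<TAB>pos<TAB>depth" lines. Prints ref, covered positions,
# total positions and percent covered, in input order.
breadth_of_coverage() {
    awk '
        {
            if (!($1 in total)) order[++n] = $1   # remember first appearance
            total[$1]++
            if ($3 > 0) covered[$1]++
        }
        END {
            for (i = 1; i <= n; i++) {
                r = order[i]
                printf "%s\t%d\t%d\t%.2f%%\n", r, covered[r] + 0, total[r], 100 * covered[r] / total[r]
            }
        }
    '
}
```

Usage: `samtools depth -a "${outputdir}/out.bam" | breadth_of_coverage`.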
For a better test case than Rickettsia, which is bacterial and messy, you can use the file generated during pathogen discovery of Chikungunya; the sample was almost 100% Chik. See project management issue 9618 (https://vdbpm.org/issues/9618). For this one, since the majority of the sample was Chik, I was able to just map the raw FASTQ reads directly to the Chik genome FASTA (in the issue). The pathogen discovery pipeline also generated contigs and whatnot in FASTA format, which I can't use in ngs_mapper, but you can use them if you decide to add this feature. I would suggest rerunning pathogen discovery on this sample to get the contigs, though, because this run was before we found the issue with iterative phylo where half the results were missing.
Files located at: /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9618
I did chmod 777 on the directory so you should have access to everything in there.