Closed ericcrandall closed 6 years ago
The workflow is present in the repository, but not entirely integrated to the anaconda package, yet, because of some issues i've been (slowly) trying to work through. So, you may need to download seqtk and add it's location to ~/.phyluce.conf
like:
[seqtk]
seqtk:$CONDA/bin/seqtk
OK - it's the ~/.phyluce.conf
that is throwing me. I can't find it in my home directory (even searching for invisible files), nor when I search the entire anaconda2 directory.
Thanks again for your help.
you should be able to create the file. in your $HOME
.
OK - I made that file with the suggested text, and it is still not finding seqtk. I have placed seqtk executables into ~/anaconda2/envs/phyluce/bin
and ~/anaconda2/pkgs/phyluce-1.5.0-py27_0/bin
Is there some other place I need to put it, or another config file I can edit?
Sorry for the hassle.
What do the contents of the file look like (it should be at ~/.phyluce.conf
)? If you used the tilda (~
) in the path to seqtk, go ahead and expand the paths (or use an environment variable like seqtk:$HOME/anaconda2/envs/phyluce/bin/seqtk
).
Contents are just what you gave to me above.
(phyluce) M053-303-26932:~ cran5048$ pwd
/Users/cran5048
(phyluce) M053-303-26932:~ cran5048$ cat .phyluce.conf
[seqtk]
seqtk:$CONDA/bin/seqtk
I tried changing it to
[seqtk]
seqtk:/Users/cran5048/anaconda2/envs/phyluce/bin/seqtk
to no avail.
Specific error is this:
(phyluce) M053-303-26932:phyluce cran5048$ phyluce_snp_phase_uces --config ./sebova/sebova_phasing_conf.txt --bams ./sebova/mapping --output ./sebova/phased_reads
Traceback (most recent call last):
File "/Users/cran5048/anaconda2/envs/phyluce/bin/phyluce_snp_phase_uces", line 26, in <module>
from phyluce import seqtk
ImportError: cannot import name seqtk
Ok. Gotcha. This is a different issue than the binary for seqtk being found. I think what's happened is that you have the package for phyluce, but that's an older version. You added in the "binaries" for the needed functions from github by copying the files as you mentioned (i glanced over this part). But, you're not getting all of the bits, because there are also a number of other, phyluce modules that go along with those scripts, e.g. changes to phyluce/phyluce/samtools
as well as the creation of phyluce/phyluce/seqtk
. You don't have these files because they're not in the conda
version that's currently available. I also realize I need to update what's on conda, but there are some blocking changes that I haven't entirely figured out how to work around (e.g. Trinity not being buildable on osx).
Your best bet to get all of the phyluce pieces that you need is to checkout, using git, the entire phyluce repository somewhere in your $HOME
. You will probably need to alter your $PYTHONPATH to get this to work, e.g.:
export PYTHONPATH=$HOME/git/phyluce
Then, you should be able to run git using
python ~/path/to/phyluce/bin/snps/phyluce_snp_phase_uces
OK - thanks that worked. Yeah, I was having difficulty installing Trinity to my OSX - glad its not just me.
But now there is a new issue. Let me know if these aren't issues you are quite ready to tackle yet. I understand this is still under development. Error looks like:
(phyluce) M053-303-26932:phyluce cran5048$ python ~/github/phyluce/bin/snps/phyluce_snp_phase_uces \
> --config ./sebova/sebova_phasing_conf.txt \
> --bams ./sebova/mapping \
> --output ./sebova/phased_reads
2018-04-10 13:07:44,723 - phyluce_snp_phase_uces - INFO - ================ Starting phyluce_snp_phase_uces ================
2018-04-10 13:07:44,723 - phyluce_snp_phase_uces - INFO - Version: git e3e2f09
2018-04-10 13:07:44,723 - phyluce_snp_phase_uces - INFO - Argument --bams: /Users/cran5048/BaseSpace/UCE_Fish_2/phyluce/sebova/mapping
2018-04-10 13:07:44,723 - phyluce_snp_phase_uces - INFO - Argument --config: /Users/cran5048/BaseSpace/UCE_Fish_2/phyluce/sebova/sebova_phasing_conf.txt
2018-04-10 13:07:44,724 - phyluce_snp_phase_uces - INFO - Argument --conservative: False
2018-04-10 13:07:44,724 - phyluce_snp_phase_uces - INFO - Argument --cores: 1
2018-04-10 13:07:44,724 - phyluce_snp_phase_uces - INFO - Argument --log_path: None
2018-04-10 13:07:44,724 - phyluce_snp_phase_uces - INFO - Argument --output: /Users/cran5048/BaseSpace/UCE_Fish_2/phyluce/sebova/phased_reads
2018-04-10 13:07:44,724 - phyluce_snp_phase_uces - INFO - Argument --verbosity: INFO
2018-04-10 13:07:44,724 - phyluce_snp_phase_uces - INFO - ================ Starting phyluce_snp_phase_uces ================
2018-04-10 13:07:44,745 - phyluce_snp_phase_uces - INFO - Getting input filenames and creating output directories
2018-04-10 13:07:44,867 - phyluce_snp_phase_uces - INFO - ----------------- Processing sebova_sbc_R010966 -----------------
2018-04-10 13:07:44,867 - phyluce_snp_phase_uces - INFO - Phasing BAM file for sebova_sbc_R010966
2018-04-10 13:07:45,986 - phyluce_snp_phase_uces - INFO - Sorting BAM for sebova_sbc_R010966
2018-04-10 13:07:46,017 - phyluce_snp_phase_uces - INFO - Sorting BAM for sebova_sbc_R010966
2018-04-10 13:07:46,025 - phyluce_snp_phase_uces - INFO - Creating REF/ALT allele FASTQ file 0
Traceback (most recent call last):
File "/Users/cran5048/github/phyluce/bin/snps/phyluce_snp_phase_uces", line 289, in <module>
main()
File "/Users/cran5048/github/phyluce/bin/snps/phyluce_snp_phase_uces", line 271, in main
fastas = phase_bam(log, individual, reference, bam, args.output)
File "/Users/cran5048/github/phyluce/bin/snps/phyluce_snp_phase_uces", line 99, in phase_bam
make_cons_0 = samtools.call(log, sample, phasing_out_dir, reference, sort_phased_0, "0")
File "/Users/cran5048/github/phyluce/phyluce/samtools.py", line 78, in call
get_user_path("samtools", "bcftools"),
File "/Users/cran5048/github/phyluce/phyluce/pth.py", line 29, in get_user_path
pth = config.get(program, binary)
File "/Users/cran5048/anaconda2/envs/phyluce/lib/python2.7/ConfigParser.py", line 618, in get
raise NoOptionError(option, section)
ConfigParser.NoOptionError: No option 'bcftools' in section: 'samtools'
Yes, your config is now also missing this option. Here's what my ~/.phyluce.conf
looks like (and most of the binaries you need should already be in place by installing the phyluce dependencies).
Also not a problem to help get working (and I apologize for difficulty). I need to rejigger for a number of things that have changed about conda
and also to start using biopython for dependencies, but I don't have lots of time to do that right now (and bugaboos like missing trinity
will inevitable break the workflow for some folks - so i'm also working on implementing another, hopefully better assembly option).
[abyss]
abyss:$CONDA/bin/ABYSS
abyss-pe:$CONDA/bin/abyss-pe
[bowtie]
bowtie:$CONDA/bin/bowtie
[bwa]
bwa:$CONDA/bin/bwa
[gblocks]
gblocks:$CONDA/bin/gblocks
[trimal]
trimal:$HOME/git/trimal/source/trimal
[java]
executable:java
mem:-Xmx46g
jar:$CONDA/jar
gatk:GenomeAnalysisTKLite.jar
[lastz]
lastz:$CONDA/bin/lastz
[mafft]
mafft:$CONDA/bin/mafft
[muscle]
muscle:$CONDA/bin/muscle
[raxml]
raxmlHPC-SSE3:$CONDA/bin/raxmlHPC-SSE3
raxmlHPC-PTHREADS-SSE3:$CONDA/bin/raxmlHPC-PTHREADS-SSE3
[samtools]
samtools:$CONDA/bin/samtools
bcftools:$CONDA/bin/bcftools
vcfutils:$CONDA/bin/vcfutils.pl
[seqtk]
seqtk:$HOME/bin/seqtk
[trinity]
#trinity:$HOME/bin/trinity
trinity:$HOME/src/trinityrnaseq_r2013-02-25/Trinity.pl
#max_memory:46G
kmer_coverage:2
jellyfish_memory:36G
[velvet]
velvetg:$CONDA/bin/velvetg
velveth:$CONDA/bin/velveth
#----------------
# Advanced
#----------------
[headers]
trinity:comp\d+_c\d+_seq\d+|c\d+_g\d+_i\d+|TR\d+\|c\d+_g\d+_i\d+
velvet:node_\d+
OK - that worked. Thanks for your help and all your work on the pipeline!
One last question or questions - what exactly is being phased? Did the pipeline call SNPs at any point, and if so where? My output joined_allele_sequences_all_samples.fasta
is a small subset of the loci that were mapped, and most of the phased haplotypes do not have any variant sites.
If I wanted to call SNPs myself, I could start with the .bam files that are produced at the end of the mapping right? And the reference fasta would be the one produced by phyluce_assembly_get_fastas_from_match_counts
?
Error generally fixed in v1.6.2 and build that will be up on bioconda shortly. To answer @ericcrandall - the SNPs are called jointly in the samtools phase part of the code, then output as two alleles. As to why you have so few phased loci, I am not sure. As I work on general fixes (and porting to python3), I'll try to generate a variety of test cases for this workflow.
Oh, and to call SNPs yourself, you use the BAM files generated during read-mapping (e.g. the bwa_multiple_align). Then, you can pick your SNP calling approach (e.g. samtools or GATK, etc.). The reference fasta would be the assemblies you choose to map your reads against.
I do not get how the reads are called if the coverage is low. Let's say the coverage is 2X, each read with a different mutation for the same site, what would be the nucleotide in the resulting site of the loci?
There are several descriptions on google regarding how samtools phases - please search there.
I am quite interested in phasing the SNPs in my dataset per Andermann et al. 2018 but as I've followed the instructions I found that
phyluce_snp_bwa_multiple_align
andphyluce_snp_phase_uces
were missing from the /bin folder. I copied these from GitHub and got the first one to work, butphylum_snp_phase_uces
is trying to import seqtk and not finding it.Has this portion of the workflow been abandoned or deprecated?