Hi Fiona,
That is confusing, as the program started fine and ran up until the first Gibbs step. Can you put the input files somewhere I can download them from, so I can try to reproduce this?
Thanks, Chris
Hi Chris,
I can definitely do that if necessary; however, I've perhaps found an issue further up the pipeline. I'm following the workflow in the README file for DESMAN, and in the step prior to desman, I'm getting an error which I think might be causing the above:
$ python ~/software/DESMAN-master/desman/Variant_Filter.py cluster_esc3_scgs.freq
/home/whelanfj/software/DESMAN-master/desman/Variant_Filter.py:37: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
new_pvalues = np.zeros(n)
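For reference, that warning just means a float is being passed as the array length at that line; the usual fix, sketched here under the assumption that n ends up as a float, is to cast it before calling np.zeros:
import numpy as np
n = 10.0                        # assumed: n arrives as a float here
new_pvalues = np.zeros(int(n))  # casting to int avoids the deprecation warning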
I ignored this because it looked to be a warning, but...
$ head -n 1 cluster_esc3_scgs.freq
Cog,Position,AIA-A,AIA-C,AIA-G,AIA-T,Beef-A,Beef-C,Beef-G,Beef-T,KVLB-A,KVLB-C,KVLB-G,KVLB-T,MAC-A,MAC-C,MAC-G,MAC-T,McKay-A,McKay-C,McKay-G,McKay-T
$ head -n 1 outputsel_var.csv
,Position,Beef-A,Beef-C,Beef-G,Beef-T,MAC-A,MAC-C,MAC-G,MAC-T
...this doesn't seem right.
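A quick way to compare which samples survive is to list just the sample names from each header; this assumes the Sample-Base column naming shown above:
$ head -n 1 cluster_esc3_scgs.freq | tr ',' '\n' | sed 's/-[ACGT]$//' | sort -u
$ head -n 1 outputsel_var.csv | tr ',' '\n' | sed 's/-[ACGT]$//' | sort -u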
Hi Fiona,
No, that is not right, but the issue is not related to the warning from Variant_Filter.py. You have not successfully created the base frequency file. Did bam-readcount work correctly?
Thanks, Chris
The output of bam-readcount looks okay, but let me re-generate and see if I can figure out any differences between the files that worked and the files that didn't.
I'll let you know how it goes. Thanks!
I re-did everything, from the initial bwa mapping and samtools steps through bam-readcount, and now I only get 1 of 5 samples in my outputsel_var.csv. Puzzled...
Are the base frequency files the .cnt files from bam-readcount? These seem to work fine, though I do get warnings in the .err output:
WARNING: In read HWI-1KL153:74:HA43VADXX:1:1201:7365:6558: Couldn't find single-end mapping quality. Check to see if the SM tag is in BAM.
I get these types of warnings for all samples, although the .err file is smaller for the sample that made it into outputsel_var.csv than for the rest. I will admit that these warnings don't make much sense to me; do you think they're contributing to this loss of samples?
Thanks again! --Fiona
Or maybe I'm misunderstanding the output of Variant_Filter.py; I assumed that all columns should be present regardless, but I'm finding that if I play with the parameters I get different columns. For example,
$ python ~/software/DESMAN-master/desman/Variant_Filter.py ../cluster_esc3_scgs.freq
$ head -n 1 outputsel_var.csv
,Position,Beef-A,Beef-C,Beef-G,Beef-T
$ rm output*
$ python ~/software/DESMAN-master/desman/Variant_Filter.py ../cluster_esc3_scgs.freq -v 0 -m 0 -q 0 -t 0 -sf 0
$ head -n 1 outputsel_var.csv
,Position,AIA-A,AIA-C,AIA-G,AIA-T,Beef-A,Beef-C,Beef-G,Beef-T,McKay-A,McKay-C,McKay-G,McKay-T
... I know setting all of these parameters to 0 is silly; I'm just trying to understand the output.
Hi Fiona,
The base frequencies are in the .cnt files; perhaps have a look at those to make sure they look sensible? The warnings are nothing to worry about. I should look at how to switch them off though, as they can take up a lot of disk space.
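If the .err files are simply captured stderr, one simple stop-gap is to discard that stream when running bam-readcount (the file names here are only placeholders):
$ bam-readcount -f contigs.fa Sample.bam > Sample.cnt 2> /dev/null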
The reason columns appear or disappear as you change parameters is the -m flag. This is the minimum mean coverage necessary for a sample to be included; it defaults to 5.0, so by setting it to zero you include all samples. You probably do not want to do that, but -m 1.0 is reasonable in some situations. I would keep the other parameters at their defaults, although you may want to try '-p' to turn on the optimisation in the log-ratio test.
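For example, a re-run with a lower coverage threshold and the log-ratio optimisation switched on might look like this (paths as in the earlier commands):
$ python ~/software/DESMAN-master/desman/Variant_Filter.py ../cluster_esc3_scgs.freq -m 1.0 -p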
Thanks, Chris
Perfect, that makes sense. So we're just back to the original issue then:
$ desman ../variants_yay/outputsel_var.csv -e ../variants_yay/outputtran_df.csv -o cluster_2_0 -r 1000 -i 100 -g 2 -s 0 > cluster_2_0.out
Up and running. Check /home/whelanfj/metagenomics/LA-T1-CE-MG-paper/concoct/ray_assembly_read_cov/concoct-output/DESMAN/run_desman/cluster_2_0/log_file.txt for progress
Traceback (most recent call last):
File "/usr/local/bin/desman", line 5, in <module>
pkg_resources.run_script('desman==0.1dev', 'desman')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 492, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1350, in run_script
File "/usr/local/lib/python2.7/dist-packages/desman-0.1dev-py2.7-linux-x86_64.egg/EGG-INFO/scripts/desman", line 249, in <module>
main(sys.argv[1:])
File "/usr/local/lib/python2.7/dist-packages/desman-0.1dev-py2.7-linux-x86_64.egg/EGG-INFO/scripts/desman", line 155, in main
haplo_SNP.update()
File "/usr/local/lib/python2.7/dist-packages/desman-0.1dev-py2.7-linux-x86_64.egg/desman/HaploSNP_Sampler.py", line 345, in update
nchange = sampletau.sample_tau(self.tau, self.gamma, self.eta, self.variants)
File "sampletau.pyx", line 38, in sampletau.sample_tau (sampletau/sampletau.c:1510)
ValueError: Buffer dtype mismatch, expected 'long' but got 'double'
$ less cluster_2_0.out
$ less cluster_2_0/log_file.txt
2016-10-31 09:29:34,248:INFO:root:Results created in /home/whelanfj/metagenomics/LA-T1-CE-MG-paper/concoct/ray_assembly_read_cov/concoct-output/DESMAN/run_desman/cluster_2_0
2016-10-31 09:29:34,248:INFO:root:Set fixed seed for random position selection = 238329
2016-10-31 09:29:34,348:INFO:root:Running Desman with 1 samples and 70841 variant positions finding 2 genomes.
2016-10-31 09:29:34,348:INFO:root:Set eta error transition matrix from = <open file '../variants_yay/outputtran_df.csv', mode 'r' at 0x7f77a68ef6f0>
2016-10-31 09:29:34,350:INFO:root:Selected 1000 random variant positions to infer haplotypes from
2016-10-31 09:29:34,373:INFO:root:Set second adjustable random seed = 0
2016-10-31 09:29:34,379:INFO:root:Perform NTF initialisation
2016-10-31 09:29:34,409:INFO:root:NTF Iter 0, div = 468.255157
2016-10-31 09:29:35,117:INFO:root:Start Gibbs sampler burn-in phase
I could reproduce this by making a subset of the first 10 lines of outputsel_var.csv:
,Position,Beef-A,Beef-C,Beef-G,Beef-T
COG0002,39.0,1.0,47.0,0.0,0.0
COG0002,71.0,54.0,1.0,0.0,0.0
COG0002,105.0,2.0,65.0,0.0,0.0
COG0002,120.0,0.0,73.0,0.0,1.0
COG0002,122.0,71.0,0.0,0.0,1.0
COG0002,155.0,0.0,1.0,71.0,0.0
COG0002,168.0,0.0,70.0,0.0,1.0
COG0002,172.0,1.0,0.0,72.0,0.0
COG0002,182.0,74.0,1.0,0.0,0.0
and outputtran_df.csv:
,0,1,2,3
0,0.999985528518,6.21737752731e-06,3.10868876366e-06,5.14541588467e-06
1,3.87447309856e-06,0.999991497684,9.68618274641e-07,3.65922459309e-06
2,4.20981015888e-06,1.09345718413e-06,0.999990541595,4.15513729968e-06
3,2.50252973114e-06,4.24341997889e-06,4.35222561938e-06,0.999988901825
Hi Fiona,
So yes, I can see what the problem is: for some reason your variants file has floats in it rather than ints. I can easily fix the code to accept floats and round them to ints, but it would be good to figure out why this has happened.
I am more concerned that you only have one sample. Desman is not going to work for this data set.
Thanks, Chris
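In the meantime, a minimal sketch of that rounding as a one-off pre-processing step, assuming the column layout shown in the subset above (the output name outputsel_var_int.csv is just an example):
import pandas as pd

# Read the filtered variant file, round the float positions/counts back to
# integers, and write a copy for desman to read instead.
df = pd.read_csv("outputsel_var.csv", index_col=0)
df = df.round().astype(int)
df.to_csv("outputsel_var_int.csv")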
Hi Chris,
This is just a subset of the eventual samples that I can put through, but I am concerned that 4 of my samples are being dropped.
I seem to be getting slightly different output from bam-readcount each time: I just re-ran it with 2 identical mappings of the KVLB sample (no differences in the _cov.txt files), but I get differences in the resulting .cnt files. Re-running Variant_Filter.py now on one of these new .cnt files gives me 3 samples in my outputsel_var.csv and no floats:
,Position,Beef-A,Beef-C,Beef-G,Beef-T,KVLB-A,KVLB-C,KVLB-G,KVLB-T,MAC-A,MAC-C,MAC-G,MAC-T
COG0002,419,59,0,0,0,414,3,0,0,297,3,0,0
COG0002,436,59,2,0,0,423,11,0,2,295,6,0,0
COG0002,440,0,0,0,58,0,4,0,438,0,2,0,293
I realize this makes me sound a bit crazy... I'm in the midst of trying to reproduce this now, but if this turns out not to be related to DESMAN at all, I'm going to feel terrible!
I cannot reproduce this at all so I assume that this was my own error somewhere along the way. I'll reopen if I ever run into this issue again.
Multiple apologies for dragging you through this! Thanks so much for your patience and help.
Good morning,
I'm having an issue with desman which I can't solve:
Much appreciated!