Closed skillcoyne closed 9 years ago
Hi, and thanks for trying out BreaKmer. Sorry for the trouble running the software. From the error, it looks like the gene annotation file you supplied is not in the expected format: the strand value sits in the column where BreaKmer expects the gene's end coordinate. The annotation file should look like the following:
0 NM_032291 chr1 + 66999824 67210768 67000041 67208778 25 66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755, 67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768, 0 SGIP1 cmpl cmpl 0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,
1 NM_032785 chr1 - 48998526 50489626 48999844 50489468 14 48998526,49000561,49005313,49052675,49056504,49100164,49119008,49128823,49332862,49511255,49711441,50162984,50317067,50489434, 48999965,49000588,49005410,49052838,49056657,49100276,49119123,49128913,49332902,49511472,49711536,50163109,50317190,50489626, 0 AGBL4 cmpl cmpl 2,2,1,0,0,2,1,1,0,2,0,1,1,0,
1 NM_018090 chr1 + 16767166 16786584 16767256 16785385 8 16767166,16770126,16774364,16774554,16775587,16778332,16782312,16785336, 16767348,16770227,16774469,16774636,16775696,16778510,16782388,16786584, 0 NECAP2 cmpl cmpl 0,2,1,1,2,0,1,2,
1 NM_052998 chr1 + 33546713 33585995 33547850 33585783 12 33546713,33546988,33547201,33547778,33549554,33557650,33558882,33560148,33562307,33563667,33583502,33585644, 33546895,33547109,33547413,33547955,33549728,33557823,33559017,33560314,33562470,33563780,33583717,33585995, 0 ADC cmpl cmpl -1,-1,-1,0,0,0,2,2,0,1,0,2,
...
If you paste the first couple of lines of your annotation file, I can see what needs to be modified.
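For anyone who wants to check their own file before pasting it, here is a rough sketch of a sanity check for the layout shown above. It is not part of BreaKmer: the function name `check_refgene_line` is made up for illustration, and it assumes the columns are separated by whitespace or tabs, with the strand in the 4th column and the gene start and end in the 5th and 6th.

```python
def check_refgene_line(line):
    """Return a list of problems found in one annotation line (empty if none).

    Hypothetical helper, not BreaKmer code. Assumes the refGene-style layout
    from the example rows: strand in column 4, start/end in columns 5 and 6.
    """
    fields = line.split()  # splits on tabs or spaces
    if len(fields) < 6:
        return ["expected at least 6 columns, found %d" % len(fields)]

    problems = []
    if fields[3] not in ("+", "-"):
        problems.append("column 4 should be the strand (+ or -), found %r" % fields[3])
    for idx, name in ((4, "start"), (5, "end")):
        if not fields[idx].isdigit():
            problems.append(
                "column %d should be the integer gene %s, found %r"
                % (idx + 1, name, fields[idx])
            )
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_annotation.py <annotation_file>
    # Only the first few lines are checked, which is usually enough
    # to spot a file in the wrong format.
    with open(sys.argv[1]) as handle:
        for number, line in enumerate(handle, start=1):
            if number > 5:
                break
            for problem in check_refgene_line(line):
                print("line %d: %s" % (number, problem))
```

A file in BED layout (chromosome, start, end, name, score, strand) should be flagged on every line, since its strand lands in the 6th column rather than the 4th.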
Ok, I had used the refseq.bed file as I thought I read that was the required annotation. I downloaded refGene.txt from UCSC and was able to run the example so all is well. Thanks for the quick response!
I'd like to run BreaKmer on my tumor data sets, but I'm currently unable to run the example data. I'm not a Python developer either, so I'm not clear what the issue is. Any help would be welcome!
After setting up the breakmer example config with the following:
analysis_name=example
targets_bed_file=//tools/BreaKmer/example_data/genes.bed
sample_bam_file=//tools/BreaKmer/example_data/B2M.bam
analysis_dir=//tools/BreaKmer/example_data/example
reference_data_dir=//tools/BreaKmer/example_data/data/ref
cutadapt=//tools/cutadapt/cutadapt-1.8.1/bin/cutadapt
cutadapt_config_file=//tools/BreaKmer/example_data/cutadapt.cfg
jellyfish=//tools/Jellyfish/jellyfish-2.2.0i/jellyfish
blat=//tools/BLAT/blat
gfclient=//tools/BLAT/gfClient
gfserver=//tools/BLAT/gfServer
fatotwobit=//tools/faToTwoBit
reference_fasta=//tools/BreaKmer/ref/all.fa
gene_annotation_file=//tools/BreaKmer/refseq.bed
repeat_mask_file=//tools/BreaKmer/repeatmask.bed
kmer_size=15
And running "python breakmer.py example_data/breakmer.cfg"
I get the following errors in stdout, but no errors in the log file.
Traceback (most recent call last):
  File "breakmer.py", line 103, in <module>
    r = runner(config_d)
  File "/mnt/gaiagpfs/users/homedirs/skillcoyne/tools/BreaKmer/sv_processor.py", line 100, in __init__
    self.params = params(config_d)
  File "/mnt/gaiagpfs/users/homedirs/skillcoyne/tools/BreaKmer/utils.py", line 706, in __init__
    self.set_params()
  File "/mnt/gaiagpfs/users/homedirs/skillcoyne/tools/BreaKmer/utils.py", line 796, in set_params
    self.gene_annotations.add_genes(self.opts['gene_annotation_file'])
  File "/mnt/gaiagpfs/users/homedirs/skillcoyne/tools/BreaKmer/utils.py", line 970, in add_genes
    end = int(linesplit[5])
ValueError: invalid literal for int() with base 10: '+'
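The last frame of the traceback can be reproduced in isolation, which makes the cause easier to see. The sketch below copies only the failing statement, `end = int(linesplit[5])`; the function `parse_gene_end`, the tab delimiter, and both sample lines are assumptions made for illustration and are not taken from BreaKmer's `utils.py`. The BED-style line is a hypothetical six-column record (chromosome, start, end, name, score, strand), which is the standard BED column order.

```python
def parse_gene_end(line):
    """Hypothetical stand-in for the failing step in add_genes.

    Only the int(linesplit[5]) call comes from the traceback; splitting on
    tabs is an assumption about how the real loader tokenizes each line.
    """
    linesplit = line.strip().split("\t")
    return int(linesplit[5])


# refGene-style line: the 6th field (index 5) is the gene end coordinate.
refgene_line = "0\tNM_032291\tchr1\t+\t66999824\t67210768"

# BED-style line (illustrative): the 6th field (index 5) is the strand.
bed_line = "chr1\t66999824\t67210768\tNM_032291\t0\t+"

print(parse_gene_end(refgene_line))

try:
    parse_gene_end(bed_line)
except ValueError as err:
    # Same class of error as in the traceback: '+' cannot be parsed as an int.
    print("ValueError: %s" % err)
```

If the real file behaves like the BED-style line here, the strand character is what reaches `int()`, which matches the `'+'` in the error message.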