brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License

tabulate uses RGSM, but doesn't escape "/" #26

Closed JohnLonginotto closed 7 years ago

JohnLonginotto commented 8 years ago

It seems that when running tabulate like so:

    /home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py tabulate --map-q 60 --bissnp /ex/BisSNP-0.82.2.jar --prefix Mm01.WGBS -t 10 --reference /ref/mm10.fa --trim 3,1 /media/john/DATA1/WGBS/44_Mm01_WEAd_C2_WGBS_E_no_dupes.bam

I get the following output:

    java -Xmx24g -jar /ex/BisSNP-0.82.2.jar \
        -R /ref/mm10.fa \
        -I /media/john/DATA1/WGBS/44_Mm01_WEAd_C2_WGBS_E_no_dupes.bam \
        -T BisulfiteGenotyper \
        --trim_5_end_bp 3 \
        --trim_3_end_bp 1 \
        -vfn1 Mm01.WGBS.meth.vcf -vfn2 Mm01.WGBS.snp.vcf \
        --non_directional_protocol \
        -mbq 12 \
        -minConv 0 \
        -toCoverage 1000 \
        -mmq 60   \
        -nt 10
    0   T*  C
    0   A*  G
    0   A*  G
    0   C*  T
    0   T*  C
    0   C*  T
    0   C*  T
    0   C*  T
    0   T*  C
    0   G*  A
    0   T*  C
    0   T*  C
    0   C*  T
    0   G*  A
    0   A*  G
    0   T*  C
    0   C*  T
    Mm01.WGBS.meth.vcf
    Traceback (most recent call last):
      File "/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py", line 601, in <module>
        main(sys.argv[1:])
      File "/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py", line 554, in main
        sys.exit(tabulate_main(args[1:]))
      File "/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py", line 506, in tabulate_main
        .format(prefix=a.prefix, sample=sample), "w")
    IOError: [Errno 2] No such file or directory: 'Mm01.WGBS.C57BL/6J.meth.bed'

I think the error is that there is no directory called Mm01.WGBS.C57BL, but it only tries to write to that directory because my read groups look like this:


@RG ID:HWI-ST552.C2J56ACXX.1.NA SM:C57BL/6J LB:Mm01.WGBS    PL:illumina CN:Essen    DS:Circadian Day
@RG ID:HWI-ST552.C2J56ACXX.2.NA SM:C57BL/6J LB:Mm01.WGBS    PL:illumina CN:Essen    DS:Circadian Day
@RG ID:HWI-ST552.C2J56ACXX.3.NA SM:C57BL/6J LB:Mm01.WGBS    PL:illumina CN:Essen    DS:Circadian Day
@RG ID:HWI-ST552.D2FWTACXX.1.CAGATCA    SM:C57BL/6J LB:Mm01.WGBS    PL:illumina CN:Essen    DS:Circadian Day

and I think the "/" in the sample name is not being escaped or removed when the new file is written. Also, I'm not sure that using data from the RG is a good idea regardless, since one BAM file can contain many RGIDs (it just so happens that in my file they all have the same SM/LB). I did, however, get the following outputs:

Mm01.WGBS.meth.vcf
Mm01.WGBS.meth.vcf.MethySummarizeList.txt
Mm01.WGBS.snp.vcf

All the best! :)
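To illustrate the concern about multiple read groups, here is a minimal sketch that collects the distinct SM values from SAM `@RG` header lines. The function name `sample_names` is hypothetical and is not part of bwameth.py; the sketch assumes the header lines are tab-delimited, as the SAM format requires, and the example header is shortened from the one above.

```python
def sample_names(header_lines):
    """Return the distinct SM values from @RG header lines, in first-seen order."""
    seen = []
    for line in header_lines:
        if not line.startswith("@RG"):
            continue  # skip @HD, @SQ, @PG and other header records
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SM:") and field[3:] not in seen:
                seen.append(field[3:])
    return seen


# Shortened example header (hypothetical, tab-delimited)
header = [
    "@RG\tID:HWI-ST552.C2J56ACXX.1.NA\tSM:C57BL/6J\tLB:Mm01.WGBS",
    "@RG\tID:HWI-ST552.C2J56ACXX.2.NA\tSM:C57BL/6J\tLB:Mm01.WGBS",
]
print(sample_names(header))  # ['C57BL/6J']
```

If the returned list has more than one entry, a single per-sample output file name is ambiguous, which is the case raised above.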

brentp commented 8 years ago

The tabulate tool is something for which I'll gladly accept fixes, but I'm not going to support it.

If you do want to make a PR, it looks like your diagnosis is exactly correct.
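
One possible shape for such a fix is sketched below. This is not the actual patch: `safe_sample_name` and `bed_path` are hypothetical helpers, and the `{prefix}.{sample}.meth.bed` pattern is inferred from the traceback and error message above rather than copied from bwameth.py. The idea is to replace any character that is unsafe in a file name component before building the output path.

```python
import re


def safe_sample_name(sample, replacement="_"):
    """Replace characters other than letters, digits, '.', '_' and '-'."""
    return re.sub(r"[^A-Za-z0-9._-]", replacement, sample)


def bed_path(prefix, sample):
    # Naming pattern inferred from the error message (an assumption).
    return "{prefix}.{sample}.meth.bed".format(
        prefix=prefix, sample=safe_sample_name(sample))


print(bed_path("Mm01.WGBS", "C57BL/6J"))  # Mm01.WGBS.C57BL_6J.meth.bed
```

Replacing rather than deleting the "/" keeps the sanitized name readable, at the cost that two distinct sample names could in principle map to the same file name.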