Closed roblehmann closed 7 years ago
We are currently hard-coded for the older blasr, which takes single-dash command-line arguments. When we switch to the new blasr, the older suite of tools will no longer be usable. Not sure when we'll switch.
@pb-jchin, can you comment on how to install the tool-suite which works today? I'm not familiar with that.
Thanks for the quick reply. That's the information I'm looking for right now:
a) Which blasr version is compatible right now? b) How do I get pitchfork to use that version (just hack the makefile to use the git revision)?
I can add a setting that switches to the new blasr style, but I'm not sure the other tools in pitchfork will work. You can try it yourself: just modify the blasr calls in your unzip.py.
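For anyone trying this by hand, here is a minimal sketch of the kind of rewrite involved, assuming the only change needed is turning old-style multi-letter options such as -bestn into --bestn. The helper name to_double_dash and the file names are hypothetical and not part of FALCON_unzip; whether every blasr option maps one-to-one between the old and new styles is an assumption that should be checked against the help output of the blasr build in use.

```python
import re
import shlex

# Matches a single-dash option with a multi-letter name, e.g. "-bestn".
# It does not match "--bestn", negative numbers like "-1", or
# single-letter options like "-m" (left untouched here by assumption).
_SINGLE_DASH_OPT = re.compile(r"^-[A-Za-z][A-Za-z0-9_]+$")


def to_double_dash(cmd):
    """Rewrite old-style '-option' tokens in a command line as '--option'."""
    tokens = shlex.split(cmd)
    converted = ["-" + t if _SINGLE_DASH_OPT.match(t) else t for t in tokens]
    return " ".join(shlex.quote(t) for t in converted)


# Hypothetical old-style call, for illustration only.
old_cmd = "blasr reads.bam ref.fa -out aln.bam -bam -bestn 1 -nproc 24"
print(to_double_dash(old_cmd))
```

In practice it is probably simpler to edit the command strings in unzip.py directly; the function only makes the pattern of the change explicit.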
Btw, the latest FALCON_unzip will require the latest FALCON and the latest pypeFLOW. But you can just keep what you have for now.
I adapted the blasr calls to the new blasr parameter style. I don't think anything changed internally in blasr, no? I should probably head over to the blasr repo and ask them about the error.
@roblehmann, the idea is to decouple blasr and consensus from the core algorithms used in FALCON-Unzip. How well the phasing and consensus work does depend on some of the details inside blasr (as with any bioinformatics tool that depends on third-party code). This is not specific to FALCON-Unzip. I think the latest blasr will work as-is, apart from some option syntax changes.
@pb-jchin @pb-cdunn
I have made some progress on the issue. One problem is that the samtools sort during the phasing step complains that the read group ID of the reads does not appear in the header.
In the end no 000000F_sorted.bam is written and the next steps fail. How can I fix this? Could it be a problem with the samtools version?
samtools sort tmp_aln.bam -o 000000F_sorted.bam
[bam_sort_core] merging from 5 files...
[bam_translate] RG tag "996e63df" on read "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/114146/1427_13937" encountered with no corresponding entry in header, tag lost
[bam_translate] RG tag "996e63df" on read "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/66551/0_22202" encountered with no corresponding entry in header, tag lost
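One way to confirm what the warning describes is to compare the RG IDs declared in the header with the RG tags carried by the reads. The sketch below does this on SAM text lines using only the standard library. The function name missing_read_groups and the toy records are made up for illustration, and it assumes well-formed tab-separated SAM with header lines starting with "@".

```python
def missing_read_groups(sam_lines):
    """Return RG IDs used on reads but not declared in any @RG header line."""
    declared, used = set(), set()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header section: collect the ID field of every @RG line.
            if fields[0] == "@RG":
                declared.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
        elif len(fields) > 11:
            # Alignment record: optional tags start at column 12.
            used.update(f[5:] for f in fields[11:] if f.startswith("RG:Z:"))
    return used - declared


# Toy input: the header declares one read group, the read carries another.
toy_sam = [
    "@HD\tVN:1.5\tSO:unknown",
    "@SQ\tSN:000000F\tLN:1000",
    "@RG\tID:abcd1234\tSM:sample",
    "read1\t0\t000000F\t1\t60\t4M\t*\t0\t0\tACGT\t*\tRG:Z:996e63df",
]
print(missing_read_groups(toy_sam))
```

On real data, the lines could come from the text output of samtools view -h tmp_aln.bam. A non-empty result would suggest the header lost its @RG lines somewhere upstream of the sort, rather than pointing at the sort itself.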
I think everything is consistent if you use the most recent version of unzip/falcon/pypeflow/pitchfork.
Sorry for reopening this, but I am confused: why is random used for --hitPolicy (--hitPolicy random --concordant --randomSeed 1 --useQuality)? Why not randombest? As I understand it, random means the reported hit may or may not be the best one.
thanks~ Si
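To make the distinction in the question concrete, here is a toy model of the two policies. It assumes the descriptions in blasr's help text as I recall them (random: report a random alignment; randombest: report a random alignment among equally top-scoring ones). It is not blasr's implementation: the function pick_hit, the hit names, and the "higher score is better" convention are all invented for illustration, and the interaction with --bestn 1 is not modelled.

```python
import random


def pick_hit(hits, policy, seed=1):
    """Pick one (name, score) hit under a toy version of the given policy."""
    rng = random.Random(seed)  # fixed seed, loosely analogous to --randomSeed
    if policy == "random":
        # Any candidate hit may be chosen, best or not.
        return rng.choice(hits)
    if policy == "randombest":
        # Only hits tied for the top score are candidates.
        top = max(score for _, score in hits)
        return rng.choice([h for h in hits if h[1] == top])
    raise ValueError("unsupported policy: " + policy)


# Hypothetical candidate hits for one read.
hits = [("locus_a", 90), ("locus_b", 90), ("locus_c", 40)]
print(pick_hit(hits, "random"))
print(pick_hit(hits, "randombest"))
```

Under this toy model, randombest can never return locus_c, while random can. Whether the two differ in practice for the command above depends on how many candidate hits remain after blasr's other filters.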
I compiled the current blasr version with bam support, adapted the current FALCON_UNZIP version to use the correct blasr calls, and got through the ecoli2 test case successfully up to the polishing step via quiver. Unfortunately, blasr dies when trying to open the bam file:
blasr /home/lehmanr/lib/FALCON-integrate/FALCON-examples/run/ecoli2/4-quiver/000000F_002/000000F_002.bam /home/lehmanr/lib/FALCON-integrate/FALCON-examples/run/ecoli2/4-quiver/000000F_002/000000F_002_ref.fa --out /tmp/tmpfCwyxx/Gu5Frm.bam --bam --bestn 1 --minMatch 12 --nproc 24 --minSubreadLength 50 --minAlnLength 50 --minPctSimilarity 70 --minPctAccuracy 75 --hitPolicy random --concordant --randomSeed 1 --useQuality
with this result:
Do you have any idea what's going on? Thanks in advance.