PacificBiosciences / FALCON_unzip

Making diploid assembly common practice for genomic study
BSD 3-Clause Clear License

blasr 5.3.56ec927 with current FALCON_UNZIP fails in quiver: "read group ID not found" #52

Closed: roblehmann closed this issue 7 years ago

roblehmann commented 8 years ago

I compiled the current blasr version with BAM support, adapted the current FALCON_UNZIP version to use the correct blasr calls, and successfully ran the ecoli2 test case up to the quiver polishing step. Unfortunately, blasr dies when trying to open the BAM file:

blasr /home/lehmanr/lib/FALCON-integrate/FALCON-examples/run/ecoli2/4-quiver/000000F_002/000000F_002.bam /home/lehmanr/lib/FALCON-integrate/FALCON-examples/run/ecoli2/4-quiver/000000F_002/000000F_002_ref.fa --out /tmp/tmpfCwyxx/Gu5Frm.bam --bam --bestn 1 --minMatch 12 --nproc 24 --minSubreadLength 50 --minAlnLength 50 --minPctSimilarity 70 --minPctAccuracy 75 --hitPolicy random --concordant --randomSeed 1 --useQuality

with this result:

terminate called after throwing an instance of 'std::runtime_error'
  what():  read group ID not found
[1]    27762 abort (core dumped)  blasr   --out /tmp/tmpfCwyxx/Gu5Frm.bam

Do you have any idea what's going on? Thanks in advance.

pb-cdunn commented 8 years ago

We are currently hard-coded for the older blasr, with single-dash command-line arguments. When we switch to the new one, the older suite of tools will no longer be usable. Not sure when we'll switch.

@pb-jchin, can you comment on how to install the tool-suite which works today? I'm not familiar with that.
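The difference between the two argument styles can be illustrated with a small, hypothetical helper (not part of FALCON_unzip or blasr) that rewrites an old single-dash blasr argument list into the double-dash style used in the command at the top of this issue. It assumes only the dash prefix changed; any option that was renamed or removed between blasr versions would still need manual attention, and single-letter flags are deliberately left alone.

```python
import re

# Matches an old-style long option: one dash, a letter, then at least one
# more word character, e.g. "-bestn" or "-minPctSimilarity".
# "--bestn", "-m", "-1" and file paths do not match.
_OLD_LONG_OPT = re.compile(r"^-[A-Za-z]\w+$")


def modernize_blasr_args(argv):
    """Rewrite old single-dash blasr options (e.g. -bestn) to the
    double-dash style (--bestn).

    Positional arguments, option values, single-letter flags and options
    that already use "--" are returned unchanged.
    Assumption: option names are the same in both blasr versions.
    """
    return ["-" + tok if _OLD_LONG_OPT.match(tok) else tok for tok in argv]


# Hypothetical old-style call; file names are placeholders.
old = ["blasr", "reads.bam", "ref.fa", "-out", "aln.bam", "-bam",
       "-bestn", "1", "-nproc", "24", "-hitPolicy", "random"]
print(" ".join(modernize_blasr_args(old)))
```

For the sample list this prints `blasr reads.bam ref.fa --out aln.bam --bam --bestn 1 --nproc 24 --hitPolicy random`.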

roblehmann commented 8 years ago

Thanks for the quick reply. That's the information I'm looking for right now:

a) Which blasr version is currently compatible?
b) How do I get pitchfork to use that version? (Just hack the Makefile to use that git revision?)

pb-cdunn commented 8 years ago

I can add a setting that switches to the new blasr style. But I'm not sure the other tools in pitchfork will work. You can try it yourself. Just modify the blasr calls in your unzip.py.

Btw, the latest FALCON_unzip will require the latest FALCON and the latest pypeFLOW. But you can just keep what you have for now.

roblehmann commented 8 years ago

I adapted the blasr calls to the new blasr parameter style. I don't think anything changed internally in blasr, no? I should probably head over to the blasr repo and ask them about the error.

pb-jchin commented 8 years ago

@roblehmann, the idea is to decouple blasr and consensus from the core algorithms used in FALCON-Unzip. How well the phasing and consensus work does depend on some detailed work inside blasr (as with any bioinformatics tool that depends on third-party code). This is not specific to FALCON-Unzip. I think the latest blasr will work as-is, apart from some changes in option syntax.

roblehmann commented 7 years ago

@pb-jchin @pb-cdunn
I have made some progress on the issue. One problem is that, during the phasing step, samtools sort complains that the read group ID on every read has no matching entry in the header. As a result, no 000000F_sorted.bam is written and the subsequent steps fail. How can I fix this? Could it be a problem with the samtools version?

samtools sort tmp_aln.bam -o 000000F_sorted.bam
[bam_sort_core] merging from 5 files...
[bam_translate] RG tag "996e63df" on read "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/114146/1427_13937" encountered with no corresponding entry in header, tag lost
[bam_translate] RG tag "996e63df" on read "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/66551/0_22202" encountered with no corresponding entry in header, tag lost
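The samtools warning means the reads carry an RG:Z: tag whose ID has no @RG line in the BAM header. One way to confirm which IDs are affected is to compare the tags on the records with the header. The helper below is a hypothetical sketch (not part of FALCON_unzip or samtools) that works on SAM text, such as the output of samtools view -h on the intermediate BAM.

```python
def missing_read_groups(sam_lines):
    """Return the set of RG IDs that appear as RG:Z: tags on alignment
    records but have no matching "@RG ... ID:<id>" line in the header.

    `sam_lines` is any iterable of SAM text lines (header included).
    """
    header_ids = set()
    used_ids = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: collect the ID field of every @RG entry.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        header_ids.add(field[3:])
            continue
        # Alignment line: optional tags follow the 11 mandatory SAM columns.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                used_ids.add(tag[5:])
    return used_ids - header_ids


# Tiny made-up example: the header declares "aaaa1111", the read uses "996e63df".
sam = [
    "@HD\tVN:1.5\tSO:unknown",
    "@RG\tID:aaaa1111\tPL:PACBIO",
    "read1\t0\t000000F\t1\t60\t4M\t*\t0\t0\tACGT\t*\tRG:Z:996e63df",
]
print(missing_read_groups(sam))
```

In practice the lines could come from subprocess output of samtools view -h tmp_aln.bam; an empty result means every RG tag has a header entry.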

pb-cdunn commented 7 years ago

I think everything is consistent if you use the most recent version of unzip/falcon/pypeflow/pitchfork.

orangeSi commented 6 years ago

Sorry for reopening this, but I am confused: why is --hitPolicy set to random (--hitPolicy random --concordant --randomSeed 1 --useQuality) rather than randombest? As I understand it, random means the chosen hit may or may not be the best one.

thanks~ Si
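The blasr help text describes random as outputting a random alignment and randombest as outputting a random best alignment. The toy model below only illustrates that distinction on made-up hits; it is not blasr's implementation, and the scoring convention (higher is better) is an assumption of the toy.

```python
import random


def pick_hit(hits, policy, seed=1):
    """Toy model of two multi-hit policies.

    `hits` is a list of (target_name, score) pairs; in this toy a higher
    score is better.
      - "random":     any hit may be chosen, best or not.
      - "randombest": only hits tied for the best score are eligible.
    """
    rng = random.Random(seed)  # fixed seed, in the spirit of --randomSeed 1
    if policy == "random":
        candidates = hits
    elif policy == "randombest":
        best = max(score for _, score in hits)
        candidates = [hit for hit in hits if hit[1] == best]
    else:
        raise ValueError(f"unsupported policy: {policy}")
    return rng.choice(candidates)


hits = [("000000F", 950), ("000001F", 950), ("000002F", 400)]
print(pick_hit(hits, "randombest"))  # always one of the two 950-score hits
print(pick_hit(hits, "random"))      # may be any of the three hits
```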