PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

LAshow to read names #175

Closed dgordon562 closed 9 years ago

dgordon562 commented 9 years ago

Hi, Jason and Chris,

using LAshow -c, I get some alignments like this:

2,880,776 8,851,350 n [ 0..13,078] x [10,484..23,199](14 trace pts)

                       7697
A          ----------+======>       dif/(len1+len2) = 3842/(13078+12715) = 29.79%
B ========+---------->
    10484 

How can I find the names of the reads being aligned? (Some method using DBshow?)

Thanks! David

pb-jchin commented 9 years ago

I think 2,880,776 and 8,851,350 are Daligner's internal IDs. If you use DBshow -n to dump the read ids, you can find the mapping. (You have to double-check for an off-by-one difference. Read the DBshow.c source code to understand fully what it means.)
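For anyone scripting this, here is a minimal Python sketch that pulls the two internal IDs out of an LAshow record header like the one quoted above. The regular expression is inferred only from that single example line, and the function name `parse_lashow_header` is hypothetical, so treat it as a starting point rather than a full parser for LAshow output.

```python
import re

# Pattern inferred from the one example line in this thread (an assumption,
# not taken from the LAshow source):
#   2,880,776 8,851,350 n [ 0..13,078] x [10,484..23,199](14 trace pts)
HEADER_RE = re.compile(
    r"^\s*([\d,]+)\s+([\d,]+)\s+([a-z])\s+"
    r"\[\s*([\d,]+)\.\.\s*([\d,]+)\]\s*x\s*"
    r"\[\s*([\d,]+)\.\.\s*([\d,]+)\]"
)


def to_int(text):
    """Convert a number printed with thousands separators, e.g. '2,880,776'."""
    return int(text.replace(",", ""))


def parse_lashow_header(line):
    """Return the IDs and coordinates of one record header, or None if no match."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return {
        "a_id": to_int(m.group(1)),   # internal ID of read A
        "b_id": to_int(m.group(2)),   # internal ID of read B
        "orient": m.group(3),         # orientation flag as printed ('n' in the example)
        "a_span": (to_int(m.group(4)), to_int(m.group(5))),
        "b_span": (to_int(m.group(6)), to_int(m.group(7))),
    }


if __name__ == "__main__":
    example = "2,880,776 8,851,350 n [ 0..13,078] x [10,484..23,199](14 trace pts)"
    print(parse_lashow_header(example))
```

The span lengths it recovers (13078 and 12715) agree with the `len1+len2` figures printed in the cartoon, which is a quick sanity check on the parsing.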

pb-cdunn commented 9 years ago

You can try the synth0 example in FALCON-examples. It's small enough to analyze and understand completely.

dgordon562 commented 9 years ago

Thanks to both of you.

I've investigated all options to DBshow and can't get any of them to give the internal id. This is what it looks like with DBshow -n:

m140913_050931_42139_c100713652400000001823152404301535_s1_p0/54496/0_8229 RQ=0.854
m140913_050931_42139_c100713652400000001823152404301535_s1_p0/54497/2328_12385 RQ=0.828

The RQ doesn't look anything like the internal id and I don't see any other number...

Perhaps there is some other command than DBshow that will relate read names with internal IDs?

pb-cdunn commented 9 years ago

On our mk-flow branch of DAZZ_DB, we have a -M flag, which we use to generate a map-file of read-id to fasta (iirc). I could include that code in our master branch, if you're interested. I've been meaning to do something like that anyway.

pb-jchin commented 9 years ago

@dgordon562 the internal id is the implicit row number of the file. I discussed this with Gene; he thought it was not necessary to have a separate field for the redundant information.

dgordon562 commented 9 years ago

Thanks!

Does this mean that I dump all of DBshow -n into a file, and the order of the sequence in the file is the internal ID?

pb-cdunn commented 9 years ago

Is it redundant when some lines have been filtered?

pb-jchin commented 9 years ago

Yes. You just have to be careful if you use the "trim" option. (By default, you will be OK. If you are not sure, check the code related to the "trim" option in DBsplit and DBshow.)
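A minimal Python sketch of that idea, assuming the `DBshow -n` output has already been dumped to a text file with one read per line: the position of each line becomes the internal ID. The function name `load_read_names` and the `first_id` parameter are hypothetical. Whether numbering starts at 1 or 0 is exactly the off-by-one question raised earlier, so the default of 1 is an assumption to verify against your own data, and the mapping only holds if the dump and the alignments see the same (trimmed or untrimmed) view of the database.

```python
def load_read_names(lines, first_id=1):
    """Map internal read ID -> read name, using line order as the ID.

    first_id=1 is an assumption; verify it against a known alignment.
    """
    names = {}
    next_id = first_id
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        # Keep only the name; drop trailing fields such as "RQ=0.854".
        names[next_id] = line.split()[0]
        next_id += 1
    return names


if __name__ == "__main__":
    # The two lines quoted earlier in this thread stand in for the full dump.
    dump = [
        "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/54496/0_8229 RQ=0.854",
        "m140913_050931_42139_c100713652400000001823152404301535_s1_p0/54497/2328_12385 RQ=0.828",
    ]
    names = load_read_names(dump)
    for read_id, name in names.items():
        print(read_id, name)
```

With a real dump you would pass an open file object instead of the list, then look up the two IDs from an LAshow record (after removing the thousands separators) in the returned dictionary.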

pb-jchin commented 9 years ago

@pb-cdunn no, the line number is the redundancy. I would prefer to have the internal id to original id mapping stored explicitly, but Gene is the designer.

dgordon562 commented 9 years ago

This appears to work. When I align the supposed sequences with another alignment program (using your method of matching read names to internal IDs), the clipped regions are approximately the same.

Yes! Thank you!