PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data
19 stars 21 forks source link

ipdSummary - KeyError: 'DS' #56

Open jgonzalez10 opened 6 years ago

jgonzalez10 commented 6 years ago

Hi!

I've been trying to run ipdSummary on PacBio datasets but it doesn't work. It keeps generating the same error: KeyError: 'DS'

This is a part of the error I'm receiving:

3

I've tried to run it via pbsmrtpipe, but still getting errors.

¿What can I do? I really need to analyze this data via command line.

Thanks! Juliana

jrharting commented 6 years ago

The last line of the traceback is looking for an index. Does your input bam file have a .pbi index associated with it?

jgonzalez10 commented 6 years ago

Hi! Yes it does. I have the .pbi index file in the same directory as the input .bam file.

jrharting commented 6 years ago

Apologies for the delay -- does your reference also have an .fai index associuated with it?
Can you try to run pbvalidate on your aligned bam?

pbvalidate --aligned --index --reference [your ref] [your aligned bam]

jgonzalez10 commented 6 years ago

Hi, "does your reference also have an .fai index associuated with it?" Yes, it does. "Can you try to run pbvalidate on your aligned bam?" I did, I'm getting the following:

screen shot 2018-06-15 at 1 24 15 pm

What does that mean? thanks!

jrharting commented 6 years ago

It looks like the bam you are using does not match PacBio specifications. What mapping program did you use to generate the alignments?

You should be able to use the tool pbbamify along with the original raw subreadset to populate the tags in your bam if you generated it using something other than pbalign/blasr (e.g. ngmlr).

jgonzalez10 commented 6 years ago

We used blasr. However I think pbbamify is a good idea. I do not see it as part of the SMRT tools I have installed. Should it be there? Or do I have to download something else?

mhsieh commented 6 years ago

it's available via bioconda

mjs-macbook-pro:~ mhsieh$ miniconda3/bin/conda create -n pacbiotest pbbam
Solving environment: done
(skipped to save electrons)
Proceed ([y]/n)? y

Downloading and Extracting Packages
(skipped to save electrons)
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate pacbiotest
#
# To deactivate an active environment, use:
# > source deactivate
#

mjs-macbook-pro:~ mhsieh$ (export PATH=miniconda3/bin:$PATH; source activate pacbiotest; pbbamify --help)
Usage: pbbamify [options] <ref.fa> <pb.bam>|<pb.fofn>|<pb.xml>

pbbamify converts an arbitray aligned BAM file to a PacBio-compatible BAM
file.Input BAM file is read from a file or stdin, the raw-reads PacBio BAM is
givenas a parameter, and BAM output is written to stdout.

Options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

  Options:
                        Reference used to align the input.
    --input=STR         The aligned non-PacBio BAM file. If not provided, stdin
                        will be used as input.
    --output=STR        Path to the output BAM file. If not specified, output
                        will be to the stdout.
    --verbose-level=INT
                        Specifies the level of info which will be output
                        produced onstderr. 0 turns all output off, 1 outputs
                        only warnings, while levels 2 and above outputs a status
                        message every 1000000 (2), 100000 (3), 1000 (4), 100
                        (5), 10 (6) and 1 (7) reads.
                        A PacBio BAM file containing raw reads.
jgonzalez10 commented 6 years ago

Ok so I ran pbbamify on my .bam file, it seems to solve the DS error but now I'm getting this new error, any idea what this can be? It's also not much reported online.

screen shot 2018-06-19 at 12 06 08 pm
jrharting commented 6 years ago

Looks like you might not have the sequencing chemistry set properly in your bam header.

If you call samtools view -H [your bam] on your input bam, what are the values for the following variables in the header? BINDINGKIT SEQUENCINGKIT BASECALLERVERSION

Those values should all match one of the mappings defined here: https://github.com/PacificBiosciences/pbcore/blob/master/pbcore/chemistry/resources/mapping.xml

If not, you could go back to your original subreads.bam to find those values, and then re-populate your downstream bam with the correct values using samtools reheader and sed

jgonzalez10 commented 6 years ago

Yes, they match this one:

Sequel® Sequencing Plate Silwet Mapping SequencingChemistry>S/P2-C2/5.0</SequencingChemistry BindingKit>100-862-200</BindingKit SequencingKit>101-309-500</SequencingKit SoftwareVersion>5.0</SoftwareVersion /Mapping

mhsieh commented 6 years ago

related - https://github.com/PacificBiosciences/pbcore/commit/0650c4dd1d26ae8e51cc8da482d5ebba018186a7

jgonzalez10 commented 6 years ago

Sorry, I don't understand what you mean. Could you please elaborate it more?

mhsieh commented 6 years ago

That commit was the change to provide more information for the original KeyError. It seems that you have already passed the original KeyError in the title, so I might not be of too much help here.

My apology that I wasn't be clear about it.

jgonzalez10 commented 6 years ago

Thank you anyway! :)