genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

Bam index files for HG002 not working? #7

Closed joshuak94 closed 4 years ago

joshuak94 commented 4 years ago

Hello. I'm trying to download a subset of data from HG002 and parents. I'm using the following command, which should save chr 20 in a BAM file for me:

samtools view -bh -o HG002_20.bam ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.hs37d5.2x250.bam 20

However, I get the following error:

[E::idx_test_and_fetch] Error reading "ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.hs37d5.2x250.bam.bai"
[1] 6864 segmentation fault (core dumped) samtools view -bh -o HG002_20.bam 20

I have the same issue when I try to download the reads aligned to the GRCh38 reference.

Note that the above command works fine for the mother and father's reads.

Could it be that when the BAM files were re-uploaded in 2019, they were not re-indexed?

I've tried this with samtools 1.10 and samtools 1.9 and both give errors.
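One way to narrow down an index-read error like the one above is to fetch the .bai file separately and confirm it arrives non-empty before slicing. The following is only a sketch under stated assumptions: the helper names `bai_url` and `fetch_index` are hypothetical, curl is assumed to be installed, and the index is assumed to sit next to the BAM with `.bai` appended (as the error message suggests).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of samtools.

# Derive the index URL by appending ".bai" to the BAM URL
# (assumption: the index is published alongside the BAM).
bai_url() {
    printf '%s.bai\n' "$1"
}

# Download the index for a BAM URL into the current directory and print
# the local file name. Returns non-zero if curl fails or the file is empty.
fetch_index() {
    url=$(bai_url "$1")
    out=$(basename "$url")
    curl --fail --silent --show-error -o "$out" "$url" || return 1
    [ -s "$out" ] || return 1
    printf '%s\n' "$out"
}

# Example usage (commented out so sourcing this file has no side effects):
# bam=ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.hs37d5.2x250.bam
# idx=$(fetch_index "$bam") && samtools view -X -bh -o HG002_20.bam "$bam" "$idx" 20
```

The `-X` option in the commented usage line passes the index file explicitly; as far as I know it is only available from samtools 1.10 onward, so older builds would need a different approach.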

chunlinxiao commented 4 years ago

I have no problem slicing chr20 from the GRCh37 or GRCh38 BAMs using either a local path or a remote path:

samtools view -bh -o HG002_20.bam //HG002.hs37d5.2x250.bam 20

samtools view -bh -o HG002_20_b38.bam ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.GRCh38.2x250.bam chr20

samtools view HG002_20.bam |more
D00360:95:H2YWMBCXX:1:2113:12579:57631 163 20 60001 70 68S182M = 60086 335 AGTAAACTATCCCACTTTGAACAGAATTTTTAAGAGAAAAAACTGAAAGTTAATAGAGAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAGGTGGTAAGGAAGGAGAGAGTGAAGGAACTGCCAGGTGACACACTCCCACCATGGACCTCTGGGATCCTAGCTTTAAGAGATCCCATCACCCATATGAACGTT DDDDDHIIIIIIHGIHIIIIIIHGHIIIHIHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIHIIIIIIIIIHHIIIIIIHIIIIIIIIGIIIIIFHHHIIHIIIHIIIIGIIIIHIIIIIHIIIIHIHHHIIGHIIIGHHIFGHECCEGHHHIHGIEHHHHIEHHH?CEHHHHIHIFGII?GHHHHHF?CHFEHEHHHCHH?EHGFEFHFHEHF?@@--F-6-8FH-A@A?@G--F?@-AHA6 PG:Z:novoalign AS:i:425

samtools view HG002_20_b38.bam |more
D00360:95:H2YWMBCXX:1:2102:15861:100303 163 chr20 60001 70 49S199M2S = 60231 478 GAGGAAGAGCAGCCAGTTTCTGCTGCTGATGATCAGGAGGTGGAGAAATTGTTCAGTCGGGCAGGGAGTGGGAATAGACAAGACCACAAGCAGCTTGGTGCCTCTGAAAGGGAGAGGGGTGGAGGGGAGACTAGAGAGGTGGGTAGGAATACTGGATTCCACTGACCACGTGCTGGATGTCATGCTTAGCCCTCCTGCTCTGTGCCAGGTTAGGCACCTGGTGTTTTACATAGATTATATTACATTCTCT DD@DDHIIHIHHIIIIIIIIIIGIIIIHIIIIIHIHG?ECHIIHFCEGHFHIHFEHHHHIGHIIIIHH@HHHGEHHG<GHEHIIEHHE<GH1CHIIHHGHGIIHCHHHH1D<CC@EHHH@EHEEHHI<E0DHCDDHFHHHIGCHHHEDHF/CGHIFHG@FEHHH?GFHEHHHHHHIHAECHHHIGIEEHHHHIHIHHGHEHH.FGHH.EFFEHHHCEHFEC6F@@@F?@6F6---6@?6@@---@6@G-6 PG:Z:novoalign AS:i:328 UQ:i:328 NM:i:1 MD:Z:183T15 PQ:i:330 SM:i:0 AM:i:0

chunlinxiao commented 4 years ago

samtools view -bh -o HG002_20.bam HG002.hs37d5.2x250.bam 20

samtools view -bh -o HG002_20_b38.bam ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.GRCh38.2x250.bam chr20

joshuak94 commented 4 years ago

Hm strange. Copying and pasting your code gives me this error:

samtools view -bh -o HG002_20_b38.bam ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.GRCh38.2x250.bam chr20
[E::idx_test_and_fetch] Error reading "ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.GRCh38.2x250.bam.bai"
[1]    9869 segmentation fault (core dumped)  samtools view -bh -o HG002_20_b38.bam  chr20

Which samtools version are you using?

chunlinxiao commented 4 years ago

I'm using 1.3.1

samtools/1.3.1/bin/samtools
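When comparing several samtools builds like this, it can help to print the exact version each binary reports. A minimal sketch, assuming the first line of `samtools --version` output has the form `samtools <version>`; the function name `parse_samtools_version` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `samtools --version` output on stdin and print
# only the version number from the first line (e.g. "samtools 1.9").
parse_samtools_version() {
    awk 'NR == 1 && $1 == "samtools" { print $2 }'
}

# Example usage (commented out; the binary paths depend on the local install):
# samtools --version | parse_samtools_version
# /usr/local/samtools/1.9/bin/samtools --version | parse_samtools_version
```

Running this against each installed binary makes it easier to confirm which build actually produced a given result.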

joshuak94 commented 4 years ago

Very strange. I just downloaded and built samtools 1.3.1, and it is able to download the region fine as well. I guess I'll mention it in the samtools repository.

I wonder though why the parents worked with 1.9 and 1.10 with no issue.

Thank you!

chunlinxiao commented 4 years ago

Just tried with samtools v1.9 and had no problem slicing:

/usr/local/samtools/1.9/bin/samtools view -bh -o HG002_20_2.bam HG002.hs37d5.2x250.bam 20

/usr/local/samtools/1.9/bin/samtools view -bh -o HG002_20_b38_2.bam ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/HG002.GRCh38.2x250.bam chr20

/usr/local/samtools/1.9/bin/samtools view HG002_20_2.bam |more
D00360:95:H2YWMBCXX:1:2113:12579:57631 163 20 60001 70 68S182M = 60086 335 AGTAAACTATCCCACTTTGAACAGAATTTTTAAGAGAAAAAACTGAAAGTTAATAGAGAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAGGTGGTAAGGAAGGAGAGAGTGAAGGAACTGCCAGGTGACACACTCCCACCATGGACCTCTGGGATCCTAGCTTTAAGAGATCCCATCACCCATATGAACGTT DDDDDHIIIIIIHGIHIIIIIIHGHIIIHIHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIHIIIIIIIIIHHIIIIIIHIIIIIIIIGIIIIIFHHHIIHIIIHIIIIGIIIIHIIIIIHIIIIHIHHHIIGHIIIGHHIFGHECCEGHHHIHGIEHHHHIEHHH?CEHHHHIHIFGII?GHHHHHF?CHFEHEHHHCHH?EHGFEFHFHEHF?@@--F-6-8FH-A@A?@G--F?@-AHA6 PG:Z:novoalign AS:i:425

/usr/local/samtools/1.9/bin/samtools view HG002_20_b38_2.bam |more
D00360:95:H2YWMBCXX:1:2102:15861:100303 163 chr20 60001 70 49S199M2S = 60231 478 GAGGAAGAGCAGCCAGTTTCTGCTGCTGATGATCAGGAGGTGGAGAAATTGTTCAGTCGGGCAGGGAGTGGGAATAGACAAGACCACAAGCAGCTTGGTGCCTCTGAAAGGGAGAGGGGTGGAGGGGAGACTAGAGAGGTGGGTAGGAATACTGGATTCCACTGACCACGTGCTGGATGTCATGCTTAGCCCTCCTGCTCTGTGCCAGGTTAGGCACCTGGTGTTTTACATAGATTATATTACATTCTCT DD@DDHIIHIHHIIIIIIIIIIGIIIIHIIIIIHIHG?ECHIIHFCEGHFHIHFEHHHHIGHIIIIHH@HHHGEHHG<GHEHIIEHHE<GH1CHIIHHGHGIIHCHHHH1D<CC@EHHH@EHEEHHI<E0DHCDDHFHHHIGCHHHEDHF/CGHIFHG@FEHHH?GFHEHHHHHHIHAECHHHIGIEEHHHHIHIHHGHEHH.FGHH.EFFEHHHCEHFEC6F@@@F?@6F6---6@?6@@---@6@G-6 PG:Z:novoalign AS:i:328 UQ:i:328 NM:i:1 MD:Z:183T15 PQ:i:330 SM:i:0 AM:i:0
D00360:96:H2YLYBCXX:1:2105:5916:51581 163 chr20 60001 70 3S247M = 60092 340 AATTGTTCAGTCGGGCAGGGAGTGGGAATAGACAAGACCACAAGCAGCTTGGTGCCTCTGAAAGGGAGAGGGGTGGAGGGGAGACTAGAGAGGTGGGTAGGAATACTGGATTCCACTGACCACGTGCTGGATGTCATGCTTAGCCCTCCTGCTCTGTGCCAGGTTAGGCACCTGGTGTTTTACATATATTATATTACATTCTATTAACTACAACTCCATAGCCATCCTTTCCTCTCCATTCCATTTCTCT DCDDDIIIIIIIIIIHHIIIHIHHHHIIHIIIHIIIIIIIIIIIIIIIIIIIHFFIIHIIIHHIIIHIHHIIHGHHHHHHIHIHHHGIHIIIIHIIIIIIIIIIIIIHIGIIIIIIIIIIIHHGHHIIIICHHFHIGIIHHGIIIIIIIEHIFHCFHIFHHIHFHHHIIHIIGIIHHGH.BGC6FAHHI6.B?CGHHEHHH..8AB.6.88.AHEHH.AFC@.8@AH@HGHH@E-GH?G-F--6@6---- PG:Z:novoalign AS:i:97 UQ:i:97 NM:i:4 MD:Z:203C0A0G13A27 PQ:i:115 SM:i:70 AM:i:12

joshuak94 commented 4 years ago

Hmmm alright, it must be something wrong with my configuration. Either way, samtools 1.3 worked so thank you!