Illumina / REViewer

A tool for visualizing alignments of reads in regions containing tandem repeats
GNU General Public License v3.0
73 stars 14 forks source link

[error] Failed to extract reads from the specified region #47

Closed Jolleboll closed 2 years ago

Jolleboll commented 2 years ago

Hello, I keep getting this error, I've tried with three different loci, the error remains the same.

"LocusResults": {
    "AFF2": {
      "AlleleCount": 2,
      "Coverage": 31.54054054054054,
      "FragmentLength": 352,
      "LocusId": "AFF2",
      "ReadLength": 150,
      "Variants": {
        "AFF2": {
          "CountsOfFlankingReads": "()",
          "CountsOfInrepeatReads": "()",
          "CountsOfSpanningReads": "()",
          "Genotype": "0/0",
          "GenotypeConfidenceInterval": "0-714/0-714",
          "ReferenceRegion": "X:147582151-147582211",
          "RepeatUnit": "GCC",
          "VariantId": "AFF2",
          "VariantSubtype": "Repeat",
          "VariantType": "Repeat"
        }
      }
    },
"AR": {
      "AlleleCount": 2,
      "Coverage": 41.91891891891891,
      "FragmentLength": 350,
      "LocusId": "AR",
      "ReadLength": 150,
      "Variants": {
        "AR": {
          "CountsOfFlankingReads": "(1, 2), (2, 2), (3, 3), (4, 1), (5, 1), (6, 2), (8, 6), (9, 1), (10, 3), (15, 3), (17, 1), (18, 3), (19, 5), (20, 2), (23, 1), (24, 1), (26, 1), (28, 1)",
          "CountsOfInrepeatReads": "()",
          "CountsOfSpanningReads": "(27, 16), (28, 9)",
          "Genotype": "27/28",
          "GenotypeConfidenceInterval": "27-27/28-28",
          "ReferenceRegion": "X:66765158-66765227",
          "RepeatUnit": "GCA",
          "VariantId": "AR",
          "VariantSubtype": "Repeat",
          "VariantType": "Repeat"
        }
      }
    },

I thought maybe the problem was that there are no InrepeatReads, but the problem remains if only using an entry where all three read categories have members, e.g:

"LocusResults": {
    "RFC1": {
      "AlleleCount": 2,
      "Coverage": 41.513513513513516,
      "FragmentLength": 351,
      "LocusId": "RFC1",
      "ReadLength": 150,
      "Variants": {
        "RFC1": {
          "CountsOfFlankingReads": "(1, 2), (2, 2), (3, 3), (4, 2), (5, 2), (7, 2), (8, 5), (10, 2), (11, 1), (15, 1), (16, 1), (17, 2), (20, 1), (21, 1), (22, 1), (25, 1), (29, 4)",
          "CountsOfInrepeatReads": "(30, 15), (31, 1), (32, 1)",
          "CountsOfSpanningReads": "(10, 11), (15, 1)",
          "Genotype": "10/65",
          "GenotypeConfidenceInterval": "10-10/53-102",
          "ReferenceRegion": "4:39350044-39350099",
          "RepeatUnit": "AARRG",
          "VariantId": "RFC1",
          "VariantSubtype": "Repeat",
          "VariantType": "Repeat"
        }
      }
    }
  }

[2022-03-25 14:42:00.258] [info] Loading specification of locus RFC1 [2022-03-25 14:42:00.260] [error] Failed to extract reads from the specified region

What to do? :-(

REViewer-v0.2.7-linux_x86_64\
    --reads "$out"/exphout_"${sample}"_realigned.sorted.bam\
    --reference "$genome"\
    --catalog "$catalog"\
    --vcf exphout_"${sample}".vcf\
    --locus "RFC1"\
    --output-prefix "$out"/reviewerout_"${sample}"

Grateful for help with this! Am excited to see the pretty diagrams...

sclamons commented 2 years ago

This happens if samtools doesn't find any reads in your bam. What do you get if you put a samtools view "$out"/exphout_"${sample}"_realigned.sorted.bam chr4:39350044-39350099 in your script? Do you get output?

Jolleboll commented 2 years ago

That command returns an error, but if I remove the "chr" in "chr4", like so (expanded the variables for testing):

samtools view ../output/exphout_62670396_S1_realigned.sorted.bam "4:39350044-39350099"

I get a bunch of reads.

The error I get when using the command you wrote goes like:

[main_samview] region "chr4:39350044-39350099" specifies an invalid region or unknown reference. Continue anyway.

Does that make sense? The chromosome names in my reference fasta are in the chr1, chr2, ..., chrX, chrY, chrM format.

Jolleboll commented 2 years ago

If I change the names of the chromosomes in my reference fasta from e.g. 'chr4' to just '4', I get a little further:

REViewer now parsing 62670396_S1.

[2022-03-28 11:08:45.778] [info] Loading specification of locus RFC1
[2022-03-28 11:08:45.780] [info] Extracted 246 frags
[2022-03-28 11:08:45.780] [info] Calculating fragment length
[2022-03-28 11:08:45.780] [info] Fragment length is estimated to be 340
[2022-03-28 11:08:45.780] [info] Extracting genotype paths
[2022-03-28 11:08:45.783] [error] Unable to open file exphout_62670396_S1.vcf

I've no idea why the vcf can't be opened. It is readable by everyone, in a folder accessible by everyone. It has one entry:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  62670396_S1
4       39350044        .       A       <STR10>,<STR65> .       PASS    END=39350099;REF=11;RL=55;RU=AARRG;VARID=RFC1;REPID=RFC1        GT:SO:REPCN:REPCI:ADSP:ADFL:ADIR:LC     1/2:SPANNING/INREPEAT:10/65:10-10/53-102:11/0:20/33:0/17:41.513514

Sorry to be a bother. :smile:

And I still get output with the samtools view command you suggested (after changing chr4 to 4).

sclamons commented 2 years ago

That error happens before REViewer sees any of the contents of the VCF file, so for some reason it's completely unable to open it. If this were my project, I'd guess it was a typo somewhere in the VCF name—could you paste an ls -al of the folder you're running this in? Maybe a fresh eye can spot something.

Jolleboll commented 2 years ago

Thank you so much sclamons. I goofed real bad, I used the basename of the vcf file, rather than the relative path to it, when running REViewer... ugh. Well, everything seems to work now. Thanks again :-]