CL-CHEN-Lab / OK-Seq

R package for the analysis of OK-Seq data to study DNA replication fork directionality: from count matrices, RFD calculation to inititation/termination zone calling.
GNU General Public License v3.0
10 stars 3 forks source link

Error in 'hist.default' R function breaks the code #5

Closed Elizabeth-mqz-gmz closed 7 months ago

Elizabeth-mqz-gmz commented 7 months ago

Hello! I am using the program to run it for OK-seq data being single-end and hg38. I obtained the reference for chromosome sizes in hg38 from UCSC, and I am using the default parameters as listed in the running example.

Unfortunately, I am facing this issue with the 'hist.default' function, and this error comes repeatedly. I would greatly appreciate any advice you could give me for this, probably I am setting one of the parameters in the wrong way.

source('../OKseqHMM.R')

OKseqHMM(bamfile = "../OK-seq_K562_BR1.bam",
  thresh=10,
  chrsizes = "../hg38.chrom.sizes",
  winS=15,
  fileOut = "hmm",
  binSize=1000)

[1] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[10] "chr19" "chr1"  "chr20" "chr21" "chr22" "chr2"  "chr3"  "chr4"  "chr5"
[19] "chr6"  "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY"
[W::hts_idx_load2] The index file is older than the data file: ../OK-seq_K562_BR1.bam.bai
1000000 2000000 3000000 4000000 5000000 6000000 7000000 8000000 9000000 10000000 11000000 12000000 13000000 14000000 15000000 16000000 17000000 18000000 19000000 20000000 21000000 22000000 23000000 24000000 25000000 26000000 27000000 28000000 29000000 30000000 31000000 32000000 33000000 34000000 35000000 36000000 37000000 38000000 39000000 40000000 41000000 42000000 43000000 44000000 45000000 46000000 47000000 48000000 49000000 50000000 51000000 52000000 53000000 54000000 55000000 56000000 57000000 58000000 59000000 60000000 61000000 62000000 63000000 64000000 65000000 66000000 67000000 68000000 69000000 70000000 71000000 72000000 73000000 74000000 75000000 76000000 77000000 78000000 79000000 80000000 81000000 82000000 83000000 84000000 85000000 86000000 87000000 88000000 89000000 90000000 91000000 92000000 93000000 94000000 95000000 96000000 97000000 98000000 99000000 100000000 101000000 102000000 103000000 104000000 105000000 106000000 107000000 108000000 109000000 110000000 111000000 112000000 113000000 114000000 115000000 116000000 117000000 118000000 119000000 120000000 121000000 122000000 123000000 124000000 125000000 126000000 127000000 128000000 129000000 130000000 131000000 132000000 133000000 134000000 135000000 136000000 137000000 138000000 139000000 140000000 141000000 142000000 143000000 144000000 145000000 146000000 147000000 148000000 149000000 150000000 151000000 152000000 153000000 154000000 155000000 156000000 157000000 158000000 159000000 160000000 161000000 162000000 163000000 164000000 164578909 [1] "This bam is single-end."
[1] "Seperating the forward strand bam."
[1] "Seperating the reverse strand bam."
[1] "chr10"
[1] 133797422
[1] "Calculating 1kb binsize coverage for forward strand."
[1] "sigle-end bam file will be proceeded by default."
Error in hist.default(tags, breaks = breaks, plot = FALSE) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'

Thanks in advance!

Ala-Eddine-BOUDEMIA commented 7 months ago

Hello Elizabeth,

We appreciate your use of OKseqHMM.

It appears that there might be a discrepancy between your data sizes and the corresponding chromosome sizes. To address this, we recommend re-indexing your BAM file and rerunning the program (Since you have a warning about the index file being older). If this does not resolve the issue, you may also consider re-aligning your data.

Please let us know if these steps resolve the issue or if you require further assistance.

Elizabeth-mqz-gmz commented 7 months ago

Hello!

Thank you for your quick response, and I apologize for my delayed reply.

After several attempts, I was able to identify the root cause of the issue. It turns out that I was using the wrong genome version from the chrom.sizes file. I have corrected this now and the software is running smoothly for the first stage. Additionally, I followed your suggestion to re-index the BAM file, which resolved the warning message.

To provide context, the data I am working with is from a public source, and I was initially confused by the genome version they were using.

Thank you again for your assistance!

Elizabeth-mqz-gmz commented 7 months ago

Hello again! I just executed the second stage as follows:

source('../OKseqOEM.R')

OKseqOEM(bamInF = "../hmm_OK-seq_K562_BR1_fwd.bam", bamInR = "../hmm_OK-seq_K562_BR1_rev.bam", chrsizes = "../hg19.chr.size.txt", fileOut ="hmm_OK-seq_K562_BR1_final", binSize=1000, binList=c(1,10,20,50,100,250,500,1000))

Unfortunately, is presenting errors for the alternative chromosome reference _chr6_sstohap7:

[1] "chr6_ssto_hap7" [1] 4928567 [1] "It's single-end. Calculating 1000bp binsize coverage for forward strand." [main_samview] region "chr6_ssto_hap7" specifies an invalid region or unknown reference. Continue anyway. [1] "Calculating 1000bp binsize coverage for reverse strand." [main_samview] region "chr6_ssto_hap7" specifies an invalid region or unknown reference. Continue anyway. Error in read.table(fileInF, header = F, comment.char = "", colClasses = c("integer", : no lines available in input In addition: There were 50 or more warnings (use warnings() to see the first 50)

The regular chromosomes are not presenting any issue, but I would like to ask whether should I take any action on this.

This also happened while testing the data as indicated in the readme file from your templates folder. I supposed this happened because the demo data only provided chromosomes 21 & 22. But I just wanted to point it out in case is important. [1] "chr1" [1] 249250621 [1] "It's pair-end. Calculating 1000bp binsize coverage for forward strand." [1] "Calculating 1000bp binsize coverage for reverse strand." Error in read.table(fileInF, header = F, comment.char = "", colClasses = c("integer", : no lines available in input

Thanks!

Ala-Eddine-BOUDEMIA commented 7 months ago

Hi Elizabeth,

Happy that you figured it out. For OKseqOEM.R could you please remove manually the alternative chromosomes from the "hg19.chr.size.txt" (or any chromosome that is unmapped) it can't handle them automatically for now.

Best

Elizabeth-mqz-gmz commented 7 months ago

Hello!

This worked perfectly, thank you very much! :)

Best wishes