CL-CHEN-Lab / OK-Seq

R package for the analysis of OK-Seq data to study DNA replication fork directionality: from count matrices and RFD calculation to initiation/termination zone calling.
GNU General Public License v3.0

Error in 'hist.default' R function breaks the code #5

Closed: Elizabeth-mqz-gmz closed this issue 7 months ago

Elizabeth-mqz-gmz commented 7 months ago

Hello! I am running the program on single-end OK-seq data aligned to hg38. I obtained the hg38 chromosome sizes reference from UCSC, and I am using the default parameters listed in the running example.

Unfortunately, the run fails in the 'hist.default' function, and the error occurs every time. I would greatly appreciate any advice; I am probably setting one of the parameters incorrectly.

source('../OKseqHMM.R')

OKseqHMM(bamfile = "../OK-seq_K562_BR1.bam",
  thresh=10,
  chrsizes = "../hg38.chrom.sizes",
  winS=15,
  fileOut = "hmm",
  binSize=1000)

[1] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[10] "chr19" "chr1"  "chr20" "chr21" "chr22" "chr2"  "chr3"  "chr4"  "chr5"
[19] "chr6"  "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY"
[W::hts_idx_load2] The index file is older than the data file: ../OK-seq_K562_BR1.bam.bai
[1] "This bam is single-end."
[1] "Seperating the forward strand bam."
[1] "Seperating the reverse strand bam."
[1] "chr10"
[1] 133797422
[1] "Calculating 1kb binsize coverage for forward strand."
[1] "sigle-end bam file will be proceeded by default."
Error in hist.default(tags, breaks = breaks, plot = FALSE) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'

Thanks in advance!

Ala-Eddine-BOUDEMIA commented 7 months ago

Hello Elizabeth,

We appreciate your use of OKseqHMM.

There appears to be a mismatch between the coordinates in your data and the chromosome sizes you supplied. Since the log shows a warning that the index file is older than the BAM file, we recommend re-indexing the BAM file first and rerunning the program. If this does not resolve the issue, you may also consider re-aligning your data.
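To illustrate how such a mismatch can produce this error message, here is a minimal base R sketch. It assumes, without having checked the OKseqHMM source, that read positions are binned with breaks running from 0 to the chromosome length; all values below are hypothetical.

```r
# Hypothetical values for illustration; not taken from the OKseqHMM source.
chr_len  <- 10000                      # chromosome length as given in a chrom.sizes file
bin_size <- 1000
breaks   <- seq(0, chr_len, by = bin_size)

tags_ok  <- c(150, 4200, 9990)         # every position lies within [0, chr_len]
tags_bad <- c(150, 4200, 12500)        # one position lies beyond chr_len

# Binning succeeds when the breaks span all positions.
counts_ok <- hist(tags_ok, breaks = breaks, plot = FALSE)$counts

# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    hist(tags_bad, breaks = breaks, plot = FALSE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(counts_ok)
print(msg)
```

If the chromosome length in the sizes file is shorter than the reference the reads were aligned to, some positions fall outside the breaks and hist() stops in this way.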

Please let us know if these steps resolve the issue or if you require further assistance.

Elizabeth-mqz-gmz commented 7 months ago

Hello!

Thank you for your quick response, and I apologize for my delayed reply.

After several attempts, I identified the root cause: the chrom.sizes file I was using was for the wrong genome version. With the matching file, the first stage now runs without errors. I also followed your suggestion to re-index the BAM file, which removed the warning message.
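For anyone hitting the same problem, one way to spot a genome version mismatch up front is to compare the contig lengths in the BAM header with the chrom.sizes file. The sketch below is not part of OKseqHMM; the function names `parse_sq_lines` and `find_size_mismatches` are hypothetical, the lengths are toy values, and the header lines are assumed to come from `samtools view -H`.

```r
# Parse "@SQ" header lines (as printed by `samtools view -H file.bam`)
# into a data frame of contig names and lengths.
parse_sq_lines <- function(header_lines) {
  sq <- grep("^@SQ", header_lines, value = TRUE)
  data.frame(
    chrom  = sub(".*\tSN:([^\t]+).*", "\\1", sq),
    length = as.numeric(sub(".*\tLN:([0-9]+).*", "\\1", sq)),
    stringsAsFactors = FALSE
  )
}

# Return BAM contigs that are missing from chrom.sizes or whose
# length differs from the chrom.sizes entry.
find_size_mismatches <- function(bam_sizes, chrom_sizes) {
  merged <- merge(bam_sizes, chrom_sizes, by = "chrom", all.x = TRUE,
                  suffixes = c("_bam", "_ref"))
  merged[is.na(merged$length_ref) | merged$length_bam != merged$length_ref, ]
}

# Toy header and toy sizes table (lengths are made up for illustration).
header <- c("@HD\tVN:1.6\tSO:coordinate",
            "@SQ\tSN:chr21\tLN:1000",
            "@SQ\tSN:chr22\tLN:2000")
ref <- data.frame(chrom  = c("chr21", "chr22"),
                  length = c(900, 2000),
                  stringsAsFactors = FALSE)

mismatches <- find_size_mismatches(parse_sq_lines(header), ref)
print(mismatches)
```

With real data, the `ref` table would be read from the chrom.sizes file with read.table(); any row reported here suggests the sizes file and the alignment use different references.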

To provide context, the data I am working with is from a public source, and I was initially confused by the genome version they were using.

Thank you again for your assistance!

Elizabeth-mqz-gmz commented 7 months ago

Hello again! I just executed the second stage as follows:

source('../OKseqOEM.R')

OKseqOEM(bamInF = "../hmm_OK-seq_K562_BR1_fwd.bam",
  bamInR = "../hmm_OK-seq_K562_BR1_rev.bam",
  chrsizes = "../hg19.chr.size.txt",
  fileOut = "hmm_OK-seq_K562_BR1_final",
  binSize = 1000,
  binList = c(1,10,20,50,100,250,500,1000))

Unfortunately, it fails on the alternative chromosome reference chr6_ssto_hap7:

[1] "chr6_ssto_hap7"
[1] 4928567
[1] "It's single-end. Calculating 1000bp binsize coverage for forward strand."
[main_samview] region "chr6_ssto_hap7" specifies an invalid region or unknown reference. Continue anyway.
[1] "Calculating 1000bp binsize coverage for reverse strand."
[main_samview] region "chr6_ssto_hap7" specifies an invalid region or unknown reference. Continue anyway.
Error in read.table(fileInF, header = F, comment.char = "", colClasses = c("integer", :
  no lines available in input
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The regular chromosomes do not cause any problems, but I would like to ask whether I should take any action on this.

The same error occurred when I tested the demo data as described in the readme file in your templates folder. I assume this is because the demo data only contains chromosomes 21 and 22, but I wanted to point it out in case it is important:

[1] "chr1"
[1] 249250621
[1] "It's pair-end. Calculating 1000bp binsize coverage for forward strand."
[1] "Calculating 1000bp binsize coverage for reverse strand."
Error in read.table(fileInF, header = F, comment.char = "", colClasses = c("integer", :
  no lines available in input

Thanks!

Ala-Eddine-BOUDEMIA commented 7 months ago

Hi Elizabeth,

Glad you figured it out. For OKseqOEM.R, could you please manually remove the alternative chromosomes (or any chromosome that is unmapped) from "hg19.chr.size.txt"? The script can't handle them automatically for now.
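If editing the file by hand is tedious, the filtering could be scripted along these lines. This is only a sketch in base R, not part of the package: `filter_primary_chroms` is a hypothetical helper, the name pattern (chr1 to chr22, chrX, chrY) is an assumption that should be adapted to the contigs actually present in the BAM files, and `chrUn_example` is a made-up contig name.

```r
# Keep only primary chromosomes from a two-column, tab-separated
# chrom.sizes file. The default pattern is an assumption: it keeps
# chr1..chr22, chrX and chrY and drops everything else
# (alternative haplotypes, unplaced contigs, chrM).
filter_primary_chroms <- function(in_file, out_file,
                                  pattern = "^chr([0-9]+|X|Y)$") {
  sizes <- read.table(in_file, header = FALSE, sep = "\t",
                      col.names = c("chrom", "length"),
                      stringsAsFactors = FALSE)
  kept <- sizes[grepl(pattern, sizes$chrom), ]
  write.table(kept, out_file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(kept)
}

# Toy demonstration on a temporary file.
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines(c("chr1\t249250621",
             "chr6_ssto_hap7\t4928567",
             "chrUn_example\t1000"),      # hypothetical unplaced contig
           in_file)

kept <- filter_primary_chroms(in_file, out_file)
print(kept)
```

The filtered file can then be passed as the chrsizes argument in place of the original one.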

Best

Elizabeth-mqz-gmz commented 7 months ago

Hello!

This worked perfectly, thank you very much! :)

Best wishes