databio / GenomicDistributionsData

Data for GenomicDistributions R package

Issues with some calculation functions on hg38 files #1

Closed: joseverdezoto closed this issue 4 years ago

joseverdezoto commented 4 years ago

@stolarczyk I was testing the tutorial with the GenomicDistributionsData library loaded into the regionstat.R script. All of the plots for hg19 files work as expected, but when I try to run bedstat on hg38 files I get the following error:

Error in `rownames<-`(`*tmp*`, value = names(x)) : 
  attempt to set 'rownames' on an object with no dimensions
Calls: doItAall ... elementMetadata -> elementMetadata -> rownames<- -> rownames<-
Execution halted
Command completed. Elapsed time: 0:00:06. Running peak memory: 0.498GB.  
  PID: 12999;   Command: Rscript;   Return code: 1; Memory used: 0.498GB

I ran the calculations for the hg38 files interactively in the R console: calcPartitionsRef and calcFeatureDistRefTSS fail, while the other functions seem to work.

nsheff commented 4 years ago

this looks similar to:

https://github.com/databio/GenomicDistributions/issues/111

stolarczyk commented 4 years ago

can you verify that you have the GenomicDistributionsData package installed?
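A minimal base-R sketch for answering this. It assumes (not confirmed in this thread) that the interactive console and the `Rscript` call may be seeing different library paths, so it reports, for each package, whether it can be loaded, its version, and where it was found. The helper name `checkPackages` is hypothetical.

```r
# Hypothetical helper (not part of GenomicDistributions): for each package,
# report whether its namespace can be loaded, its version, and its install path.
checkPackages <- function(pkgs) {
  rows <- lapply(pkgs, function(pkg) {
    ok <- requireNamespace(pkg, quietly = TRUE)  # FALSE instead of an error if missing
    data.frame(
      package   = pkg,
      installed = ok,
      version   = if (ok) as.character(utils::packageVersion(pkg)) else NA_character_,
      path      = if (ok) find.package(pkg) else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Run this both in the console and inside the failing Rscript call,
# then compare the two outputs together with the library search paths.
print(checkPackages(c("GenomicDistributions", "GenomicDistributionsData")))
print(.libPaths())
```

If the two environments report different versions or paths, that would point to an installation mismatch rather than a bug in the hg38 data.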

nsheff commented 4 years ago

can you provide your query?

joseverdezoto commented 4 years ago

I loaded the GenomicDistributionsData library as well as the TSS_hg38 data successfully in the console:

library(GenomicDistributionsData)
data(TSS_hg38)

My query is the first file in the bedbase tutorial:

bedbase_tutorial/bed_files/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz

stolarczyk commented 4 years ago

I was trying to reproduce the error, but I could not even load the BED file:

> query = rtracklayer::import("GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got '16.25951'

this is the file I was trying to read in: /project/shefflab/www/example_data/bedbase_tutorial/bed_files/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz

stolarczyk commented 4 years ago

LOLA::readBed() worked
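The `scan()` error above is consistent with the file being in ENCODE narrowPeak format (BED6+4), where column 7 is `signalValue`, a float, while a plain BED parser expects column 7 to be the integer `thickStart`. That is an inference from the error message, not something verified in this thread. A minimal base-R sketch, using a hypothetical helper `readNarrowPeak`, reads the file with explicit column types; with rtracklayer, passing an `extraCols` vector to `import(..., format = "BED")` is the usual way to get the same effect.

```r
# Hypothetical helper: read an ENCODE narrowPeak (BED6+4) file into a data.frame.
# Assumes the standard 10-column narrowPeak layout. Columns 7-9 are floats,
# which is what breaks a parser expecting BED's integer thickStart in column 7.
readNarrowPeak <- function(file) {
  colClasses <- c(chrom = "character", start = "integer", end = "integer",
                  name = "character", score = "integer", strand = "character",
                  signalValue = "numeric", pValue = "numeric",
                  qValue = "numeric", peak = "integer")
  utils::read.table(file, sep = "\t", header = FALSE,
                    colClasses = unname(colClasses),
                    col.names = names(colClasses),
                    quote = "", comment.char = "")
}

# Usage (read.table reads gzip-compressed files directly):
# peaks <- readNarrowPeak("GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz")
```

The resulting data.frame keeps 0-based starts as in the file; converting it to a GRanges (for example with `GenomicRanges::makeGRangesFromDataFrame(peaks, keep.extra.columns = TRUE, starts.in.df.are.0based = TRUE)`) is untested here.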

stolarczyk commented 4 years ago

it worked:

> query = LOLA::readBed("/project/shefflab/www/example_data/bedbase_tutorial/bed_files/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz")
> gp = calcPartitionsRef(query, "hg38")
Calculating overlaps...
> gp
     partition  Freq
1         exon  2883
2      fiveUTR  6665
3   intergenic 21216
4       intron 22617
5 promoterCore 10524
6 promoterProx  2803
7     threeUTR  2020
> remove.packages("GenomicDistributionsData")
Removing package from '/home/mjs5kd/R/4.0'
(as 'lib' is unspecified)
> gp = calcPartitionsRef(query, "hg38")
Error in getReferenceData(refAssembly, tagline = "geneModels_") : 
  geneModels_hg38 not available in GenomicDistributions package and GenomicDistributionsData package is not installed

joseverdezoto commented 4 years ago

@stolarczyk Did you try the feature distance plots, specifically calcFeatureDistRefTSS?

stolarczyk commented 4 years ago

just tried it, also works:

> head(calcFeatureDistRefTSS(query, "hg38"))
[1]     -66610  -25464544  -17435806  -26507997    9496085 -160405210

joseverdezoto commented 4 years ago

I wonder what the issue is. I installed GenomicDistributionsData with install.packages(). Loading the package with library() and the data with data() works, and all of the plots are produced for hg19.