fangwuwang / team_Bloodies


Issue with RnBeads: #8

Open psomdeb25 opened 7 years ago

psomdeb25 commented 7 years ago

The following is the code I am running in RStudio with the RnBeads package:

library(RnBeads)
data_dir <- "/Volumes/Dark/Study/University of British Columbia/Courses/GSAT 540/Project/Combined Data"
annotation <- file.path(data_dir, "annotation.csv") # file.path() only builds the path to annotation.csv inside data_dir; it does not read the file.
data.source <- c(data_dir, annotation)

# Directory where the analysis output is written
analysis.dir <- "/Volumes/Dark/Study/University of British Columbia/Courses/GSAT 540/Project/analysis1.2"

# Directory where the report file is written
report.dir <- file.path(analysis.dir, "reports_details")

rnb.initialize.reports(report.dir)
rnb.options(import.bed.style="bismarkCov")
rnb.options("import.bed.columns") # query the current BED column mapping
rnb.options(filtering.greedycut = FALSE)
rnb.options(differential = FALSE)

# Set some analysis options
# rnb.options(filtering.sex.chromosomes.removal = TRUE, identifiers.column = "bedFile")
# logger.start(fname=NA)

# Set up the RnBSet object
result <- rnb.run.import(data.source = data.source, data.type = "bs.bed.dir", dir.reports = report.dir)

rnb.run.import() stops midway. Is there a way I can look at the bed files to check whether the data are arranged in order? @fangwuwang @singha53 @santina

Thanks!

santina commented 7 years ago

@psomdeb25 This kind of question is probably best resolved in person: we don't know what data you're using, so we can't really run the code ourselves.

What do you mean by it stops midway? Do you get any sort of error or warning message?

I usually get to UBC early so if you want we can meet before class tomorrow, since I need to leave right after class finishes at 11.

fangwuwang commented 7 years ago

@psomdeb25 Do you mean the problem is caused by the chromosomal locations not being sorted? If so, I can sort the bed file with bedtools and give you the sorted data. To look at the bed file yourself, type less followed by a space in the terminal, then drag the file into the terminal window to paste its path and press Enter.
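The ordering can also be checked from within R. This is only a minimal sketch: peek_bed() and is_position_sorted() are hypothetical helpers, the column names are made up for illustration, and the six-column layout assumes Bismark coverage files (chromosome, start, end, methylation percentage, methylated count, unmethylated count). Only the first n rows are inspected, so a passing check says nothing about the rest of the file.

```r
# Hypothetical helper: read the first n rows of a Bismark-style coverage file.
# Column names are assumptions for illustration; the files carry no header.
peek_bed <- function(path, n = 1000) {
  read.table(path, header = FALSE, sep = "\t", nrows = n,
             stringsAsFactors = FALSE,
             col.names = c("chr", "start", "end",
                           "meth_pct", "n_meth", "n_unmeth"))
}

# TRUE if every chromosome forms one contiguous block and start
# positions never decrease within a chromosome.
is_position_sorted <- function(bed) {
  runs <- rle(as.character(bed$chr))
  contiguous <- !any(duplicated(runs$values))
  ordered <- all(tapply(bed$start, bed$chr, function(x) !is.unsorted(x)))
  contiguous && ordered
}

# Small made-up example written to a temporary file.
demo <- data.frame(chr = c("chr1", "chr1", "chr2"),
                   start = c(100, 250, 50),
                   end = c(100, 250, 50),
                   meth_pct = c(50, 100, 0),
                   n_meth = c(1, 4, 0),
                   n_unmeth = c(1, 0, 3),
                   stringsAsFactors = FALSE)
demo_path <- tempfile(fileext = ".cov")
write.table(demo, demo_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

print(head(peek_bed(demo_path)))
print(is_position_sorted(peek_bed(demo_path)))
```

For a real file, replace demo_path with the path to one of the coverage files in the data directory.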

psomdeb25 commented 7 years ago

I think it is probably due to the large size of the files. "top" shows that it uses 97% of the CPU and about 5 GB of RAM. I tried using the ff package, but still nothing. Maybe I could split the annotation file into subgroups and analyse the data separately?

fangwuwang commented 7 years ago

@psomdeb25 Yeah, that is a really big dataset and we kind of expected that... I think it is reasonable to do pairwise analyses with two input populations at a time. Seven pairs are particularly interesting to us because of our biological question: HSC vs MPP, MPP vs CMP, MPP vs CLP, CLP vs MLP, CMP vs GMP, CMP vs MEP, and GMP vs MEP. I suggest running each of these comparisons separately.
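One way to prepare these runs is to write one annotation file per pair. This is a rough sketch only: write_pair_annotations() is a hypothetical helper, and the cell_type column name and the demo sample sheet are assumptions, since the real layout of annotation.csv is not shown in this thread.

```r
# The seven comparisons listed above; more pairs can be appended here.
pairs <- list(
  c("HSC", "MPP"), c("MPP", "CMP"), c("MPP", "CLP"), c("CLP", "MLP"),
  c("CMP", "GMP"), c("CMP", "MEP"), c("GMP", "MEP")
)

# Hypothetical helper: write one annotation CSV per pair, keeping only the
# samples whose population (assumed column "cell_type") is in that pair.
# Returns the paths of the files it wrote.
write_pair_annotations <- function(annotation, pairs, out_dir,
                                   column = "cell_type") {
  stopifnot(column %in% names(annotation))
  vapply(pairs, function(pair) {
    rows <- annotation[annotation[[column]] %in% pair, , drop = FALSE]
    out_file <- file.path(
      out_dir, sprintf("annotation_%s_vs_%s.csv", pair[1], pair[2]))
    write.csv(rows, out_file, row.names = FALSE)
    out_file
  }, character(1))
}

# Made-up sample sheet standing in for annotation.csv.
demo_annotation <- data.frame(
  bedFile = paste0("sample", 1:7, ".cov"),
  cell_type = c("HSC", "MPP", "CMP", "CLP", "MLP", "GMP", "MEP"),
  stringsAsFactors = FALSE
)
out_dir <- tempfile("pairs_")
dir.create(out_dir)
pair_files <- write_pair_annotations(demo_annotation, pairs, out_dir)
print(basename(pair_files))

# Each file could then replace annotation.csv in the import call, e.g.
# rnb.run.import(data.source = c(data_dir, pair_files[1]),
#                data.type = "bs.bed.dir", dir.reports = report.dir)
```

Each import then only has to load the samples of two populations, which should lower the memory needed per run.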

fangwuwang commented 7 years ago

@psomdeb25 Can you also include the CMP vs CLP and MLP vs GMP comparisons? That would be interesting.