CompEpigen / methrix

An R package for fast and flexible DNA methylation analysis
https://www.bioconductor.org/packages/release/bioc/html/methrix.html

Suggestions for example dataset #2

Closed PoisonAlien closed 3 years ago

PoisonAlien commented 5 years ago

Hello @tkik @MaxSchoenung @HeyLifeHD @lutsik ,

As part of the package, it would be nice to have an example dataset for testing and demonstration. bsseq has BS.chr22, which can be loaded with the command data(BS.chr22). In methrix I included part of a single-cell dataset from the bsmap protocol: data(mm9_bsmap). However, @tkik suggested that this dataset is quite "boring" and that we should use something more robust, and I agree.

Two options:

  1. We convert the dataset bundled with bsseq to a methrix object and use it.
  2. We make our own custom dataset from well-studied in-house data.
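As a rough starting point for option 1, the sketch below pulls the raw count matrices and locus coordinates out of BS.chr22 using bsseq accessors. It assumes bsseq is installed, and it stops short of building a methrix object, since the thread does not say which constructor would be used for that step. The variable names are illustrative only.

```r
library(bsseq)  # provides BS.chr22 and getCoverage(); attaches GenomicRanges

data(BS.chr22)

# Per-locus counts (loci x samples): methylated reads and total coverage.
M   <- getCoverage(BS.chr22, type = "M")
Cov <- getCoverage(BS.chr22, type = "Cov")

# Methylation fraction per locus and sample; NaN wherever coverage is zero.
beta <- M / Cov

# Genomic coordinates of the loci as a plain data.frame.
loci <- as.data.frame(granges(BS.chr22))[, c("seqnames", "start")]
```

These three pieces (coordinates, methylation fractions, coverage) are the inputs any conversion to another container would need.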

If you opt for 2, we should make sure that we have at least two samples while keeping the object size small. For example, the bsseq data is quite small (3.8 MB) with ~495K loci and 2 samples. Bioconductor has a package size limit of 5 MB, and the core team is quite strict in this regard.

> data(BS.chr22)
> BS.chr22
An object of type 'BSseq' with
  494728 methylation loci
  2 samples
has not been smoothed
All assays are in-memory
> print(object.size(x = BS.chr22), units = "MB")
3.8 Mb
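For a candidate dataset, the same size check can be wrapped in a small helper. This is a minimal base R sketch: `fits_size_budget()` and the toy count matrices are hypothetical, and the 5 MB default simply mirrors the limit quoted above. Note that object.size() reports in-memory size, which is not the same as the compressed size of the file shipped in the package.

```r
# Hypothetical helper: in-memory size of an object and whether it fits a budget.
fits_size_budget <- function(obj, limit_mb = 5) {
  size_mb <- as.numeric(utils::object.size(obj)) / 1024^2
  list(size_mb = size_mb, ok = size_mb <= limit_mb)
}

# Toy stand-in for a two-sample dataset: coverage and methylated read counts.
set.seed(1)
n_loci <- 100000
cov <- matrix(rpois(n_loci * 2, lambda = 10), ncol = 2,
              dimnames = list(NULL, c("sample1", "sample2")))
meth <- matrix(rbinom(n_loci * 2, size = cov, prob = 0.7), ncol = 2,
               dimnames = dimnames(cov))
toy <- list(M = meth, Cov = cov)

res <- fits_size_budget(toy)

# If a dataset is too large, keep a random subset of loci (same rows in both matrices).
keep  <- sort(sample.int(n_loci, 50000))
small <- lapply(toy, function(m) m[keep, , drop = FALSE])
```

Subsetting rows with one shared index keeps the methylated and coverage counts aligned, so every retained locus still has both values for both samples.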

Suggestions welcome, and please post here if you have an example set in mind.

MaxSchoenung commented 5 years ago

I am currently trying to generate a "vanilla" methrix dataset from hESC TET triple-knockout data. I will try this today and let you know tomorrow how it works out, if that is fine with you ;)

PoisonAlien commented 5 years ago

Yup! Sounds good to me :)