Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges
43 stars 18 forks source link

expand makeGRangesFromDataFrame functionality #32

Open LiNk-NY opened 4 years ago

LiNk-NY commented 4 years ago

Hi Hervé, @hpages

I thought something like this would be useful. Let me know what you think. Particularly for inherits(x, "GenomicRanges"), it could also work for GRanges but I thought GenomicRanges would be more general. https://github.com/Bioconductor/GenomicRanges/blob/052f70f7b418a823c524486be7c611582fcf980f/R/GPos-class.R#L179

Thanks, Marcel

hpages commented 4 years ago

Thanks Marcel,

I would expect all 3 forms to work:

makeGPosFromDataFrame(data.frame(seqnames=1:6, pos=11:16))
makeGPosFromDataFrame(data.frame(seqnames=1:6, start=11:16))
makeGPosFromDataFrame(data.frame(seqnames=1:6, end=11:16))

but right now only the 1st one works.

I would also expect this to fail but not in such an obscure way:

makeGPosFromDataFrame(data.frame(seqnames=1:6, start=11:16, end=20))
# Error in stop_if_wrong_length("'seqnames'", ans_len) :
#   'seqnames' must have the length of the object to construct (45) or
#  length 1

Seems like we should start by agreeing on what behavior we expect exactly. A good way to do this is to come up with a set of unit tests that cover all the situations we anticipate.

I don't think makeGPosFromDataFrame() should have the start.field and end.field arguments. We only need a pos.field argument to let the user specify which data frame column contains the pos information in case we fail to auto-detect.

Also from a package organization point of view I'd prefer if makeGPosFromDataFrame was next to makeGRangesFromDataFrame i.e. the 2 functions should be implemented in the same file, documented in the same file, and have their unit tests in the same file. Thanks again!

H.

hpages commented 4 years ago

I should add that a GPos is a GRanges object so I wonder if we actually need 2 functions. Given that a GPos object can generally be used in any place where a GRanges is expected it sounds like it would be acceptable if makeGRangesFromDataFrame() was returning a GPos under certain conditions. We could add an argument, say as, to provide more control. It would be set to "auto" by default, but the user could set it to "GRanges" or "GPos" to enforce one type or the other.

LiNk-NY commented 4 years ago

Hi Hervé, @hpages I've updated the PR to reflect your feedback. I hacked the index localization a bit in R/makeGRangesFromDataFrame.R so that GPos is supported. Thanks -MR

LiNk-NY commented 4 years ago

Hi Hervé, @hpages I know you're busy but when you have a chance let me know if there is anything else that you'd like me to add. Thanks. -Marcel

liutiming commented 3 years ago

I was thinking of this functionality too. However, I do have difficulty in remembering the functions/arguments in many Bioconductor packages. Should makeGRangesFromDataFrame have a logic that when pos exists as a column, it sets both start and end according to that column (if not otherwise specified) and display a warning?