Closed xiaosuyu1997 closed 3 years ago
I believe the "inconsistency" you are referring to is because BED
-format data is 0-indexed and GRanges
are 1-indexed.
This is exquisitely well described in this blog post - https://tidyomics.com/blog/2018/12/09/2018-12-09-the-devil-0-and-1-coordinate-system-in-genomics/
I do not believe there is any problem with the ArchR code for this. When creating an Arrow, you are convering from 0-indexed to 1-indexed, thus you need to adjust the start. But when reading an Arrow file, you're already in 1-indexed format so you do not need to adjust anything.
If you think that this does not answer your question, feel free to comment here and I will re-open the issue.
Great help and many thanks. New to this field and miss some details.
One more question:
Suppose input is in bed(0-based open interval): chr1 0 1000 chr1 1000 2000
rtracklayer::import should give me(1-based closed interval): chr1 1 1000 chr1 1001 2000
But now in ArchR, this is what I get(1-based closed interval): chr1 1 999 chr1 1001 1999
Just wonder if that's the wanted feature?
Hmmm. Sorry if I missed this point in your original post. @jgranja24 - thoughts?
Hi @xiaosuyu1997,
I think you are correct it should be (start - end + 1). That is my bad, I switched the storage from start and stop to start and width and made this error. It is now fixed in the release_1.0.1 branch.
ex
> IRanges(79, 81)
IRanges object with 1 range and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 79 81 3
@xiaosuyu1997 - thanks for reporting this and explaining the bug clearly. My apologies for not catching this in your original post. Closing this.
Attach your log file ArchR-createArrows-417c778276a-Date-2021-01-02_Time-13-42-18.log
Describe the bug While I modify the tutorial data a little bit and change the first fragment in scATAC_BMMC_R1 to: Results from getFragmentsFromArrow("test.arrow", cellNames = "test#TAGGAGGGTTAACCGT-1") give me: It should make more sense to me for these two results to be consistent (at least negative width should not appear, if I made a silly mistake, so sorry for that).
Code leading to this (take tabix input as an example) should be here: While saving width info to arrow file, (end - start + 1), which is the exact width, is not used: https://github.com/GreenleafLab/ArchR/blob/2f022a448d8248a0f9afb33419bcbaeafe7731c0/R/CreateArrow.R#L1281 However, while reconstructing GRange object from that, (end - start) is directly used as width, leading to one bp lose in the final results: https://github.com/GreenleafLab/ArchR/blob/2f022a448d8248a0f9afb33419bcbaeafe7731c0/R/ArrowRead.R#L187
And I just don't quite understand why it's needed here to add 1 to start(grange).
Expected behavior Consistency between input bed(tabix) and getFragmentsFromArrow results
Many thanks!