OpenMendel / SnpArrays.jl

Compressed storage for SNP data
https://openmendel.github.io/SnpArrays.jl/latest

SnpArray view vs SnpArray.data matrix [question] #110

Open pdimens opened 3 years ago

pdimens commented 3 years ago

Hello, and I apologize for posting this as a question; the repository has no Discussions section.

The components of a SnpArray include the data and the row/column counts; however, I cannot find the show function in the source code. I'm trying to understand the discrepancy between

# load the mouse data
julia> mouse = SnpArray("data/mouse.bed")
1940×10150 SnpArray:
 0x02  0x02  0x02  0x02  0x03  …  0x03  0x03  0x03  0x03  0x03
 0x02  0x02  0x03  0x02  0x02     0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x03
 0x02  0x02  0x02  0x02  0x02     0x03  0x03  0x03  0x03  0x03
 0x03  0x03  0x03  0x03  0x03     0x02  0x02  0x02  0x02  0x02
 0x02  0x02  0x02  0x02  0x03  …  0x03  0x03  0x03  0x03  0x03
    ⋮                          ⋱     ⋮                    
 0x03  0x03  0x03  0x03  0x03     0x03  0x03  0x03  0x03  0x03
 0x02  0x02  0x02  0x02  0x03  …  0x03  0x03  0x03  0x03  0x03
 0x02  0x02  0x03  0x02  0x02     0x03  0x03  0x03  0x03  0x03
 0x02  0x02  0x03  0x02  0x02     0x03  0x03  0x03  0x03  0x03
 0x02  0x02  0x02  0x02  0x02     0x01  0x01  0x01  0x01  0x01
 0x00  0x00  0x00  0x00  0x03     0x03  0x03  0x03  0x03  0x03

but the underlying SnpArray.data is a matrix of 485x10150

julia> mouse.data
485×10150 Matrix{UInt8}:
 0xba  0xba  0xbe  0xba  0xbb  …  0xff  0xff  0xff  0xff  0xff
 0xab  0xab  0xeb  0xab  0xbf     0xfe  0xfe  0xfe  0xfe  0xfe
 0xbe  0xbe  0xbf  0xbe  0xfe     0xcb  0xcb  0xcb  0xcb  0xcb
 0xab  0xab  0xab  0xab  0xff     0xf4  0xf4  0xf4  0xf4  0xf4
 0x8e  0x8e  0x8f  0x8e  0xfe     0xff  0xff  0xff  0xff  0xff
 0xae  0xae  0xaf  0xae  0xfe  …  0xfb  0xfb  0xfb  0xfb  0xfb
    ⋮                          ⋱     ⋮                    
 0x8c  0x8c  0xce  0x8c  0xbe     0xbf  0xbf  0xbf  0xbf  0xbf
 0x23  0x23  0x33  0x23  0xef  …  0xbb  0xbb  0xbb  0xbb  0xbb
 0xff  0xff  0xff  0xff  0xff     0xff  0xff  0xff  0xff  0xff
 0xaf  0xaf  0xaf  0xaf  0xef     0xff  0xff  0xff  0xff  0xff
 0xbb  0xbb  0xbb  0xbb  0xff     0xff  0xff  0xff  0xff  0xff
 0x2a  0x2a  0x2f  0x2a  0xea     0xdf  0xdf  0xdf  0xdf  0xdf

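  • How is the SnpArray showing a matrix with 4x more rows than the underlying data?
  • What is the conversion between what seems like hex to binary?
  • Where is the show method for this?

Thank you!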

Hua-Zhou commented 3 years ago

Each genotype takes 2 bits, so the internal data structure is a UInt8 matrix in which each entry encodes 4 genotypes. That is why the 1940 rows of the SnpArray are stored in 1940 / 4 = 485 rows of bytes.

Your suggestion of a proper ‘show’ method for SnpArray is well taken. We will think about how to do it.
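
As an illustration of the 2-bit packing described above, here is a minimal Julia sketch. The helper names unpack_byte and genotype are hypothetical and not part of SnpArrays.jl. The bit order (first genotype in the lowest-order two bits) is an assumption taken from the PLINK .bed convention; it agrees with the values printed earlier in this thread, where the first two bytes of column 1 of mouse.data are 0xba and 0xab and the first six displayed genotypes are 0x02, 0x02, 0x03, 0x02, 0x03, 0x02.

```julia
# Hypothetical helper (not from SnpArrays.jl): split one packed byte into
# its four 2-bit genotype codes, lowest-order bits first (assumed order).
unpack_byte(b::UInt8) = [(b >> (2k)) & 0x03 for k in 0:3]

# Hypothetical stand-in for element lookup: logical row i lives in
# byte row (i - 1) ÷ 4 + 1, at bit offset 2 * ((i - 1) % 4).
function genotype(data::AbstractMatrix{UInt8}, i::Integer, j::Integer)
    byte_row = (i - 1) ÷ 4 + 1
    shift = 2 * ((i - 1) % 4)
    return (data[byte_row, j] >> shift) & 0x03
end

# First two bytes of column 1 of mouse.data, as printed in the question.
data = reshape(UInt8[0xba, 0xab], 2, 1)

first_byte = unpack_byte(0xba)                    # UInt8[0x02, 0x02, 0x03, 0x02]
first_six = [genotype(data, i, 1) for i in 1:6]   # UInt8[0x02, 0x02, 0x03, 0x02, 0x03, 0x02]

# Number of byte rows needed for 1940 individuals at 4 genotypes per byte.
byte_rows = cld(1940, 4)                          # 485
```

So the hex values in the two displays are different quantities: the SnpArray display shows one 2-bit code per entry, while mouse.data shows whole bytes that each hold four such codes.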

pdimens commented 3 years ago

Thanks for the response. If the SnpArray has multiple fields, how is the default show method showing only this matrix? I'm also not clear on where in the source code this conversion occurs.

Hua-Zhou commented 3 years ago

Since SnpArray is an AbstractArray, its show method falls back to Julia's default method for AbstractArray, which uses getindex to read the few entries it displays. The getindex function for SnpArray is implemented in https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/snparray.jl
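
To illustrate the fallback described above, the sketch below defines a toy AbstractMatrix subtype. The type PackedGenotypes and its fields are hypothetical, chosen for illustration rather than copied from the package, and the bit order is the same assumption as in the PLINK .bed convention. Only size and getindex are defined, yet the generic AbstractArray display and collect both work, because they call those two methods.

```julia
# Hypothetical toy type (not the package's SnpArray): a packed byte matrix
# plus the logical row count, exposed as a matrix of 2-bit genotype codes.
struct PackedGenotypes <: AbstractMatrix{UInt8}
    data::Matrix{UInt8}  # packed bytes, 4 genotypes per byte
    m::Int               # number of logical rows (individuals)
end

# The two methods the generic AbstractArray machinery needs.
Base.size(p::PackedGenotypes) = (p.m, size(p.data, 2))

function Base.getindex(p::PackedGenotypes, i::Int, j::Int)
    @boundscheck checkbounds(p, i, j)
    byte = p.data[(i - 1) ÷ 4 + 1, j]
    return (byte >> (2 * ((i - 1) % 4))) & 0x03
end

# Two packed bytes hold up to 8 genotypes; expose the first 6 as rows.
p = PackedGenotypes(reshape(UInt8[0xba, 0xab], 2, 1), 6)

# No show method was defined for PackedGenotypes; this uses Julia's default
# text display for AbstractArray, which reads elements through getindex.
shown = sprint(show, MIME("text/plain"), p)
print(shown)
```

This is why the extra fields of the struct never appear in the REPL output: the default display only asks the array for its size and its elements.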
