anders-biostat / MethSCAn

Python package with CLI for the analysis of single cell methylation data.
https://anders-biostat.github.io/MethSCAn/
GNU General Public License v3.0

how to load the mtx.gz file? #1

Closed. ShouWenWang closed this issue 3 weeks ago.

ShouWenWang commented 3 weeks ago

      1 import scipy.io as sio
      3 # Read the MTX file
----> 4 matrix = sio.mmread('/storage/wangshouwenLab/limited_shared_folder/analysis/shouwen/DMR_analysis/matrix/matrix.mtx.gz')

File ~/miniconda3/envs/CoSpar_lab/lib/python3.8/site-packages/scipy/io/_mmio.py:129, in mmread(source)
     84 def mmread(source):
     85     """
     86     Reads the contents of a Matrix Market file-like 'source' into a matrix.
     87 
   (...)
    127            [0., 0., 0., 0., 0.]])
    128     """
--> 129     return MMFile().read(source)

File ~/miniconda3/envs/CoSpar_lab/lib/python3.8/site-packages/scipy/io/_mmio.py:578, in MMFile.read(self, source)
    575 stream, close_it = self._open(source)
    577 try:
--> 578     self._parse_header(stream)
    579     return self._parse_body(stream)
    581 finally:

File ~/miniconda3/envs/CoSpar_lab/lib/python3.8/site-packages/scipy/io/_mmio.py:642, in MMFile._parse_header(self, stream)
    640 def _parse_header(self, stream):
    641     rows, cols, entries, format, field, symmetry = \
--> 642         self.__class__.info(stream)
    643     self._init_attrs(rows=rows, cols=cols, entries=entries, format=format,
    644                      field=field, symmetry=symmetry)

File ~/miniconda3/envs/CoSpar_lab/lib/python3.8/site-packages/scipy/io/_mmio.py:377, in MMFile.info(self, source)
    374 mmid, matrix, format, field, symmetry = \
    375     [asstr(part.strip()) for part in line.split()]
    376 if not mmid.startswith('%%MatrixMarket'):
--> 377     raise ValueError('source is not in Matrix Market format')
    378 if not matrix.lower() == 'matrix':
    379     raise ValueError("Problem reading file header: " + line)

ValueError: source is not in Matrix Market format
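The traceback shows that `scipy.io.mmread` rejected the file because the first token of its header does not start with `%%MatrixMarket`. One quick way to see what such a file actually contains is to peek at its first bytes. The sketch below uses two hypothetical helpers (`peek_header`, `looks_like_matrix_market`), not part of MethSCAn or SciPy. It sniffs the gzip magic bytes `\x1f\x8b` instead of trusting the `.gz` extension. A file whose bytes start with `PK` is a zip archive, which is also how NumPy `.npz` files are stored.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def peek_header(path, n=64):
    """Return the first n bytes of a file, decompressing it if it is gzipped.

    Hypothetical helper: gzip is detected from the magic bytes, not from
    the file extension.
    """
    with open(path, "rb") as raw:
        is_gzipped = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzipped else open
    with opener(path, "rb") as handle:
        return handle.read(n)


def looks_like_matrix_market(path):
    """True if the file starts with the '%%MatrixMarket' banner that
    scipy.io.mmread checks for (hypothetical helper)."""
    return peek_header(path).startswith(b"%%MatrixMarket")
```

Calling `peek_header(...)` on the problematic file and printing the result usually makes it obvious whether the file is Matrix Market text, a zip archive (`b"PK..."`) or something else entirely.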
ShouWenWang commented 3 weeks ago

This is resolved.

LKremer commented 1 day ago

In case others want the solution:

from scipy.sparse import load_npz

# load one .npz sparse matrix from the MethSCAn data directory
mtx = load_npz("data_dir/1.npz")
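To see what `load_npz` hands back without needing a real MethSCAn data directory, here is a small round trip on a hypothetical toy matrix that uses the same 1 / -1 / implicit-0 encoding described below. The file name `toy.npz` is made up for illustration. The printed attributes are the three arrays that make up a CSR matrix.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix, load_npz, save_npz

# Hypothetical toy data: 1 = methylated, -1 = unmethylated, 0 = NA
toy = csr_matrix(np.array([[1, 0, -1],
                           [0, 0, 1],
                           [-1, 1, 0]], dtype=np.int8))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.npz")  # hypothetical file name
    save_npz(path, toy)
    mtx = load_npz(path)  # data is read fully into memory here

print(mtx.format)         # storage format of the loaded matrix
print(mtx.shape, mtx.nnz) # nnz counts only the stored (non-zero) entries
print(mtx.data)           # the stored values themselves (only 1 and -1)
print(mtx.indices)        # column index of each stored value
print(mtx.indptr)         # row i's values are data[indptr[i]:indptr[i + 1]]
```

Note that the three zeros per row are never stored: `nnz` is 5 for this 3x3 matrix, which is exactly why the NA values only "appear" after conversion to a dense array.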

Careful: this matrix comes in a special format. It is a sparse matrix in CSR format that contains three possible values: 0, 1 and -1. 0 must be interpreted as NA, 1 as "methylated" and -1 as "unmethylated". Because of the sparse matrix format, 0 (NA) values are not stored explicitly and only "appear" once you convert the matrix to a dense format. To understand this format, you can read about the CSR sparse matrix format on Wikipedia or check our paper.
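A minimal sketch of how the 0 / 1 / -1 encoding could be decoded in Python, assuming only what is stated above. Both functions are hypothetical helpers, not MethSCAn API. `decode_methylation` densifies the matrix and maps 1 to 1.0, -1 to 0.0 and the implicit zeros to NaN; that mapping to 1.0/0.0 is a choice made here so that averages give methylation fractions. Densifying a whole matrix can use a lot of memory, so it is best applied to a small row slice such as `mtx[1000:2000]`. `column_methylation_fraction` stays sparse and counts only the stored entries.

```python
import numpy as np
from scipy.sparse import csr_matrix


def decode_methylation(mtx):
    """Dense float array: 1.0 = methylated, 0.0 = unmethylated, NaN = NA.

    Hypothetical helper; only use it on small slices of a large matrix.
    """
    dense = mtx.toarray()
    decoded = np.full(dense.shape, np.nan)  # start with everything as NA
    decoded[dense == 1] = 1.0               # methylated
    decoded[dense == -1] = 0.0              # unmethylated
    return decoded


def column_methylation_fraction(mtx):
    """Fraction of methylated calls per column, ignoring NA entries.

    Hypothetical helper that works on the stored entries only, so the
    matrix is never densified.
    """
    coo = mtx.tocoo()
    n_cols = mtx.shape[1]
    n_meth = np.bincount(coo.col[coo.data == 1], minlength=n_cols)
    n_unmeth = np.bincount(coo.col[coo.data == -1], minlength=n_cols)
    n_covered = n_meth + n_unmeth
    with np.errstate(invalid="ignore", divide="ignore"):
        return n_meth / n_covered  # NaN where a column has no data at all


# Hypothetical toy matrix; the last column has no observations
toy = csr_matrix(np.array([[1, 0, -1, 0],
                           [0, 0, 1, 0],
                           [-1, 1, 0, 0]], dtype=np.int8))
print(decode_methylation(toy))
print(column_methylation_fraction(toy))
```

For the toy matrix, the first three columns give fractions of 0.5, 1.0 and 0.5, and the last column, which contains only NA, gives NaN rather than a misleading 0.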

At some point we also figured out how to load it into R; once I find that code snippet, I'll post it here.