broadinstitute / wot

A software package for analyzing snapshots of developmental processes
https://broadinstitute.github.io/wot/
BSD 3-Clause "New" or "Revised" License

Reading mtx file #88

Open hiraksarkar opened 3 years ago

hiraksarkar commented 3 years ago

Hi,

Thanks for the awesome tool and package. I have a question about reading the MTX file format. According to the documentation at https://broadinstitute.github.io/wot/file_formats/:

MTX The MTX format is a sparse matrix format with genes on the rows and cells on the columns as output by Cell Ranger. You should also have TSV files with genes and barcode sequences corresponding to row and column indices, respectively. These files must be located in the same folder as the MTX file with the same base file name. For example if the MTX file is my_data.mtx, you should also have a my_data.genes.txt file and a my_data.barcodes.txt file.
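For reference, the layout described in that passage can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helper `write_mtx_bundle`, the gene names, and the barcodes are hypothetical, and writing one identifier per line with no header row is my assumption, since the quoted documentation does not say whether a header is expected.

```python
import os
import tempfile


def write_mtx_bundle(folder, base, genes, barcodes, entries):
    """Write base.mtx, base.genes.txt and base.barcodes.txt into folder.

    entries is a list of (gene_index, cell_index, count) tuples with
    0-based indices; genes are rows and cells are columns.
    (Hypothetical helper, not part of wot.)
    """
    mtx_path = os.path.join(folder, base + ".mtx")
    with open(mtx_path, "w") as fh:
        # Matrix Market coordinate header, then "rows cols nonzeros".
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{len(genes)} {len(barcodes)} {len(entries)}\n")
        for gene_idx, cell_idx, count in entries:
            # Matrix Market indices are 1-based.
            fh.write(f"{gene_idx + 1} {cell_idx + 1} {count}\n")

    # Companion files share the base name of the MTX file.
    # Assumption: one identifier per line, no header row.
    with open(os.path.join(folder, base + ".genes.txt"), "w") as fh:
        fh.write("\n".join(genes) + "\n")
    with open(os.path.join(folder, base + ".barcodes.txt"), "w") as fh:
        fh.write("\n".join(barcodes) + "\n")
    return mtx_path


genes = ["GeneA", "GeneB", "GeneC"]            # made-up row labels
barcodes = ["AAAC-1", "AAAG-1"]                # made-up column labels
entries = [(0, 0, 5), (2, 0, 1), (1, 1, 7)]    # (gene, cell, count)

out_dir = tempfile.mkdtemp()
mtx_path = write_mtx_bundle(out_dir, "my_data", genes, barcodes, entries)
print(sorted(os.listdir(out_dir)))
```

With SciPy installed, `scipy.io.mmwrite` can produce the `.mtx` file from a sparse matrix instead of writing the triplets by hand.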

However, when I write a custom matrix in that format, I get this error:

---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-18-d3d0445c3b05> in <module>
      2     '/d0/home/hsarkar/notebooks/fetal_heart/scripts/sparse_matrix/HRT_20191220A.mtx',
      3     obs = '/d0/home/hsarkar/notebooks/fetal_heart/scripts/sparse_matrix/HRT_20191220A.genes.txt',
----> 4     var = '/d0/home/hsarkar/notebooks/fetal_heart/scripts/sparse_matrix/HRT_20191220A.barcodes.txt',
      5 )

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/wot/io/io.py in read_dataset(path, obs, var, obs_filter, var_filter, **keywords)
    427             obs = [obs]
    428         for item in obs:
--> 429             adata.obs = adata.obs.join(get_df(item))
    430     if var is not None:
    431         if not isinstance(var, list) and not isinstance(var, tuple):

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/wot/io/io.py in get_df(meta)
    418                 tmp_path = download_gs_url(meta)
    419                 meta = tmp_path
--> 420             meta = pd.read_csv(meta, sep=None, index_col='id', engine='python')
    421             if tmp_path is not None:
    422                 os.remove(tmp_path)

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    684     )
    685 
--> 686     return _read(filepath_or_buffer, kwds)
    687 
    688 

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    456 
    457     try:
--> 458         data = parser.read(nrows)
    459     finally:
    460         parser.close()

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   1194     def read(self, nrows=None):
   1195         nrows = _validate_integer("nrows", nrows)
-> 1196         ret = self._engine.read(nrows)
   1197 
   1198         # May alter columns / col_dict

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, rows)
   2581             content = content[1:]
   2582 
-> 2583         alldata = self._rows_to_cols(content)
   2584         data = self._exclude_implicit_index(alldata)
   2585 

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/pandas/io/parsers.py in _rows_to_cols(self, content)
   3233                     msg += ". " + reason
   3234 
-> 3235                 self._alert_malformed(msg, row_num + 1)
   3236 
   3237         # see gh-13320

~/miniconda3/envs/val-ss/lib/python3.7/site-packages/pandas/io/parsers.py in _alert_malformed(self, msg, row_num)
   2994         """
   2995         if self.error_bad_lines:
-> 2996             raise ParserError(msg)
   2997         elif self.warn_bad_lines:
   2998             base = f"Skipping line {row_num}: "

ParserError: Expected 2 fields in line 2017, saw 3

Looking at the source code (https://github.com/broadinstitute/wot/blob/master/wot/io/io.py#L378), it does not seem that MTX files are read any differently from the other formats. Am I missing something?
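The traceback shows the `obs` file being parsed by `pd.read_csv(meta, sep=None, index_col='id', engine='python')`, so the file passed there is treated as a delimited table with an `id` column, and the final message means one line has more fields than the rest. A minimal sketch for locating such lines follows; the helper `find_ragged_lines` and the sample text are hypothetical, and the tab delimiter is an assumption (pandas infers it when `sep=None`).

```python
import io
from collections import Counter


def find_ragged_lines(handle, delimiter="\t"):
    """Return (expected_count, [(line_number, field_count), ...]).

    expected_count is the most common number of fields per line;
    the list holds every non-empty line that deviates from it.
    (Hypothetical helper for illustration.)
    """
    rows = []
    for line_number, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")
        if text:  # skip blank lines
            rows.append((line_number, len(text.split(delimiter))))

    expected = Counter(count for _, count in rows).most_common(1)[0][0]
    bad = [(n, count) for n, count in rows if count != expected]
    return expected, bad


# Made-up sample: line 3 contains an extra tab inside the gene name.
sample = (
    "id\tsymbol\n"
    "ENSG01\tGeneA\n"
    "ENSG02\tGene\tB\n"
    "ENSG03\tGeneC\n"
)

expected, bad = find_ragged_lines(io.StringIO(sample))
print(expected, bad)  # 2 [(3, 3)]
```

On a real file, pass `open(path)` instead of the `io.StringIO` sample; any reported line is a candidate for the kind of mismatch in "Expected 2 fields in line 2017, saw 3".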

joshua-gould commented 3 years ago

The MTX format is read using anndata.read_mtx (https://anndata.readthedocs.io/en/latest/anndata.read_mtx.html).

On Sat, Dec 26, 2020 at 5:41 PM Hirak Sarkar wrote: [original post quoted in full, as above]