Nanostring-Biostats / CosMx-Analysis-Scratch-Space

This repository is an exploratory resource to accelerate open-source analysis of CosMx® Spatial Molecular Imager (SMI) data. Contained here are writeups and vignettes addressing a variety of topics that come up when analyzing single-cell spatial data.
https://nanostring-biostats.github.io/CosMx-Analysis-Scratch-Space/

Error and missing functionality when using squidpy on AtoMx 1.3.2 exports #69

Closed eveilyeverafter closed 1 week ago

eveilyeverafter commented 5 months ago

squidpy has a method for reading nanostring data that is putatively based on the older processed files (aka "flat files").

https://squidpy.readthedocs.io/en/stable/api/squidpy.read.nanostring.html

These processed files have changed format over the years, and these changes can prevent Python users from reading data with squidpy's read.nanostring method. Indeed, when I try to read flat files from AtoMx (v1.3.2) directly with read.nanostring, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 4
      1 new_dir = "/Volumes/Extreme_Pro/data/agbt_breast/AUG29_13INTEGR_6K_BRST_PS_S2"
----> 4 adata2 = sq.read.nanostring(
      5     path = new_dir,
      6     counts_file="AUG29_13INTEGR_6K_BRST_PS_S2_exprMat_file.csv",
      7     meta_file="AUG29_13INTEGR_6K_BRST_PS_S2_metadata_file.csv",
      8     fov_file="AUG29_13INTEGR_6K_BRST_PS_S2_fov_positions_file.csv",
      9 )

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/squidpy/read/_read.py:266, in nanostring(path, counts_file, meta_file, fov_file)
    263                     continue
    265 if fov_file is not None:
--> 266     fov_positions = pd.read_csv(path / fov_file, header=0, index_col=fov_key)
    267     for fov, row in fov_positions.iterrows():
    268         try:

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
   1013 kwds_defaults = _refine_defaults_read(
   1014     dialect,
   1015     delimiter,
   (...)
   1022     dtype_backend=dtype_backend,
   1023 )
   1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py:626, in _read(filepath_or_buffer, kwds)
    623     return parser
    625 with parser:
--> 626     return parser.read(nrows)

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1923, in TextFileReader.read(self, nrows)
   1916 nrows = validate_integer("nrows", nrows)
   1917 try:
   1918     # error: "ParserBase" has no attribute "read"
   1919     (
   1920         index,
   1921         columns,
   1922         col_dict,
-> 1923     ) = self._engine.read(  # type: ignore[attr-defined]
   1924         nrows
   1925     )
   1926 except Exception:
   1927     self.close()

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py:333, in CParserWrapper.read(self, nrows)
    330     data = {k: v for k, (i, v) in zip(names, data_tups)}
    332     names, date_data = self._do_date_conversions(names, data)
--> 333     index, column_names = self._make_index(date_data, alldata, names)
    335 return index, column_names, date_data

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/base_parser.py:371, in ParserBase._make_index(self, data, alldata, columns, indexnamerow)
    368     index = None
    370 elif not self._has_complex_date_col:
--> 371     simple_index = self._get_simple_index(alldata, columns)
    372     index = self._agg_index(simple_index)
    373 elif self._has_complex_date_col:

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/base_parser.py:403, in ParserBase._get_simple_index(self, data, columns)
    401 index = []
    402 for idx in self.index_col:
--> 403     i = ix(idx)
    404     to_remove.append(i)
    405     index.append(data[i])

File ~/Documents/Projects/squidpy_patches/.venv/lib/python3.10/site-packages/pandas/io/parsers/base_parser.py:398, in ParserBase._get_simple_index.<locals>.ix(col)
    396 if not isinstance(col, str):
    397     return col
--> 398 raise ValueError(f"Index {col} invalid")

ValueError: Index fov invalid

A longer-term solution would be to adjust the squidpy code directly to allow for the newer format of the flat files. A short-term fix for scratch space would simply be to add a conditional in the workflow and pivot the fov file into the existing format that squidpy expects (based on the legacy flat files).
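As a rough illustration of the short-term conditional, here is a minimal sketch. It relies on one thing the traceback shows: squidpy calls pd.read_csv with index_col set to "fov", so the file must have a column with exactly that name. Everything else is an assumption: the function name normalize_fov_header is hypothetical, and the guess that the newer export differs in the case of that column name (for example "FOV") has not been checked against an actual AtoMx 1.3.2 file. Other format differences, such as coordinate column names or units, are not handled here.

```python
import csv
from pathlib import Path


def normalize_fov_header(src, dst, fov_key="fov"):
    """Copy a FOV positions CSV, renaming the FOV column to `fov_key` if needed.

    Hypothetical helper, not part of squidpy. Returns True if the header
    was changed and False if it already contained `fov_key`.
    """
    src, dst = Path(src), Path(dst)
    with src.open(newline="") as fin:
        rows = list(csv.reader(fin))
    if not rows:
        raise ValueError(f"{src} is empty")

    header = rows[0]
    changed = False
    if fov_key not in header:
        # Assumption: the newer export names the column differently only by
        # case or surrounding whitespace (e.g. "FOV" instead of "fov").
        matches = [
            i for i, name in enumerate(header)
            if name.strip().lower() == fov_key.lower()
        ]
        if len(matches) != 1:
            raise ValueError(
                f"Could not find a unique '{fov_key}' column in {header}"
            )
        header[matches[0]] = fov_key
        changed = True

    # Write the (possibly renamed) header and all data rows unchanged.
    with dst.open("w", newline="") as fout:
        csv.writer(fout).writerows(rows)
    return changed
```

The normalized file could then be passed as fov_file to sq.read.nanostring in place of the original export. Writing to a new file rather than overwriting keeps the AtoMx export untouched.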

This proposed patch should fix the non-image error so that one can use the read.nanostring method without reference to images. This is partially redundant with our AnnData blog post solution.

However, there's a second issue that results in missing functionality: AtoMx currently does not export the composite images. So a second part of this issue would be to pivot the imaging data (which is present in the RawData exports) into the format that squidpy expects. The blog post on creating composite images should be useful here.

I suggest releasing these solutions piecemeal (i.e., creating an initial blog post for the non-image-based patch and then expanding it when the image-based solution is ready).

Tasks

- [x] Add conditional for fov file and pivot data format as needed
- [x] Add a workflow for processing AtoMx 1.3.2 RawData so that composite images can be viewed