dcjones / proseg

Probabilistic cell segmentation for in situ spatial transcriptomics

Proseg

Proseg (probabilistic segmentation) is a cell segmentation method for in situ spatial transcriptomics. Xenium, CosMx, and MERSCOPE platforms are currently supported.

Installing

Proseg can be built and installed with cargo. Clone this repository, then run

cargo install proseg

General usage

Proseg is run on a table of transcript positions, which must include preliminary assignments of transcripts to nuclei. Xenium, CosMx, and MERSCOPE all provide this out of the box in some form.

At minimum, Proseg is invoked like:

proseg /path/to/transcripts.csv.gz

There are command line arguments to tell it which columns in the CSV file to use, but typically one of the presets --xenium, --cosmx, or --merfish is used.

Proseg is a sampling method, and in its current form is non-deterministic. Results will vary slightly from run to run.

General options

By default proseg will use all available CPU cores. To change this use --nthreads N.
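As a minimal sketch, a platform preset and an explicit thread count can be combined in one invocation. The input path and the thread count of 8 are placeholders, and `build_proseg_cmd` is a hypothetical helper (not part of Proseg) that only prints the command so it can be inspected before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proseg): print a proseg command line
# that combines a platform preset with an explicit thread count.
# Only flags named in this README are used.
build_proseg_cmd() {
    preset="$1"       # a preset flag such as --xenium
    nthreads="$2"     # number of CPU threads to use
    transcripts="$3"  # path to the transcripts table
    printf 'proseg %s --nthreads %s %s\n' "$preset" "$nthreads" "$transcripts"
}

# Print the command for a Xenium run limited to 8 threads.
build_proseg_cmd --xenium 8 /path/to/transcripts.csv.gz
```

Printing the command rather than executing it makes it easy to check the flags first; paste the printed line into the shell to start the run.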

Output options

Output is in the form of a number of tables, which can be either gzipped csv files or parquet files, and GeoJSON files giving cell boundaries.
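For a quick look at one of the gzipped CSV tables, standard tools are enough. The sketch below assumes `gzip` and `head` are available. `peek_table` is a hypothetical helper, and `transcript-metadata.csv.gz` stands in for whichever output table was written.

```shell
#!/bin/sh
# Hypothetical helper: print the header line and the first few data rows
# of a gzipped CSV table without decompressing it to disk.
peek_table() {
    table="$1"      # path to a .csv.gz file
    rows="${2:-5}"  # number of data rows to show (default 5)
    gzip -dc "$table" | head -n "$((rows + 1))"
}

# Example (file name is a placeholder):
# peek_table transcript-metadata.csv.gz 3
```

This only applies to the gzipped CSV form of the tables; parquet output needs a parquet-aware reader instead.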

Cell boundaries can be output a number of ways:

Modeling assumptions

A number of options can alter assumptions made by the model, which generally should not need to be changed.

Running on Xenium datasets

Xenium data should be run with the --xenium argument.

Using Xenium Explorer with proseg-to-baysor

It is possible to use Proseg segmentation with Xenium Explorer, but it requires a little work.

The xeniumranger tool has a command to import segmentation from Baysor. To use this, we must first convert Proseg output to Baysor-compatible formatting.

For this we need the transcript metadata and cell polygons from Proseg. Then run the provided proseg-to-baysor command like

proseg-to-baysor \
    transcript-metadata.csv.gz \
    cell-polygons.geojson.gz \
    --output-transcript-metadata baysor-transcript-metadata.csv \
    --output-cell-polygons baysor-cell-polygons.geojson

Xenium Ranger can then be run to import these into a format usable with Xenium Explorer:

xeniumranger import-segmentation \
    --id project-id \
    --xenium-bundle /path/to/original/xenium/output \
    --viz-polygons baysor-cell-polygons.geojson \
    --transcript-assignment baysor-transcript-metadata.csv \
    --units microns

This will output a new Xenium bundle under the project-id directory.
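The two steps can be chained in a small wrapper, sketched here under the assumption that the Proseg output files are in the current directory under the names used above. `require_file` and `convert_and_import` are hypothetical helper names, and the flags are the ones shown in the commands above, with `--xenium-bundle` naming the original Xenium output directory.

```shell
#!/bin/sh
# Hypothetical helper: fail early with a clear message if an input is missing.
require_file() {
    [ -e "$1" ] || { echo "missing input: $1" >&2; return 1; }
}

# Sketch: convert Proseg output to Baysor-style files, then import them
# into a new Xenium bundle. File names are placeholders from this README.
convert_and_import() {
    bundle="$1"      # original Xenium output directory
    project_id="$2"  # name of the new bundle directory

    require_file transcript-metadata.csv.gz || return 1
    require_file cell-polygons.geojson.gz || return 1
    require_file "$bundle" || return 1

    proseg-to-baysor \
        transcript-metadata.csv.gz \
        cell-polygons.geojson.gz \
        --output-transcript-metadata baysor-transcript-metadata.csv \
        --output-cell-polygons baysor-cell-polygons.geojson || return 1

    xeniumranger import-segmentation \
        --id "$project_id" \
        --xenium-bundle "$bundle" \
        --viz-polygons baysor-cell-polygons.geojson \
        --transcript-assignment baysor-transcript-metadata.csv \
        --units microns
}

# Example: convert_and_import /path/to/original/xenium/output project-id
```

Checking the inputs first avoids starting the slower import step when the conversion has nothing to work on.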

~~Xenium Explorer currently has issues displaying Proseg polygons. It appears to perform some sort of naive polygon simplification that results in profoundly distorted polygons. There's not any known workaround for this issue for now.~~

Issues displaying proseg polygons in Xenium Explorer are resolved with more recent versions of Xenium Ranger (starting with 2.0).

Running on CosMx datasets

Current versions of CosMx provide output that is shambolic and more difficult to deal with than that of other platforms. The recommended way of running Proseg on CosMx datasets is to download the flat files from AtoMx and manually "stitch" and scale the FOV-level data using the Julia program provided in extra/stitch-cosmx.jl.

Some dependencies are required, which can be installed with

julia -e 'import Pkg; Pkg.add(["Glob", "CSV", "DataFrames", "CodecZlib", "ArgParse"])'

Then the program can be run like

julia stitch-cosmx.jl /path/to/cosmx-flatfiles transcripts.csv.gz

to output a complete transcripts table to transcripts.csv.gz.

From here proseg can be run with

proseg --cosmx-micron transcripts.csv.gz

Alternatively, the --cosmx argument can be used with CosMx data that is in pixel coordinates. It will automatically scale the data to micrometers.
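The choice between the two CosMx presets depends only on the coordinate units of the transcripts table. As a rough sketch, the hypothetical helper `cosmx_flag` below (not part of Proseg) maps a unit name to the matching flag. The unit names it accepts are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pick the CosMx preset from the coordinate units
# of the transcripts table.
cosmx_flag() {
    case "$1" in
        micron|microns) echo "--cosmx-micron" ;;  # already scaled, e.g. by stitch-cosmx.jl
        pixel|pixels)   echo "--cosmx" ;;         # proseg scales to micrometers itself
        *) echo "unknown units: $1" >&2; return 1 ;;
    esac
}

# Print the command for a stitched table in micrometers.
echo "proseg $(cosmx_flag microns) transcripts.csv.gz"
```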

Running on MERSCOPE datasets

No special considerations are needed for MERSCOPE data. Simply use the --merscope argument with the detected_transcripts.csv.gz file.