dcjones / proseg

Probabilistic cell segmentation for in situ spatial transcriptomics

stitch-cosmx.jl is not working #14

Open roanvanscheppingen opened 5 months ago

roanvanscheppingen commented 5 months ago

I'm not familiar with Julia, but the script is not working. I ran:

julia stitch-cosmx_RVS2.jl ../../rp_5/image_data/cosm03_images/MsBrainTMA/20240229_204141_S1/ transcripts.csv.gz

and got:

ERROR: LoadError: BoundsError: attempt to access 0-element Vector{String} at index [1]
Stacktrace:
 [1] getindex
   @ ./essentials.jl:13 [inlined]
 [2] main()
   @ Main /hpc/shared/prekovic/rvanscheppingen/Julia/stitch-cosmx_RVS2.jl:31
 [3] top-level scope
   @ /hpc/shared/prekovic/rvanscheppingen/Julia/stitch-cosmx_RVS2.jl:105

I think it has to do with how the directory is resolved. The code currently has:

config_filename = glob("S0/*/RunSummary/*_ExptConfig.txt", path)[1]

However, there is no S0 folder in the path, so the glob matches nothing and taking the first element fails. The path is image_data/cosm03_images/MsBrainTMA/20240229_204141_S1

The subdirectories (RunSummary etc.) are there, as they should be:

AnalysisResults  AnalysisResultsArchived  CellStatsDir  plex-usm2wp948s.txt  RunSummary

Opening the config file with nano works:

nano MsBrainTMA/20240229_204141_S1/RunSummary/*_ExptConfig.txt

Note that in my edit of the .jl I've changed S0/*/RunSummary to ./RunSummary, and I have also tried ./*/RunSummary. Something similar was raised in #3, but the fix implemented afterwards kept the S0.
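The BoundsError above is what indexing an empty glob result looks like: the pattern matched nothing, so there is no first element to take. Below is a minimal Python sketch of a more defensive lookup, not the script's actual fix. The function name find_expt_config and the two fallback patterns are assumptions for illustration, chosen to cover the layout described here, where RunSummary sits directly under the run directory.

```python
from pathlib import Path

# Candidate layouts, tried in order. Only the first pattern comes from the
# script; the other two are assumed fallbacks for illustration.
CANDIDATE_PATTERNS = (
    "S0/*/RunSummary/*_ExptConfig.txt",
    "*/RunSummary/*_ExptConfig.txt",
    "RunSummary/*_ExptConfig.txt",
)


def find_expt_config(run_dir):
    """Return the first *_ExptConfig.txt found under run_dir.

    Raises FileNotFoundError with the patterns that were tried, instead of
    failing on an index into an empty list of matches.
    """
    root = Path(run_dir)
    for pattern in CANDIDATE_PATTERNS:
        matches = sorted(root.glob(pattern))
        if matches:
            return matches[0]
    tried = ", ".join(CANDIDATE_PATTERNS)
    raise FileNotFoundError(f"no ExptConfig file under {root} (tried: {tried})")
```

Trying several layouts in order keeps the original S0 behaviour when that folder exists, and the explicit error names the patterns tried, which is easier to act on than an out-of-bounds index.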

Thanks in advance!

roanvanscheppingen commented 5 months ago

Perhaps the stitch is not necessary anymore?

CosMx provides a flat file of all transcripts with the following structure:

   fov  cell_ID  cell     x_local_px  y_local_px  x_global_px  y_global_px  z  target
1  1    0        c_1_1_0  4245        2414        11805.       131234.      0  Ptpn1
2  1    0        c_1_1_0  4245        2505        11805.       131143.      4  Ptn
3  1    0        c_1_1_0  4245        2522        11805.       131126.      7  H3f3b
4  1    0        c_1_1_0  4245        2806        11805.       130842.      2  Uqcrq
5  1    0        c_1_1_0  4245        2808        11805.       130840.      2  Irf2

I manually renamed x_global_px and y_global_px to x and y, and then proseg was able to handle the file. Could you elaborate on why stitching is necessary, or could the full flat file be used instead?
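For anyone who prefers to script the renaming rather than edit the file by hand, here is a minimal Python sketch. It assumes the flat file is a gzipped, comma-separated file with a header row. The function name rename_columns and the file names in the usage line are hypothetical. The mapping itself is the one described above.

```python
import csv
import gzip

# Renames described in this thread: global pixel coordinates become x and y.
RENAMES = {"x_global_px": "x", "y_global_px": "y"}


def rename_columns(src, dst, renames=RENAMES):
    """Copy a gzipped CSV from src to dst, renaming columns in the header."""
    with gzip.open(src, "rt", newline="") as fin, \
            gzip.open(dst, "wt", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        # Columns not listed in `renames` keep their original name.
        writer.writerow([renames.get(name, name) for name in header])
        # Data rows are streamed through unchanged.
        writer.writerows(reader)
```

Usage would look like rename_columns("flatfile.csv.gz", "transcripts.csv.gz"). Streaming row by row avoids loading the whole transcript table into memory.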

dcjones commented 5 months ago

It might not be necessary, I'll try to look at some more recent CosMx data. The issue is that CosMx output is a mess, completely undocumented, and keeps changing, which makes it a difficult platform to support.

I think what you are doing is fine, but you don't even need to change the input file. You can tell proseg which column names to look at with:

proseg --cosmx --x-column x_global_px --y-column y_global_px ...
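A small Python sketch of a sanity check that could be run before passing these flags: it confirms the named columns are present in the header and returns only the flags quoted above. The function name cosmx_column_args is hypothetical, the file is assumed to be a gzipped CSV with a header row, and the remaining proseg arguments (elided above) are left to the caller.

```python
import csv
import gzip


def cosmx_column_args(flatfile, x_column="x_global_px", y_column="y_global_px"):
    """Return the column flags from this thread after checking the header."""
    with gzip.open(flatfile, "rt", newline="") as fin:
        header = next(csv.reader(fin))
    missing = [name for name in (x_column, y_column) if name not in header]
    if missing:
        raise ValueError(f"columns not found in {flatfile}: {missing}")
    # Only the flags mentioned in the thread; append other arguments yourself.
    return ["--cosmx", "--x-column", x_column, "--y-column", y_column]
```

A failure here points at a header mismatch rather than at proseg itself, which helps separate a misspelled flag from a changed file format.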

roanvanscheppingen commented 5 months ago

Yup, I agree that the file naming and structure change quite a lot over time. I have had to make PRs to Seurat's import functions because the CosMx compatibility was hardcoded in there and the files had changed.

The --x-column and --y-column arguments don't seem to work for me, but changing the column names is easy.

This might be of note for your preprint: keep in mind that the number of cells reported in CosMx Seurat objects differs from the number in the flat files or polygons. The automatically generated Seurat object excludes all cells without any transcripts, even though they might have been segmented correctly with Cellpose.
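To quantify that discrepancy, one could compare the cell labels in the polygons against the labels that appear in the transcript flat file. A minimal Python sketch follows. The function name is hypothetical, and it assumes both inputs use the same label format as the cell column shown earlier (for example c_1_1_0).

```python
def cells_without_transcripts(segmented_cells, transcript_cells):
    """Return segmented cell labels that never appear among the transcripts.

    segmented_cells: labels of all segmented cells (e.g. from the polygons).
    transcript_cells: the `cell` value of every transcript row; repeats are fine.
    """
    return sorted(set(segmented_cells) - set(transcript_cells))


segmented = ["c_1_1_1", "c_1_1_2", "c_1_1_3"]
per_transcript = ["c_1_1_1", "c_1_1_1", "c_1_1_3"]
print(cells_without_transcripts(segmented, per_transcript))  # ['c_1_1_2']
```

The length of the returned list is the number of segmented cells that would be missing from an object that drops cells with zero transcripts.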