SydneyBioX / BIDCell

Biologically-informed deep learning for cell segmentation of subcellular spatial transcriptomics data

Can I also use this method if I don't have reference #22

Open Yifan-debug opened 3 weeks ago

Yifan-debug commented 3 weeks ago

Hi, that's a great method! I'm wondering whether it is still viable for cell segmentation without the reference data and positive/negative markers (I only have the DAPI image and transcripts.csv). Is there any way to disable these parameters? Thank you so much.

xhelenfu commented 3 weeks ago

Hi, I haven't tested this myself, but it might work if you have 1 row in the single-cell reference and marker .csv files (it can be a random/fake cell type), no cell types in elongated, and pos_weight and neg_weight set to 0 in the .yaml file.

Yifan-debug commented 3 weeks ago

Hi, thanks for the information. I tried it, but it doesn't seem to work. Here is the error I got:

ValidationError                           Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = BIDCellModel("Xenium_test.yaml")
      2 model.run_pipeline()

File /projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/bidcell/BIDCellModel.py:33, in BIDCellModel.__init__(self, config_file)
     24 def __init__(self, config_file: str) -> None:
     25     """Constructs a BIDCellModel instance using the user-supplied config file.\n
     26     The configuration is validated during construction.
     27
   (...)
     31         Path to the YAML configuration file.
     32     """
---> 33     self.config = load_config(config_file)

File /projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/bidcell/config.py:255, in load_config(path)
    250         raise ValueError(
    251             "The inputted YAML config was invalid, try looking at the example config."
    252         )
    254     # validate the configuration schema
--> 255     config = Config(**config)
    256     return config

File /projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/pydantic/main.py:193, in BaseModel.__init__(self, **data)
    191 # __tracebackhide__ tells pytest and some other tools to omit this function from tracebacks
    192 __tracebackhide__ = True
--> 193 self.__pydantic_validator__.validate_python(data, self_instance=self)

ValidationError: 3 validation errors for Config
files.fp_pos_markers
  Field required [type=missing, input_value={'data_dir': './data/1bla...ferences/sc_breast.csv'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing
files.fp_neg_markers
  Field required [type=missing, input_value={'data_dir': './data/1bla...ferences/sc_breast.csv'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing
model_params.elongated
  Input should be a valid list [type=list_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/list_type

Yifan-debug commented 3 weeks ago

Here is my yaml file

# for functions in bidcell/processing
# NOTE: Commented options default to None

cpus: 16 # number of CPUs for multiprocessing

files: # NOTE: please ensure these point to the right locations
  data_dir: ./data/1blank # data directory for processed/output data
  fp_dapi: ./data/1blank/morphology.ome.tif # path of DAPI image or path of output stitched DAPI if using stitch_nuclei
  fp_transcripts: ./data/1blank/transcripts.csv # path of transcripts file
  fp_ref: ./data/sc_references/sc_breast.csv # path of transcripts file
  fp_transcripts: ./data/sc_references/sc_breast_markers_pos.csv # path of transcripts file
  fp_transcripts: ./data/sc_references/sc_breast_markers_neg.csv # path of transcripts file

nuclei_fovs:
  stitch_nuclei_fovs: False # set True to stitch separate FOVs of DAPI together in 1 image

nuclei:
  diameter: # estimated diameter of nuclei for Cellpose - or None to automatically compute, default: None

transcripts:
  shift_to_origin: False # shift to origin, making min(x) and min(y) (0,0)
  x_col: x_location # name of x location column in transcripts file
  y_col: y_location # name of y location column in transcripts file
  gene_col: feature_name # name of genes column in transcripts file
  transcripts_to_filter: # genes starting with these strings will be filtered out

affine:
  target_pix_um: 1.0 # microns per pixel to perform segmentation; default: 1.0
  base_pix_x: 0.2125 # convert to microns along width by multiplying the original pixels by base_pix_x microns per pixel
  base_pix_y: 0.2125 # convert to microns along width by multiplying the original pixels by base_pix_y microns per pixel
  base_ts_x: 1.0 # convert between transcript locations and target pixels along width
  base_ts_y: 1.0 # convert between transcript locations and target pixels along height
  global_shift_x: 0 # additional adjustment to align transcripts to DAPI in target pixels along image width; default: 0
  global_shift_y: 0 # additional adjustment to align transcripts to DAPI in target pixels along image height; default: 0

model_params:
  name: custom # segmentation model to use: custom for model in model.py or set to a encoder name from segmentation_models_pytorch; default: custom
  patch_size: 48 # size of transcriptomic image patches for input to DL model
  elongated: #list of elongated cell types that are in single-cell reference

training_params:
  total_epochs: 1 # number of training epochs; default: 1
  total_steps: 4000 # number of training steps; default: 4000
  ne_weight: 1.0 # weight for nuclei encapsulation loss; default: 1.0
  os_weight: 1.0 # weight for oversegmentation loss; default: 1.0
  cc_weight: 1.0 # weight for cell-calling loss; default: 1.0
  ov_weight: 1.0 # weight for oversegmentation loss; default: 1.0
  pos_weight: 0.0 # weight for cell-calling loss; default: 1.0
  neg_weight: 0.0 # weight for oversegmentation loss; default: 1.0

testing_params:
  test_epoch: 1 # epoch to test; default: 1
  test_step: 4000 # step number to test; default: 4000

experiment_dirs:
  dir_id: last # specify timestamp of output dir or leave blank to use latest dir, default: last

xhelenfu commented 3 weeks ago

Thanks, there is a typo with the file paths (fp_transcripts appears 3 times), please use:

fp_ref: ./data/sc_references/sc_breast.csv # path of single-cell reference file
fp_pos_markers: ./data/sc_references/sc_breast_markers_pos.csv # path of positive markers file
fp_neg_markers: ./data/sc_references/sc_breast_markers_neg.csv # path of negative markers file

It also seems that there needs to be at least one 'elongated' cell type, but it can be anything:

elongated: #list of elongated cell types that are in single-cell reference
  - placeholder
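
A note on why the typo surfaced as "Field required" for fp_pos_markers and fp_neg_markers rather than as a duplicate-key error: YAML loaders typically keep only the last value of a repeated key without complaining, so the two marker paths never reached validation. The sketch below is a hypothetical stand-alone helper (not part of BIDCell) that scans config text for keys repeated within the same mapping. It assumes simple block-style YAML with space indentation and is not a full YAML parser.

```python
import re

# Matches "key:" at the start of a line, capturing the indentation and key name.
KEY_RE = re.compile(r"^(\s*)([A-Za-z_][\w-]*)\s*:")


def find_duplicate_keys(yaml_text: str) -> list[str]:
    """Return dotted paths of keys that occur more than once in one mapping.

    Hypothetical helper: only understands simple block-style mappings.
    """
    stack: list[tuple[int, str]] = []  # (indent, key) of the enclosing keys
    seen: set[str] = set()
    duplicates: list[str] = []
    for line in yaml_text.splitlines():
        match = KEY_RE.match(line)
        if match is None:
            continue  # blank lines, comment lines, and list items
        indent, key = len(match.group(1)), match.group(2)
        # Drop entries at the same or deeper indentation; they are not parents.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        path = ".".join([parent for _, parent in stack] + [key])
        if path in seen and path not in duplicates:
            duplicates.append(path)
        seen.add(path)
        stack.append((indent, key))
    return duplicates


config_text = """\
files:
  fp_transcripts: ./data/1blank/transcripts.csv
  fp_ref: ./data/sc_references/sc_breast.csv
  fp_transcripts: ./data/sc_references/sc_breast_markers_pos.csv
  fp_transcripts: ./data/sc_references/sc_breast_markers_neg.csv
"""
print(find_duplicate_keys(config_text))  # ['files.fp_transcripts']
```

Running a check like this on the config before constructing the model points directly at the repeated key.
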
Yifan-debug commented 2 weeks ago

Hi, thank you so much for your continued help. I'm sorry for the typo, and thank you for picking it up. I changed the YAML file and it seems to work, but I got another error:

  0%|          | 0/4 [00:00<?, ?it/s]
  0%|          | 0/4 [22:16<?, ?it/s]
Traceback (most recent call last):
  File "/projectnb/mylabscc/yzw0093/BIDCell/Xenium_test/xenium_test.py", line 3, in <module>
    model.run_pipeline()
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/bidcell/BIDCellModel.py", line 40, in run_pipeline
    self.preprocess()
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/bidcell/BIDCellModel.py", line 61, in preprocess
    segment_nuclei(self.config)
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/bidcell/processing/nuclei_segmentation.py", line 117, in segment_nuclei
    rdapi[h : h + hsize, w : w + wsize] = patch_resized
ValueError: could not broadcast input array from shape (3,6800,53936) into shape (3,6800)

I used the morphology.ome.tif from Xenium directly. It is a 3D Z-stack image. Do I need to convert it to a 2D image? Is that the cause of this error?

xhelenfu commented 2 weeks ago

That seems likely, because the MIP image should be used. Sorry I didn't see that earlier. Could you please try morphology_mip.ome.tif?
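
For context, a maximum intensity projection (MIP) collapses a Z-stack into a single 2D image by taking, at each pixel position, the maximum value across all planes. That is why a 2D MIP fits where the 3D stack triggered the broadcast error above. The toy sketch below uses plain nested lists to show the idea; the function name is hypothetical and this is not how BIDCell or Xenium compute it. With a NumPy array of shape (Z, H, W), the equivalent would be `stack.max(axis=0)`.

```python
def max_intensity_projection(stack):
    """Collapse a Z-stack (a list of 2D planes) into one 2D image.

    Each output pixel is the maximum of that pixel position across all planes.
    """
    if not stack:
        raise ValueError("stack must contain at least one plane")
    return [
        [max(pixels) for pixels in zip(*rows)]  # max across Z at each (y, x)
        for rows in zip(*stack)                 # the same row from every plane
    ]


# Toy Z-stack with 3 planes of 2 x 2 pixels.
z_stack = [
    [[1, 5], [0, 2]],
    [[4, 2], [7, 1]],
    [[3, 3], [6, 9]],
]
mip = max_intensity_projection(z_stack)
print(mip)  # [[4, 5], [7, 9]]
```
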

Yifan-debug commented 2 weeks ago

Hi, thanks for the suggestion. I tried morphology_mip.ome.tif. It started and ran for a few days, but I still got errors. Do you have any ideas about this?

Traceback (most recent call last):
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 1018, in __getitem__
    return self._range[new_key]
IndexError: range object index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/bidcell/processing/cell_gene_matrix.py", line 27, in process_chunk
    chunk_id = chunk.index[0]
  File "/projectnb/mylabscc/yzw0093/.conda/envs/bidcell/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 1020, in __getitem__
    raise IndexError(
IndexError: index 0 is out of bounds for axis 0 with size 0

xhelenfu commented 2 weeks ago

Hi, is this error from model.make_cell_gene_mat(is_cell=False)? I'm guessing it might have happened when extracting expressions from the transcripts file. Could you please double check the file, its path, and the column names? If that doesn't work could you please check the tif images of the nuclei (or segmented cells) to check there are cells in the segmentation?

Yifan-debug commented 1 week ago

[image]

Hi @xhelenfu, please check the figure, which is the nuclei TIFF file. I'm not sure where to change model.make_cell_gene_mat(is_cell=False). The file name and path are correct. As for the transcripts columns, do you mean the column names in transcripts.parquet for Xenium?

Thank you so much for your willingness to help.

xhelenfu commented 1 week ago

Could you please check that you are using transcripts.csv.gz as the transcripts file? I don't think it'll work with parquet. The column name of the genes is usually gene_col: feature_name.
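
One quick way to confirm the column names before a long run is to read only the header row of the transcripts file. The sketch below is a hypothetical helper using only the Python standard library. The column names checked (x_location, y_location, feature_name) are the ones from the config earlier in this thread, and the demo file is a tiny stand-in written to a temporary directory, not real Xenium output.

```python
import csv
import gzip
import os
import tempfile


def missing_columns(path, required):
    """Return the required column names that are absent from a CSV header.

    Paths ending in .gz are opened with gzip; anything else as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    return [name for name in required if name not in header]


# Demo on a small stand-in file with the expected columns.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "transcripts.csv.gz")
    with gzip.open(demo_path, "wt", newline="") as handle:
        csv.writer(handle).writerows(
            [["x_location", "y_location", "feature_name"], [1.0, 2.0, "GENE_A"]]
        )
    print(missing_columns(demo_path, ["x_location", "y_location", "feature_name"]))  # []
    print(missing_columns(demo_path, ["gene"]))  # ['gene']
```

An empty result means every column named under transcripts in the config is present in the file. Any names it returns are the ones to correct in x_col, y_col, or gene_col.
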

Yifan-debug commented 1 week ago

Yes, you are correct: the column name of the genes is feature_name.

[image]