digitalcytometry / cytospace

CytoSPACE: Optimal mapping of scRNA-seq data to spatial transcriptomics data

Quick Gotchas and current solutions #56

Closed simoncmo closed 1 year ago

simoncmo commented 1 year ago

Hi CytoSPACE team,

Thank you all for providing this cool tool! I just gave it a try on our samples and it certainly looks promising on my dataset.

During the test, I ran into some issues that I later resolved. I thought I would share the experience here in case someone else runs into these errors.

Error 1:

"Cell type TYPE_C0 in the ST dataset is not available in the scRNA-seq dataset." This is raised during the main cytospace run, at line 244 of cytospace.py. After inspecting my data, I realized this is related to a mismatch between the Cell IDs in scRNA_data.txt and those in the cell_type_labels.txt file.

Two scenarios would trigger this error for me:

  1. Some Cell IDs in scRNA_data.txt are missing from the cell_type_labels.txt file. This is usually caused by manually filtering out unwanted cell types (e.g. Doublet) after generating the files with the provided generate_cytospace_from_scRNA_seurat_object function.
  2. All Cell IDs in scRNA_data.txt are in the cell_type_labels.txt file, BUT some CellType values are NA.

Somehow, missing values in cell_type_numbers_int inside main_cytospace are filled with TYPE_C0.

Solution: Prepare the scRNA_data.txt and cell_type_labels.txt files again with the following adjustments:

  1. Use Seurat's implementation of subset to first filter out the unwanted cell types, e.g. Doublet.
  2. Set NA CellType values to a string such as Unknown, e.g. scRNA_obj@meta.data[['cell_type']] = tidyr::replace_na(scRNA_obj@meta.data[['cell_type']], "Unknown") or scRNA_obj@meta.data[['cell_type']][is.na(scRNA_obj@meta.data[['cell_type']])] = "Unknown"

Then run generate_cytospace_from_scRNA_seurat_object. This makes sure meta.data and count_matrix have the same Cell IDs and that there are no missing values in the desired cell type column.
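The two scenarios above can also be checked on the generated files before running CytoSPACE. The following is a minimal Python sketch, not part of CytoSPACE itself. It assumes both files are tab-delimited, that scRNA_data.txt lists cell IDs in its header row after a leading gene-name column, and that cell_type_labels.txt has a header row followed by cell ID and cell type columns. The function name find_label_problems and the set of NA-like spellings are hypothetical.

```python
import csv

# Spellings treated as a missing label (an assumption; adjust as needed).
NA_LIKE = {"", "NA", "NaN", "nan", "None"}


def find_label_problems(expr_path, labels_path, delimiter="\t"):
    """Return (cell IDs without a label row, cell IDs with a missing label)."""
    # Cell IDs are read from the header row of the expression matrix,
    # skipping the first field (assumed to be the gene-name column).
    with open(expr_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter))
    expr_ids = header[1:]

    labels = {}
    with open(labels_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader)  # skip the header row of the label file
        for row in reader:
            if not row:
                continue
            cell_id = row[0]
            cell_type = row[1].strip() if len(row) > 1 else ""
            labels[cell_id] = cell_type

    # Scenario 1: a cell in the matrix has no row in the label file.
    unlabeled = [cid for cid in expr_ids if cid not in labels]
    # Scenario 2: the cell has a row, but its label is NA-like.
    missing_type = [cid for cid in expr_ids
                    if cid in labels and labels[cid] in NA_LIKE]
    return unlabeled, missing_type
```

If either returned list is non-empty, the input files probably need to be regenerated as described above.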

Error 2:

"ModuleNotFoundError: No module named 'cytospace.common'; 'cytospace' is not a package." Scenario: I was trying to run the cytospace.py script itself by calling python cytospace/cytospace.py, as described in the README as an alternative way to run the tool.

I'm personally more familiar with R than Python, so this one was solved with the help of ChatGPT : ] It seems to be caused by the script file name conflicting with the package name. Renaming the script to something like cytospace_script.py fixed this. I am putting the ChatGPT response here as a reference.

This error occurs because you have named your Python script file as cytospace.py, which has the same name as the cytospace module that you are trying to import. When you run import cytospace, Python first looks for a package or module named cytospace in the current directory, and since it finds your script file first, it assumes that it is the cytospace module and tries to import cytospace.common from it, which results in the error you are seeing.

To fix this issue, you need to rename your script file to something else that does not conflict with the name of the cytospace module. For example, you could rename it to cytospace_script.py. Then, when you run python cytospace_script.py, Python will not mistake your script file for the cytospace module, and the import statement should work correctly.
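The name clash can be reproduced outside CytoSPACE. When Python runs a script, the script's own directory is placed first on sys.path, so a script that shares its name with the package it lives in shadows that package. The sketch below builds a throwaway package in a temporary directory and runs its same-named script directly. The names demo_pkg and reproduce_shadowing are hypothetical and only stand in for the cytospace layout.

```python
import os
import subprocess
import sys
import tempfile


def reproduce_shadowing():
    """Build a toy package and run its same-named script directly.

    Returns the stderr text of the failed run.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.mkdir(pkg_dir)

        # A toy package: demo_pkg/__init__.py and demo_pkg/common.py.
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
            fh.write("")
        with open(os.path.join(pkg_dir, "common.py"), "w") as fh:
            fh.write("VALUE = 1\n")

        # A script inside the package that shares the package's name.
        script = os.path.join(pkg_dir, "demo_pkg.py")
        with open(script, "w") as fh:
            fh.write("from demo_pkg.common import VALUE\nprint(VALUE)\n")

        # Running the script puts demo_pkg/ (the script's directory) first
        # on sys.path, so "import demo_pkg" resolves to demo_pkg.py, which
        # is a plain module and has no "common" submodule.
        result = subprocess.run(
            [sys.executable, script],
            capture_output=True, text=True, cwd=root,
        )
        return result.stderr


stderr_text = reproduce_shadowing()
print(stderr_text.strip().splitlines()[-1])
```

Renaming the script inside the toy package (for example to demo_pkg_script.py) would remove the clash, which mirrors the rename described above.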

hsjeon-k commented 1 year ago

Hello,

Thank you so much for sharing your approaches! I am certain that this will help us as well as the other users.

If you are seeing TYPE_C0 in cell_type_numbers_int, I would additionally suggest making sure that the cell type fraction estimation file (1) has a row name for the first line, and (2) has the same number of values in each row as there are column names. This "cell type fraction estimation file" will be the {output_prefix}Seurat_weights.txt file if you used the default setting, or the one you specified with the -ctfep flag if you provided your own. C0 is a placeholder for missing column names that the datatable package uses, so if it thinks that the provided file does not have a heading for each column, it may substitute the last one with C0 (to which we append TYPE_ later in the code).
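A quick way to test for this kind of header mismatch is to compare the number of fields in the header line with the number of fields in each data row. This is a rough sketch only, assuming the fraction file is tab-delimited; the function name find_width_mismatches is hypothetical and is not part of CytoSPACE.

```python
import csv


def find_width_mismatches(path, delimiter="\t"):
    """Return (header_width, [(line_number, width), ...]) listing the
    rows whose field count differs from the header's."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        mismatches = [
            (line_number, len(row))
            for line_number, row in enumerate(reader, start=2)
            if row and len(row) != len(header)
        ]
    return len(header), mismatches
```

An empty mismatch list means every row has as many values as the header has names. A row that is one field wider than the header suggests that the header is missing a name, often the one for the row-name column.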

Regarding the second part, we are glad to hear that you were able to find a solution! I would personally recommend that users run CytoSPACE with cytospace [...], as python /path/to/cytospace.py [...] achieves the same functionality but with additional complexity about the path to cytospace.py. We are considering removing the python /path/to/cytospace.py [...] part in the upcoming updates, but please feel free to let us know if you had a specific use case for using this option as opposed to the cytospace [...] command!

simoncmo commented 1 year ago

Hi!

Thanks for the prompt reply and the explanation, Hsjeon! And yeah, that makes sense, thanks!

For me, running cytospace.py directly was helpful for debugging; that is actually how I found that TYPE_C0 was in cell_type_numbers_int. I would personally still appreciate having a script like this, so one can use it when there are issues running their own samples. But if there are other ways to debug using the installed cytospace, I would love to know too!

hsjeon-k commented 1 year ago

I see, thank you for the information!

When debugging, I make changes in the local copy of the code and reinstall the package with the updates by running pip install . inside the cytospace directory. This way, all the print statements and breakpoint() / pdb.set_trace() commands will work as usual, with the added advantage of making sure everything works within the package setting.

I think running python /path/to/cytospace.py without reinstalling the package will work similarly in most cases, but there may be a few things that it cannot capture. For example, the code is currently configured to find and call the R script get_cellfracs_seuratv3.R that was installed with the package (rather than the local copy inside the cloned directory). Any changes that you make to your local version of this R script for debugging purposes may therefore not be reflected until you re-run the pip install . command. I would personally recommend debugging with the pip install . + cytospace [...] commands, just to avoid any discrepancies between running CytoSPACE as a group of files and running it as an installed package.
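When switching between a cloned copy and an installed package, it can help to confirm which file Python actually resolves for the cytospace import name. The snippet below is a small sketch using the standard library's importlib.util.find_spec; the helper name module_origin is hypothetical. Note that the result depends on where it is run from, since the script's directory (or the current directory in an interactive session) is searched before the installed packages.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would load for a top-level module,
    or None if the module cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints None when no module named cytospace can be found.
print(module_origin("cytospace"))
```

A path inside the environment's site-packages directory would indicate the installed package, while a path inside the cloned repository would indicate the local copy.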

We really appreciate your letting us know about these issues! For now, we will temporarily leave the python /path/to/cytospace.py part out from the documentation (at least until the next update) since it does not work with the current file tree at the moment, but we will certainly consider bringing it back later as necessary. Thank you for your input!

simoncmo commented 1 year ago

Got it, yeah, that's a good point; pip install . sounds like the best way for now. Will definitely give that a try some time. Of course, glad to be helpful in some way, and yeah, that sounds like a good plan! I will close this for now. Thanks again for the reply and instructions. Looking forward to any updates your team has planned!