frankligy / scTriangulate

scTriangulate is a Python package to mix-and-match conflicting clustering results in single cell analysis and generate reconciled clustering solutions
MIT License

Files not found when running compute_metrics() on Windows #21

Open cjiang310437 opened 1 year ago

cjiang310437 commented 1 year ago

I ran my code on a Windows system. Here are the errors:

File "", line 1, in
  File "C:\Users\jiabl9\Anaconda3\envs\scTriangulate\lib\site-packages\sctriangulate\main_class.py", line 972, in compute_metrics
    subprocess.run(['rm','-r','{}'.format(os.path.join(self.dir,'scTriangulate_local_mode_enrichr/'))])
  File "C:\Users\jiabl9\Anaconda3\envs\scTriangulate\lib\subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\jiabl9\Anaconda3\envs\scTriangulate\lib\subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\jiabl9\Anaconda3\envs\scTriangulate\lib\subprocess.py", line 1311, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

frankligy commented 1 year ago

Hi @cjiang310437

Thanks for bringing this up. I think the cause is that I hard-coded the path of the temporary folder as self.dir,'scTriangulate_local_mode_enrichr/', and the trailing forward slash / is not valid on Windows. I will fix this by removing the / in the next release as soon as I can, and I will update you on that.
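
For reference, the last frame of the traceback is _winapi.CreateProcess, and [WinError 2] raised at that point most often means that Windows could not find the executable being launched (here rm), which is a separate matter from the path separator. Below is a minimal sketch of removing the temporary folder without calling an external command, using only the standard library. The helper name remove_tmp_dir and the throwaway base directory are assumptions for illustration; this is not the package's actual code.

```python
import os
import shutil
import tempfile


def remove_tmp_dir(path):
    """Delete a directory tree without shelling out to `rm`.

    ignore_errors=True makes the call a no-op when the folder is missing,
    roughly mirroring how the result of `rm -r` was ignored.
    """
    shutil.rmtree(path, ignore_errors=True)


# Demo on a throwaway directory (stands in for self.dir in the traceback).
base = tempfile.mkdtemp()
target = os.path.join(base, 'scTriangulate_local_mode_enrichr')
os.makedirs(os.path.join(target, 'nested'))

remove_tmp_dir(target)
remove_tmp_dir(target)  # second call: folder already gone, no exception
print(os.path.exists(target))  # prints False
```

Because shutil.rmtree and os.path.join are handled by Python itself, the same call should behave identically on Windows, Linux, and macOS.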

frankligy commented 1 year ago

This issue has now been fixed by removing all hard-coded Unix-style path separators ("/"). Please install this specific commit (https://github.com/frankligy/scTriangulate/commit/2c45cb7dc2279ae094d04bcd1e37438a0d3d947a) using pip; instructions can be found here (https://stackoverflow.com/questions/13685920/install-specific-git-commit-with-pip).

Bio-data-tricks commented 1 year ago

Hi @frankligy, I installed the specific commit with pip install git+https://github.com/frankligy/scTriangulate.git@2c45cb7 but I have the same error.

Nicolas

frankligy commented 1 year ago

Hi @Bio-data-tricks,

Sorry for the confusion and inconvenience. You may use the full SHA hash for this commit, as below:

pip install git+https://github.com/frankligy/scTriangulate.git@2c45cb7dc2279ae094d04bcd1e37438a0d3d947a

Then run the program as in the test example:

>>> import scanpy as sc
>>> from sctriangulate import *
>>> from sctriangulate.preprocessing import *
>>> sctriangulate_setting(backend='Agg')
>>> adata = sc.read('./test/input.h5ad')
>>> sctri = ScTriangulate(dir='./output',adata=adata,query=['sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
2023-02-22 23:03:43,557 - INFO - Choosing logging to console (VERBOSE=1)
2023-02-22 23:03:43,616 - INFO - skip scrublet doublet prediction, instead doublet is filled using value 0.5
>>> sctri.lazy_run()

I attached the standard output from a test run on my own PC, which is a Windows 10 system with an Intel chip: test_commit_pc.txt

And this is what the output should look like: image

Let me know if that solves the problem; I'm happy to clarify further!

Best, Frank

Bio-data-tricks commented 1 year ago

Hi @frankligy, thank you for your response. I reinstalled sctriangulate in a new environment with only Python, but I still get the same error.

Nicolas

Bio-data-tricks commented 1 year ago

The package works with Linux.

I'd like to take this opportunity to ask you some questions. I would like to use sctriangulate on non-transcriptomic data (spectroscopic imaging data).

1) Do you see any limitations?

2) Are all the metrics suitable for my data? For example, the TFIDF section of the paper says 'This score is based on the observation that genes or other features that are uniquely expressed in one cluster'. Does the TFIDF score make sense when my dataset has no features with a value of 0 in some clusters? If not, how can we exclude this score from the calculations? I tried setting sctri.metrics = ['reassign', 'SCCAF'] before calling sctri.compute_metrics(parallel=True), but it doesn't seem to affect the choice of metrics, as they are all still used.

3) Is there a way to take spatial information into account? For example, finding clusters at the same image location (cluster co-occurrence in spatial dimensions) could lead to a more reliable choice of clustering result.

Thank you for your time.

Nicolas

frankligy commented 1 year ago

Hi @Bio-data-tricks,

Thanks for digging into that, and those are all great questions!

[1] Since we preprinted the paper about a year ago, we've been exploring the possibility of applying it to spatial data, and this is one of the avenues we are pursuing ourselves as well. So far, I think the concept of scTriangulate generalizes very well to the spatial context, thanks to its flexibility to incorporate any features and stability metrics. As long as the observations have the shape n_obs * n_features, scTriangulate should be applicable; the features can be abstract: genes, mutations, spatial coordinates, image pixel density, and so on.

[2] That's a great point. Indeed, when the values are not naturally zero versus non-zero, applying TFIDF directly doesn't make logical sense. This is a common problem in the single-cell field as well: sometimes the data is transformed after batch correction, which violates the non-negativity of count data. What we do is additionally supply the raw count matrix in adata.layers. I wrote a small section describing how to do so (https://sctriangulate.readthedocs.io/en/latest/troubleshooting.html#compute-tfidf-on-integrated-expression-matrix-with-negative-value), such that TFIDF is computed on the raw count data instead of the transformed data.

But I understand this may not be applicable in your case, since you are dealing with image data, so manually disabling TFIDF sounds like a reasonable choice. You mentioned that removing TFIDF from sctri.metrics does not seem to work. That's because by default the program still calculates TFIDF (https://github.com/frankligy/scTriangulate/blob/main/sctriangulate/main_class.py#L2943-L2945), but when calculating the Shapley values, sctri.metrics takes effect and only the metrics in the list are used (https://github.com/frankligy/scTriangulate/blob/main/sctriangulate/main_class.py#L1306-L1317). I admit I should have designed the program better instead of hard-coding the TFIDF computation, but I hope this at least serves as a temporary workaround.

I also want to emphasize that we can easily write customized stability metric functions and pass them to scTriangulate, following the signature below:

# define a stability metric
def my_stability_metric(adata, key, arg1, arg2):
    # required arguments are adata and key (the name of the annotation);
    # you can add as many additional arguments as you need
    cluster_to_metric = {}  # placeholder: fill in {cluster_name: metric_value}
    # must return a dictionary keyed by cluster name in each annotation (key),
    # valued by the computed metric
    return cluster_to_metric

# pass it to the program
sctri = ScTriangulate(dir, adata, query, add_metrics={'name_of_stability': my_stability_metric})
sctri.lazy_run(added_metrics_kwargs=[{'arg1': arg1_value, 'arg2': arg2_value}])

After all, you are dealing with a new modality, so I think defining some modality-specific stability metrics may be beneficial.
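
To make the signature above concrete, here is a small sketch of what such a metric could look like. The metric itself (centroid_compactness, defined as 1 / (1 + mean distance to the cluster centroid)) is a hypothetical example and not one of scTriangulate's built-in scores. It assumes that adata.X is a dense 2-D array and that adata.obs[key] yields one cluster label per observation; the toy object at the bottom is only a stand-in for a real AnnData.

```python
import math
from collections import defaultdict
from types import SimpleNamespace


def centroid_compactness(adata, key):
    """Hypothetical stability metric: 1 / (1 + mean distance to centroid).

    Assumes adata.X is dense (rows = observations) and adata.obs[key]
    yields one cluster label per row. Returns {cluster_name: score}.
    """
    rows_by_cluster = defaultdict(list)
    for label, row in zip(adata.obs[key], adata.X):
        rows_by_cluster[label].append([float(v) for v in row])

    cluster_to_metric = {}
    for label, rows in rows_by_cluster.items():
        n_features = len(rows[0])
        centroid = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        mean_dist = sum(
            math.sqrt(sum((r[j] - centroid[j]) ** 2 for j in range(n_features)))
            for r in rows
        ) / len(rows)
        cluster_to_metric[label] = 1.0 / (1.0 + mean_dist)
    return cluster_to_metric


# Toy stand-in for an AnnData object (assumption, for illustration only).
toy = SimpleNamespace(
    obs={'leiden': ['a', 'a', 'b', 'b']},
    X=[[0, 0], [0, 0], [0, 0], [6, 8]],
)
scores = centroid_compactness(toy, 'leiden')
print(scores)
```

Cluster 'a' is perfectly tight and scores 1.0, while cluster 'b' has a mean distance of 5 to its centroid and scores 1/6. A function of this shape could then be supplied through add_metrics as in the signature shown earlier.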

[3] Yes, I wrote a few spatial stability metrics (https://sctriangulate.readthedocs.io/en/latest/api.html#cluster-level-spatial-stability). For example, the assortativity score measures whether a cell type tends to mix with itself or with other cell types, based on spatial information. The implementation is not that complicated (https://sctriangulate.readthedocs.io/en/latest/_modules/sctriangulate/spatial.html#cluster_level_spatial_stability) and is mostly based on the squidpy and networkx packages, so it should be easy to modify.

I also wrote functions to create a few spatial features (https://sctriangulate.readthedocs.io/en/latest/api.html#create-spatial-features). The features can be spatial coordinates, spatial-graph-related features, or image-related features (intensity, histogram, texture), in case that is of any help.

I have used those features a few times, but I haven't rigorously tested the spatial module, so it is still labeled "experimental". I'd appreciate any bug reports or feedback.
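
As a rough illustration of the idea behind a spatial mixing score, the sketch below computes, for each cluster, the fraction of spatial-neighbor links that stay within the same cluster. This is a simplified stand-in written in plain Python, not the assortativity implementation in sctriangulate.spatial; the function name same_label_edge_fraction and the toy graph are assumptions.

```python
from collections import defaultdict


def same_label_edge_fraction(edges, labels):
    """Per cluster, the fraction of neighbor links that connect to the same cluster.

    edges  : iterable of (u, v) pairs from an undirected spatial neighbor graph
    labels : mapping from node id to cluster label
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for u, v in edges:
        # Count each undirected edge once from each endpoint's point of view.
        for a, b in ((u, v), (v, u)):
            total[labels[a]] += 1
            if labels[a] == labels[b]:
                same[labels[a]] += 1
    return {cluster: same[cluster] / total[cluster] for cluster in total}


# Toy chain of five spots: A - A - A - B - B
labels = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
fractions = same_label_edge_fraction(edges, labels)
print(fractions)
```

Values close to 1 indicate that a cluster's spots mostly neighbor each other, while lower values indicate more mixing with other clusters. In the toy chain, cluster A gets 4/5 and cluster B gets 2/3.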

Best, Frank

ChaDub commented 5 months ago

Hello,

I am trying to use sctriangulate on Windows and I have the same problem as described in this issue, even when I install the package from the specific commit.

Is there any update on a solution?

Thank you in advance! Best, Charlotte

frankligy commented 5 months ago

Hi @ChaDub,

I am not sure why, because as my screenshot showed, I tested on my Windows PC as well and could generate all the expected output without any problem. If you use the dummy test file I provided (https://github.com/frankligy/scTriangulate/tree/main/test), do you still face the same issue?

This is also a very explicit issue: as the error suggests, it is due to the difference in file path separators between Windows and Unix systems, so it shouldn't be that mysterious.

Is there any chance the env is not properly set up (so that you are actually running the previous version instead of that particular commit)? I mean, if you start a new conda env with Python 3.7 and pip install the commit, would that by any chance solve the problem?
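
One way to check which copy of the package a given environment actually picks up is to locate it on disk and scan it for the call shown in the traceback. The sketch below does this with the standard library only. The helper names lines_calling_rm and inspect_installed are hypothetical, and whether the referenced commit removed every such call is not stated in this thread, so the output should be read as a diagnostic rather than a verdict.

```python
import importlib.util
from pathlib import Path


def lines_calling_rm(source_text):
    """Return (line_number, stripped_line) for lines that pass 'rm' to subprocess."""
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if "subprocess" in line and "'rm'" in line
    ]


def inspect_installed(package="sctriangulate", module_file="main_class.py"):
    """Locate the package Python would import and scan one of its files.

    Returns None if the package (or the file) cannot be found.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    path = Path(list(spec.submodule_search_locations)[0]) / module_file
    if not path.is_file():
        return None
    text = path.read_text(encoding="utf-8", errors="replace")
    return path, lines_calling_rm(text)


result = inspect_installed()
if result is None:
    print("sctriangulate (or main_class.py) was not found in this environment")
else:
    path, hits = result
    print("Python imports:", path)
    for number, line in hits:
        print("  line {}: {}".format(number, line))
```

If the printed path points into a different environment than expected, or the listed lines match the one in the traceback, the environment is probably not running the code that was meant to be installed.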

As a last resort, is there any chance you could switch to a Linux system? If your input file is not small, I think people would typically use an HPC cluster for this task anyway.

Best, Frank

ChaDub commented 5 months ago

Hello,

Thank you very much for your answer!

I created a new conda env with Python 3.7 and installed the specific commit, but I still have the issue. I also hit issue #23, and I can't install pandas 1.5.3 (I get an error saying that this pandas version is not compatible with Python 3.7).

I will try on a Linux system!

Thanks again, Best, Charlotte

frankligy commented 5 months ago

Hi @ChaDub,

I think I tested issue #23 in Google Colab, which uses Python 3.10 by default; that's why I wrote pandas 1.5.3. I just checked the original env (https://github.com/frankligy/scTriangulate/blob/main/reproduce/sctri_new_env_py37_linux.yml), and pandas 1.3.5 should be the right version.

Best, Frank

ChaDub commented 5 months ago

Hello,

I tested the package on a Linux system (in a VM) and it worked!

Thank you for your answers,

Best, Charlotte