Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

CellTypist providing different results between iterations #126

Open ManuelSokolov opened 3 months ago

ManuelSokolov commented 3 months ago

Hi! I am doing label transfer from a reference dataset and classifying two query sets that should contain exactly the same cell types. I noticed that, across several iterations, the classifications were different each time.

import scanpy as sc
import pandas as pd
import celltypist

# Load the reference and the two query datasets (24h and 72h time points).
reference = sc.read_h5ad("data/combined_ref.h5ad")
query1 = sc.read_h5ad("querys/unnorm_sc_C32-24h.h5ad")
query2 = sc.read_h5ad("querys/unnorm_sc_C32-72h.h5ad")

# Normalize each dataset to 10,000 counts per cell and log1p-transform,
# the input CellTypist expects.
sc.pp.normalize_total(query1, target_sum=1e4)
sc.pp.log1p(query1)

sc.pp.normalize_total(query2, target_sum=1e4)
sc.pp.log1p(query2)

sc.pp.normalize_total(reference, target_sum=1e4)
sc.pp.log1p(reference)

# Collect the per-iteration predictions for each query set.
predictions24h = pd.DataFrame()
predictions72h = pd.DataFrame()
predictions24h['id'] = list(query1.obs_names)
predictions72h['id'] = list(query2.obs_names)

features = []

for i in range(25):
    print(f"iteration {i}")
    # Retrain the model from scratch in every iteration.
    model2 = celltypist.train(reference, labels='CellClass', n_jobs=10,
                              feature_selection=True)
    # Keep the running intersection of the genes selected across iterations.
    if i == 0:
        features = model2.features
    extracted = model2.features
    features = list(set(extracted) & set(features))
    # Annotate both queries with the freshly trained model.
    prediction_query1 = celltypist.annotate(query1, model=model2, majority_voting=True)
    prediction_query2 = celltypist.annotate(query2, model=model2, majority_voting=True)
    adata2_query1 = prediction_query1.to_adata()  # AnnData versions (not used below)
    adata2_query2 = prediction_query2.to_adata()
    predictions24h[f'run{i}'] = list(prediction_query1.predicted_labels.majority_voting)
    predictions72h[f'run{i}'] = list(prediction_query2.predicted_labels.majority_voting)

As you can see in the next plot, I plotted for each sample (rows) the percentage of predicted cell types across iterations (e.g., the first sample in the graph was classified as radial glia 40% of the time and as glioblast 60% of the time over the 25 iterations).

[Screenshot: per-sample percentages of predicted cell types across the 25 iterations]
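For reference, a minimal sketch of how such percentages can be computed from the predictions24h frame built above (column names are assumed to be as in the script):

# Fraction of the 25 runs in which each cell was assigned each label.
run_cols = [c for c in predictions24h.columns if c.startswith('run')]
label_fractions = (
    predictions24h[run_cols]
    .apply(lambda row: row.value_counts(normalize=True), axis=1)
    .fillna(0)
)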

Is this behaviour expected/documented for CellTypist? What is the recommended course of action in this case?

Best Regards,

Manuel

ChuanXu1 commented 3 months ago

@ManuelSokolov, the training process involves various sources of randomness. For example, the first round of training uses SGD, which shuffles the data before each epoch starts and therefore introduces randomness. If you want a more stable model, a better approach is to increase the number of iterations during training (e.g., max_iter = 2000) at the cost of a longer runtime.
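For example, reusing the training call from the script above with only max_iter added:

# Same training call as before, but with more iterations for a more stable fit.
model2 = celltypist.train(reference, labels='CellClass', n_jobs=10,
                          feature_selection=True, max_iter=2000)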

ManuelSokolov commented 3 months ago

@ChuanXu1 thank you for your response. The use_SGD flag is set to False by default, so that randomness should not exist. Is there any other reason that could be driving this randomness? Disabling feature selection during training seems to have removed the randomness from the model. Also, my goal, in addition to stability, is to obtain correct results; a model that classifies wrongly with high confidence scores is not helpful in this case (the UMAP below shows the result of one iteration).

[Screenshot: UMAP of the predicted labels from a single iteration]

If I disable feature selection, the result is always the same:

[Screenshot: the (stable) predictions obtained with feature selection disabled]

However, since the results with and without feature selection seem to be completely different, I am not sure whether I can trust the model. Could you please comment on this?
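(For completeness, the stable runs were presumably trained with the same call as in my script, only with feature selection turned off:)

# Assumed training call for the stable runs: feature selection disabled.
model2 = celltypist.train(reference, labels='CellClass', n_jobs=10,
                          feature_selection=False)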

ChuanXu1 commented 3 months ago

@ManuelSokolov, the first round of training always uses SGD; use_SGD = False (the default) applies to the second round of training, after feature selection.
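A rough, self-contained scikit-learn sketch of that two-round idea, not CellTypist's actual code (the toy data, gene count, and solver settings are illustrative assumptions):

import numpy as np
from sklearn.linear_model import SGDClassifier, LogisticRegression

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(500, 200)).astype(float)  # toy cells x genes
y = rng.integers(0, 3, size=500)                     # toy cell labels

# Round 1: SGD-based logistic regression; the per-epoch data shuffling
# here is one source of run-to-run randomness.
sgd = SGDClassifier(loss="log_loss", max_iter=1000).fit(X, y)

# Feature selection: keep the genes with the largest absolute coefficients.
top = np.argsort(np.abs(sgd.coef_).max(axis=0))[::-1][:50]

# Round 2: plain logistic regression (what use_SGD=False governs),
# refit on the selected genes only.
clf = LogisticRegression(max_iter=1000).fit(X[:, top], y)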

ManuelSokolov commented 3 months ago

Sorry @ChuanXu1, you seem to have responded before I edited my message. Disabling feature selection seems to have stabilized the results, but it is hard to know which result is right or wrong; please see the message above.

ChuanXu1 commented 3 months ago

@ManuelSokolov, it is usually recommended to use feature selection to speed up the run and increase accuracy.

ManuelSokolov commented 3 months ago

In this case it seems to be reducing accuracy by producing different results across iterations. I also looked into the annotate method: it applies standard scaling before classification, and this option cannot be set to false. What is your recommendation given this example?