YosefLab / PopV

MIT License

How to create pre-trained models #19

Closed. ccruizm closed this issue 1 year ago

ccruizm commented 1 year ago

Good day,

I would like to use your tools to compare predictions on my dataset but would like to create my own pre-trained reference. Could you please share how this can be done?

Thanks in advance!

canergen commented 1 year ago

Hi, run the Colab notebook (either locally or on Colab) and change the Process_Query call to the one below. Using 4000 HVGs is a suggestion that works well in my hands, but the right number depends on the complexity of the cell types in your dataset; you can increase it if that is necessary for other processing such as PCA. If the cell types in the reference are not named based on an ontology, use cl_obo_folder=False.

adata = Process_Query(
    query_adata,
    ref_adata,
    query_labels_key=query_labels_key,
    query_batch_key=query_batch_key,
    ref_labels_key=ref_labels_key,
    ref_batch_key=ref_batch_key,
    unknown_celltype_label=unknown_celltype_label,
    save_path_trained_models=output_model_fn,
    cl_obo_folder="./PopV/ontology/",
    prediction_mode="retrain",
    n_samples_per_label=n_samples_per_label,
    use_gpu=0,
    compute_embedding=True,
    hvg=4000,
).adata

ccruizm commented 1 year ago

Thank you very much for your speedy reply and the recommendations for running it on an independent reference dataset. I will test it on my data and see how it performs.

ccruizm commented 1 year ago

Good day!

I am trying to run the code as you suggested, setting cl_obo_folder=False, but I am getting this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[20], line 4
      1 # https://github.com/YosefLab/PopV/issues/19
      2 from popv.preprocessing import Process_Query
----> 4 adata = Process_Query(
      5     query_adata,
      6     ref_adata,
      7     query_labels_key=query_labels_key,
      8     query_batch_key=query_batch_key,
      9     ref_labels_key=ref_labels_key,
     10     ref_batch_key=ref_batch_key,
     11     unknown_celltype_label=unknown_celltype_label,
     12     save_path_trained_models=output_model_fn,
     13     # cl_obo_folder="./PopV/ontology/",
     14     cl_obo_folder=False,
     15     prediction_mode="retrain",
     16     n_samples_per_label=n_samples_per_label,
     17     use_gpu=0,
     18     compute_embedding=True,
     19     hvg=5000,
     20 ).adata

File ~/miniconda3/envs/popv/lib/python3.8/site-packages/popv/preprocessing.py:197, in Process_Query.__init__(self, query_adata, ref_adata, ref_labels_key, ref_batch_key, query_labels_key, query_batch_key, query_layers_key, prediction_mode, cl_obo_folder, unknown_celltype_label, n_samples_per_label, pretrained_scvi_path, save_path_trained_models, hvg, use_gpu, compute_embedding, return_probabilities)
    195     self.nlp_emb_file = cl_obo_folder + "cl.ontology.nlp.emb"
    196 try:
--> 197     with open(self.cl_obo_file) if self.cl_obo_file else True:
    198         pass
    199 except FileNotFoundError:

AttributeError: __enter__

I created a dedicated conda env to install PopV, and I am using the latest version (0.2.2). What do you think the problem is?

Thanks in advance!
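
A note on the traceback above: line 197 uses the expression `open(self.cl_obo_file) if self.cl_obo_file else True` as the target of a `with` statement. When the file path is falsy, that expression evaluates to `True`, and a bool is not a context manager, which is consistent with `AttributeError: __enter__` on Python 3.8. The sketch below reproduces that pattern in isolation. The function names are hypothetical, and the `nullcontext()` variant is only one possible way to make the falsy case valid; it is not taken from PopV and may differ from how the library actually resolved the problem.

```python
from contextlib import nullcontext


def check_file_buggy(path):
    # Same shape as the failing line in the traceback: when `path` is falsy,
    # the conditional expression yields True, which has no __enter__/__exit__.
    with open(path) if path else True:
        pass


def check_file_safe(path):
    # nullcontext() is a do-nothing context manager from the standard library,
    # so the `with` statement is valid even when no file should be opened.
    with open(path) if path else nullcontext():
        pass


def demo():
    try:
        check_file_buggy(False)
    except (AttributeError, TypeError) as err:
        # Python 3.8 raises AttributeError (as in the report);
        # newer interpreters raise TypeError for the same mistake.
        outcome = type(err).__name__
    else:
        outcome = "no error"
    check_file_safe(False)  # completes without raising
    return outcome


if __name__ == "__main__":
    print(demo())
```

Running the script prints the name of the exception raised by the buggy variant, while the `nullcontext()` variant passes silently for a falsy path.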

canergen commented 1 year ago

Thanks for the report. It's fixed in v0.3.1 (uploaded to PyPI). Additionally, PopV now includes harmony-pytorch integration and a faster version of KNN.
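
Before retrying with cl_obo_folder=False, it may help to confirm that the environment really has v0.3.1 or later rather than the 0.2.2 from the report. Below is a minimal sketch using only the standard library; it assumes the PyPI distribution is named `popv` (inferred from the import path in the traceback), and the helper names are hypothetical. The version parsing is deliberately simple and ignores pre-release suffixes.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '0.3.1' into (0, 3, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first part with no leading number
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, required):
    """Compare two dotted version strings numerically, not lexically."""
    return parse_version(installed) >= parse_version(required)


def installed_popv_has_fix(required="0.3.1", dist_name="popv"):
    """Return True/False, or None if the distribution is not installed.

    `dist_name` is an assumption about the package's name on PyPI.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return is_at_least(installed, required)


if __name__ == "__main__":
    print(installed_popv_has_fix())
```

If the check returns False, upgrading the package inside the same conda environment and restarting the notebook kernel should pick up the fixed release.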