Closed maxim-h closed 1 year ago
I thought at first that the error might be coming from the OnClass algorithm, so I tried running without it:
```python
annotate_data(
    adata,
    methods=["knn_on_scvi", "scanvi", "knn_on_bbknn", "svm", "rf", "knn_on_scanorama", "celltypist"],
    save_path=f"{output_folder}/popv_output",
)
```
It did in fact run further along, but in the end it still failed with the same error:
OK, the issue is quite mysterious, but it probably came from a non-recommended installation method. Reinstalling everything strictly as recommended solved it.
Was it an error on Colab or locally? It was using the wrong codec, I guess. You can verify it by directly reading the cell ontology file with `obonet.read_obo(obofile)`. It looks to me like a problem with obonet.
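The byte `0xe2` from the traceback is the start of a UTF-8 multi-byte sequence, which supports the codec guess. A minimal sketch of the failure mode, independent of obonet (the file name and contents here are made up):

```python
import tempfile

# cl.obo contains non-ASCII characters; 0xe2 starts a UTF-8 multi-byte
# sequence (here: a Unicode hyphen, as might appear in a term name).
line = "name: T\u2010cell\n"

with tempfile.NamedTemporaryFile("wb", suffix=".obo", delete=False) as tmp:
    tmp.write(line.encode("utf-8"))
    path = tmp.name

try:
    # Forcing ASCII reproduces the traceback's UnicodeDecodeError;
    # open() without an encoding argument does the same on an ASCII locale.
    with open(path, encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc.reason)

# An explicit UTF-8 encoding reads the same file fine:
with open(path, encoding="utf-8") as f:
    print(f.read().strip())
```

If this snippet fails the same way without `encoding="ascii"`, the environment's default locale encoding is the culprit rather than the file itself.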
The casting error is interesting. We can manually cast everything to float64 (I think this behavior has changed recently in scanpy). Was it also the case when installing as recommended, or did you use a newer scanpy version?
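As a sketch of the manual cast, with plain numpy arrays standing in for `query_adata.X` and the reference's `X` (the names are illustrative):

```python
import numpy as np

# Illustrative stand-ins for the query and reference matrices:
query_X = np.random.rand(5, 3)                   # float64 by default
ref_X = np.random.rand(4, 3).astype(np.float32)

# Align dtypes before concatenation; AnnData.concat complains when the
# query and reference dtypes disagree.
query_X = query_X.astype(ref_X.dtype)

print(query_X.dtype)                             # float32
combined = np.concatenate([query_X, ref_X])      # now concatenates cleanly
```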
If you set retrain to True, it is recommended to set `hvg` in `Process_Query` to 4000 so it doesn't run on all genes (this lowers memory usage).
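Collected as keyword arguments, the advice above would look roughly like this (a sketch only; the remaining required `Process_Query` arguments, such as the query and reference objects and label keys, are omitted):

```python
# Hypothetical keyword arguments for popv's Process_Query, following the
# recommendation above; parameter names are taken from this thread.
process_query_kwargs = dict(
    prediction_mode="retrain",  # retrain because query/reference feature sets differ
    hvg=4000,                   # restrict to 4000 highly variable genes to save memory
)
```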
I ran everything locally.
Yes, the casting is a problem even in the properly installed version. Well, almost properly: as you can see, I use `micromamba` instead of `conda`.
When not adjusting the object beforehand, this is the result of `Process_Query`:
For reference, here is my environment:
Hi,

I've been trying to run the tutorial with my own data. `Process_Query` was run as follows:

The 2 main modifications I had to make were:

1. Casting `query_adata.X` (`new_query` here) from `dtype=numpy.float64` to `dtype=numpy.float32`. Otherwise I got an error here from somewhere inside `AnnData.concat`, because the `dtype` in the query didn't match the `dtype` in the reference. Might file a separate issue about it later, but not sure yet to whom.
2. Setting `prediction_mode` to `"retrain"`, because I had a different set of features between the query and the reference.

Then, once I got to this cell, I got an error I don't understand.
First I got some normal output:
Output

```
Found 20437 genes among all datasets
[[0.         0.05625606 0.00932836 0.0749383  0.76862464 0.34284655 0.00278164 0.0206044 ]
 [0.         0.         0.90882638 0.01745878 0.01790831 0.03103783 0.83449235 0.03103783]
 [0.         0.         0.         0.11007463 0.03358209 0.04664179 0.71349096 0.19776119]
 [0.         0.         0.         0.         0.2987106  0.21571534 0.00139082 0.06506619]
 [0.         0.         0.         0.         0.         0.28581662 0.00556328 0.09670487]
 [0.         0.         0.         0.         0.         0.         0.05563282 0.23128243]
 [0.         0.         0.         0.         0.         0.         0.         0.10292072]
 [0.         0.         0.         0.         0.         0.         0.         0.        ]]
Processing datasets (1, 2)
Processing datasets (1, 6)
Processing datasets (0, 4)
Processing datasets (2, 6)
Processing datasets (0, 5)
Processing datasets (3, 4)
Processing datasets (4, 5)
Processing datasets (5, 7)
Processing datasets (3, 5)
Processing datasets (2, 7)
Processing datasets (2, 3)
Processing datasets (6, 7)
Epoch 87/87: 100%|██████████| 87/87 [09:45<00:00, 6.73s/it, loss=7.61e+03, v_num=1]
```

But then I got a `UnicodeDecodeError`.

Traceback:
```python
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[55], line 3
      1 from popv.annotation import annotate_data
----> 3 annotate_data(adata, save_path=f"{output_folder}/popv_output")

File [prefix]/PopV/.venv/lib/python3.8/site-packages/popv/annotation.py:59, in annotate_data(adata, methods, save_path, methods_kwargs)
     57 current_method = getattr(algorithms, method)(**methods_kwargs.pop(method, {}))
     58 current_method.compute_integration(adata)
---> 59 current_method.predict(adata)
     60 current_method.compute_embedding(adata)
     61 all_prediction_keys += [current_method.result_key]

File [prefix]/PopV/.venv/lib/python3.8/site-packages/popv/algorithms/_onclass.py:128, in ONCLASS.predict(self, adata)
    125 cl_ontology_file = adata.uns["_cl_ontology_file"]
    126 nlp_emb_file = adata.uns["_nlp_emb_file"]
--> 128 celltype_dict, clid_2_name = self.make_celltype_to_cell_ontology_id_dict(
    129     cl_obo_file
    130 )
    131 self.make_cell_ontology_id(adata, celltype_dict, self.cell_ontology_obs_key)
    133 train_model = OnClassModel(
    134     cell_type_nlp_emb_file=nlp_emb_file, cell_type_network_file=cl_ontology_file
    135 )

File [prefix]/PopV/.venv/lib/python3.8/site-packages/popv/algorithms/_onclass.py:66, in ONCLASS.make_celltype_to_cell_ontology_id_dict(self, cl_obo_file)
     51 """
     52 Make celltype to ontology id dict and vice versa.
    (...)
     63 dictionary of ontology id to celltype names
     64 """
     65 with open(cl_obo_file) as f:
---> 66     co = obonet.read_obo(f)
     67 id2name = {id_: data.get("name") for id_, data in co.nodes(data=True)}
     68 id2name = {k: v for k, v in id2name.items() if v is not None}

File [prefix]/PopV/.venv/lib/python3.8/site-packages/obonet/read.py:30, in read_obo(path_or_file, ignore_obsolete)
     13 """
     14 Return a networkx.MultiDiGraph of the ontology serialized by the
     15 specified path or file.
    (...)
     27 not be added to the graph.
     28 """
     29 obo_file = open_read_file(path_or_file)
---> 30 typedefs, terms, instances, header = get_sections(obo_file)
     31 obo_file.close()
     33 if "ontology" in header:

File [prefix]/PopV/.venv/lib/python3.8/site-packages/obonet/read.py:77, in get_sections(lines)
     75     continue
     76 stanza_type_line = next(stanza_lines)
---> 77 stanza_lines = list(stanza_lines)
     78 if stanza_type_line.startswith("[Typedef]"):
     79     typedef = parse_stanza(stanza_lines, typedef_tag_singularity)

File [~]/.micromamba/envs/python3.8/lib/python3.8/encodings/ascii.py:26, in IncrementalDecoder.decode(self, input, final)
     25 def decode(self, input, final=False):
---> 26     return codecs.ascii_decode(input, self.errors)[0]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 7735: ordinal not in range(128)
```

Any pointers on how to troubleshoot it?
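Since the final frame of the traceback lands in `encodings/ascii.py`, one general way to test whether the locale-dependent default encoding is at fault is to force CPython's UTF-8 mode (PEP 540). This is a generic CPython check, not PopV-specific advice:

```python
import subprocess
import sys

# Print the encoding open() will use when no encoding= argument is given.
probe = "import locale; print(locale.getpreferredencoding(False))"

# Default behaviour follows the locale (may report ANSI_X3.4-1968 / ASCII
# on a misconfigured environment, which would explain the error above):
subprocess.run([sys.executable, "-c", probe])

# With UTF-8 mode forced, open() defaults to UTF-8 regardless of locale;
# equivalently, set the environment variable PYTHONUTF8=1:
subprocess.run([sys.executable, "-X", "utf8", "-c", probe])
```

If the first call reports an ASCII encoding and the second fixes the PopV run, the environment's locale settings, rather than the obo file, are the underlying problem.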