leahyekim opened this issue 5 months ago
Hi Leah
Thank you for the kind words!
I'm not sure what is going on here. Could you please show `cistopic_obj.cell_data`?
Best,
Seppe
Hi Seppe,
Sorry for the delayed response!
My modified cistopic_obj.cell_data looks like this (the screenshot shows only part of it):
In the "Seurat_cell_type" column, I had to replace "NaN" with "Unknown". In the "sample_id" column, I also had to replace "NaN" with "P1-coc" to match the other cells. Note that some of the cells labelled "Unknown" in the "Seurat_cell_type" column also have "Unknown" in the "pycisTopic_leiden10_0.x" column, while others do not.
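The replacement described above can be done in one pandas call. This is a minimal sketch, assuming `cistopic_obj.cell_data` is a regular `pandas.DataFrame`; the toy barcodes and labels below are hypothetical stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cistopic_obj.cell_data (barcodes and labels are made up)
cell_data = pd.DataFrame(
    {
        "Seurat_cell_type": ["Neuron", np.nan, "Astrocyte", np.nan],
        "sample_id": ["P1-coc", np.nan, "P1-coc", np.nan],
    },
    index=["AAAC-1", "AAAG-1", "AAAT-1", "AACA-1"],
)

# Fill missing values per column: unlabelled cells become "Unknown",
# and the missing sample ID is set to the only sample in this dataset
cell_data = cell_data.fillna({"Seurat_cell_type": "Unknown", "sample_id": "P1-coc"})

print(cell_data["Seurat_cell_type"].tolist())
# ['Neuron', 'Unknown', 'Astrocyte', 'Unknown']
```

On the real object the same `fillna` call would be applied to `cistopic_obj.cell_data`; like any in-memory change, it only persists if the object is saved again afterwards.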
Only after making these adjustments, I was finally able to run this:
```python
annot_dict = {}
for resolution in [0.6, 1.2, 3]:
    annot_dict[f"pycisTopic_leiden_10_{resolution}"] = {}
    for cluster in set(cistopic_obj.cell_data[f"pycisTopic_leiden_10_{resolution}"]):
        # Count the Seurat cell types among the cells assigned to this Leiden cluster
        counts = cistopic_obj.cell_data.loc[
            cistopic_obj.cell_data.loc[cistopic_obj.cell_data[f"pycisTopic_leiden_10_{resolution}"] == cluster].index,
            "Seurat_cell_type"].value_counts()
        # Label the cluster with its most frequent Seurat cell type, e.g. "Neuron(3)"
        annot_dict[f"pycisTopic_leiden_10_{resolution}"][cluster] = f"{counts.index[counts.argmax()]}({cluster})"
```
In the end, this produced an "Unknown.bed" file in the "outs/region_sets/DARs_cell_type" directory.
However, I did not save this modified cistopic_obj, so the Snakemake pipeline will receive the original cistopic_obj, which still has "NaN" values instead of "Unknown".
Would this be a problem when running the Snakemake pipeline? Or is there another way to run the code without replacing the "NaN" values?
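Since the actual traceback is not included in the thread, the following is an assumption about the cause: `value_counts()` drops NaN by default, so a Leiden cluster whose cells all have NaN in "Seurat_cell_type" yields an empty Series, and `argmax()` on an empty Series raises a `ValueError`. The sketch below reproduces this on toy data and shows one NaN-tolerant variant of the loop using `value_counts(dropna=False)`. The column names mirror the thread; the data and the helper `annotate_clusters` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy cell metadata: cluster "2" contains only cells without a Seurat label
cell_data = pd.DataFrame(
    {
        "pycisTopic_leiden_10_0.6": ["0", "0", "1", "2", "2"],
        "Seurat_cell_type": ["Neuron", "Neuron", "Astrocyte", np.nan, np.nan],
    }
)

# Default behaviour: NaN is dropped, so the all-NaN cluster gives an empty Series
empty_counts = cell_data.loc[
    cell_data["pycisTopic_leiden_10_0.6"] == "2", "Seurat_cell_type"
].value_counts()
try:
    empty_counts.argmax()
except ValueError as err:
    print("argmax failed:", err)


def annotate_clusters(cell_data, cluster_col, label_col="Seurat_cell_type"):
    """Hypothetical helper: map each cluster to its most frequent label, NaN included."""
    annot = {}
    for cluster in set(cell_data[cluster_col]):
        # dropna=False keeps NaN as a countable category, so counts is never empty
        counts = cell_data.loc[cell_data[cluster_col] == cluster, label_col].value_counts(
            dropna=False
        )
        top = counts.idxmax()
        # Clusters dominated by unlabelled cells are reported as "Unknown"
        label = "Unknown" if pd.isna(top) else top
        annot[cluster] = f"{label}({cluster})"
    return annot


print(annotate_clusters(cell_data, "pycisTopic_leiden_10_0.6"))
```

Counting NaN explicitly avoids modifying `cell_data` itself, at the cost of clusters that are mostly unlabelled being named "Unknown" rather than after their most common labelled type.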
Hi!
I think this should be OK with the Snakemake pipeline.
Best,
Seppe
Hi Seppe,
Sorry for the delayed response, and thank you so much for your reply. I was able to run the Snakemake pipeline all the way through :)
I'm wondering whether I could avoid those "NaN" values altogether. My current workflow is: analyze the 10x multiome data in Seurat --> annotate each cluster in Seurat (because I'm more familiar with Seurat than Scanpy, and your example "cell_data.tsv" file had "Seurat_cell_type" as a column name!) --> export the cell barcodes and their corresponding cluster IDs as "cell_data.tsv" --> use this "cell_data.tsv" to annotate my cisTopic object.
Would annotating the clusters with Scanpy instead of Seurat, and then exporting "cell_data.tsv" from the Scanpy object, help me avoid the "NaN" values in the cisTopic object?
Or am I overcomplicating this, and the "NaN" values are not a problem?
Best, Leah
Following. I can't filter out these "NaN" values either.
Hi @leahyekim
I'm not entirely sure where you are getting NaNs. I would suggest manually inspecting each step to see where they are generated. My guess is that some cell barcodes in your cisTopic object do not match those in your Seurat object.
All the best,
Seppe
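One way to follow the suggestion about mismatched barcodes is to compare the cell names on both sides before adding the Seurat annotation. This is a minimal pandas sketch, assuming the NaNs arise from a left join on cell names (cells in the cisTopic object with no matching row in "cell_data.tsv" get NaN in every added column). The barcodes, the `___<sample_id>` suffix on the cisTopic side, and the variable names are hypothetical; check the actual naming in your own `cistopic_obj.cell_data.index`.

```python
import pandas as pd

# Hypothetical cell names: the cisTopic side is assumed to carry a "___<sample_id>"
# suffix, while the Seurat export uses bare barcodes
cistopic_cells = pd.Index(["AAAC-1___P1-coc", "AAAG-1___P1-coc", "AAAT-1___P1-coc"])
seurat_data = pd.DataFrame(
    {"Seurat_cell_type": ["Neuron", "Astrocyte"], "sample_id": ["P1-coc", "P1-coc"]},
    index=["AAAC-1", "AAAT-1"],
)

# A direct left join matches nothing, so every cell gets NaN
direct = pd.DataFrame(index=cistopic_cells).join(seurat_data)
print("NaN labels, direct join:", direct["Seurat_cell_type"].isna().sum())

# Append the same suffix to the Seurat barcodes before joining
seurat_data.index = [
    f"{barcode}___{sample}" for barcode, sample in zip(seurat_data.index, seurat_data["sample_id"])
]
joined = pd.DataFrame(index=cistopic_cells).join(seurat_data)

# Cells still NaN exist in the cisTopic object but not in the Seurat export,
# e.g. cells that were removed during Seurat QC
missing = joined.index[joined["Seurat_cell_type"].isna()]
print("Cells without a Seurat label:", list(missing))
```

If the remaining unmatched cells are ones Seurat filtered out, NaNs would appear regardless of whether Seurat or Scanpy produced the annotation, so one option is to drop or label those cells explicitly rather than switch tools.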
Hello! Thank you so much for developing & maintaining a great tool :)
Could you please help me figure out why I had to replace the NaN values in "Seurat_cell_type" with "Unknown"?
Following the pycisTopic documentation, I tried annotating each cluster of cistopic_obj based on scRNA-seq annotations.
I ran this:
Then I got this error:
I thought this was due to having NaN values in my cistopic_obj. However, from your SCENIC+ seminar (timestamp included), it was mentioned that having NaN values should not be a problem.
When I checked "annot_dict", I noticed that the clusters from pycisTopic were not labelled with my scRNA-seq annotations as shown below:
I was expecting to see the whole list, but this is all I got.
Then I manually filled the NaN values in the "sample_id" column to match the other cells (I only have one sample), but that gave me the same error as above.
_Alternatively, I also tried replacing the NaN values in "Seurat_cell_type" with "Unknown" using this code:_
This worked, and some pycisTopic clusters were labelled "Unknown". However, I noticed that the "Seurat_cell_type" column was empty for some topics in "topic_annot".
Is this normal, or is it due to setting some clusters to "Unknown"?
Also, when I printed "Number of DARs found", I didn't see any "Unknown" clusters, as shown below:
The same goes for "Number of DAGs found".
Despite several hiccups, I was able to run pycisTopic all the way through. _Would this be okay for the downstream steps? And why did I have to replace the NaN values in the "Seurat_cell_type" column with "Unknown"?_
Thank you so much!
Best, Leah
Version:
- Python: 3.11.4
- pycisTopic: 2.0a0
- SCENIC+: 1.0a1