Evenlyeven opened this issue 1 year ago.
Can you give me a sense of how large the cell type labels are? It would be great if you could show me the number of cells assigned to each label.
Here are tables showing the number of cells assigned to each label.
Species zf:
Species pf:
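Per-label counts like these can be computed directly from the annotations. With the AnnData objects loaded in the script later in this thread, pandas' `zf_data.obs['cell_type'].value_counts()` gives them directly. The stdlib sketch below does the same on a plain list of labels; `cells_per_label` is a hypothetical helper name, not part of SAMap.

```python
from collections import Counter

def cells_per_label(labels):
    """Return (label, count) pairs, largest label first; ties broken by label name."""
    counts = Counter(labels)
    return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))

# Example with a toy list of per-cell labels
print(cells_per_label(['b', 'a', 'b', 'c', 'b', 'a']))
```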
Another question: is it best if the number of input cells is comparable across species? I am working with 200 cells from one species and 8,000 cells from the other, and was thinking about downsampling the 8,000-cell dataset.
Thank you!!
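If the 8,000-cell dataset is downsampled, sampling within each cell type label helps keep rare labels from disappearing. A minimal sketch in plain Python, assuming you have a list of per-cell labels (for example `list(adata.obs['cell_type'])`); `stratified_downsample` is a hypothetical helper, not a SAMap function. If uniform (non-stratified) sampling is enough, scanpy's `sc.pp.subsample(adata, n_obs=...)` does that.

```python
import random
from collections import defaultdict

def stratified_downsample(labels, n_target, seed=0):
    """Return sorted indices of roughly n_target cells.

    Each label keeps a share proportional to its size (at least one cell),
    so rare cell types are not dropped entirely. Because of rounding, the
    total can differ slightly from n_target.
    """
    n = len(labels)
    if n_target >= n:
        return list(range(n))
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    keep = []
    for lab in sorted(by_label, key=str):  # deterministic label order
        idx = by_label[lab]
        k = max(1, round(len(idx) * n_target / n))
        keep.extend(rng.sample(idx, min(k, len(idx))))
    return sorted(keep)
```

The returned integer indices can subset an AnnData object, e.g. `adata[idx].copy()`. Running SAMap on a few subsamples with different seeds is one way to check whether the results depend on which cells were kept.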
I think SAMap can be robust to dataset size disparities, but I would encourage you to try downsampling and check if the results change. I would also encourage changing the (poorly documented) `NHS` parameter in `SAMAP.run` like so:
`NHS = {'small_dataset_id': 2, 'big_dataset_id': 3}`
`NHS` controls neighborhood size. 3 means that a cell's neighborhood includes cells up to 3 edges away. 2 decreases the neighborhood size, which is probably good for smaller datasets.
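To make the hop-based definition concrete, here is a breadth-first search that collects the cells reachable from one cell by following at most a given number of outgoing edges in a directed kNN graph. It illustrates the idea described above and is not SAMap's internal code; the adjacency dict `adj` (cell index to list of neighbor indices) and the name `k_hop_neighborhood` are assumptions for the example.

```python
from collections import deque

def k_hop_neighborhood(adj, start, hops):
    """Cells reachable from `start` via at most `hops` outgoing edges (excluding `start`)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:  # do not expand beyond the hop limit
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    seen.discard(start)
    return seen

# Toy chain graph 0 -> 1 -> 2 -> 3 -> 4
adj = {0: [1], 1: [2], 2: [3], 3: [4]}
print(k_hop_neighborhood(adj, 0, 2))
print(k_hop_neighborhood(adj, 0, 3))
```

If every cell has k outgoing edges, an h-hop neighborhood can contain up to k + k^2 + ... + k^h cells, so it grows quickly with each extra hop. In a 200-cell dataset, 3 hops can easily reach a large fraction of all cells, which is consistent with using 2 hops for the smaller dataset.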
Instead of using `keys` in `SAMAP(...)`, can you try using `neigh_from_keys` in `SAMAP.run(...)`? You can pass it the same exact value as you're passing to `keys`.
If you use `neigh_from_keys`, then `NHS` is not needed.
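Conceptually, defining neighborhoods from labels means a cell's neighborhood is the set of cells sharing its label, instead of the cells within a fixed number of graph hops. The sketch below illustrates that idea in plain Python under that assumption; it is not SAMap's implementation of `neigh_from_keys`, and `neighborhoods_from_labels` is a hypothetical name.

```python
from collections import defaultdict

def neighborhoods_from_labels(labels):
    """Map each cell index to the set of other cells that share its label."""
    members = defaultdict(set)
    for i, lab in enumerate(labels):
        members[lab].add(i)
    # Each cell's neighborhood is its whole label group, minus itself
    return {i: members[lab] - {i} for i, lab in enumerate(labels)}

print(neighborhoods_from_labels(['a', 'b', 'a', 'a']))
```

Unlike a hop-based neighborhood, this one does not depend on dataset size or graph density: it is exactly as large as the cell's label group.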
Thanks a lot for your suggestions; I will try them.
Thanks for the useful tool!
I noticed that in my results some regions of the UMAP look like solid lines (for example, the cluster at the top of the screenshot below). I wonder if this is because neighborhood sizes were determined from the cell type labels I provided. Does this look normal to you?
When I check the two UMAPs before SAMap stitches them together, both look "normal" to me. sam1:
sam2:
Also, in a test run where I didn't use cell type labels to determine neighborhood size (so neighborhoods came from hopping along each cell's outgoing edges instead), the UMAP looks more "normal" to me.
Any comments or suggestions will be highly appreciated!
The script I used is attached below (paths were replaced by ...):
from samap.mapping import SAMAP
from samap.analysis import (get_mapping_scores, GenePairFinder, sankey_plot, chord_plot,
                            CellTypeTriangles, ParalogSubstitutions, FunctionalEnrichment,
                            convert_eggnog_to_homologs, GeneTriangles)
from samalg import SAM
import pandas as pd
import anndata
from joblib import dump, load

zf_data = anndata.read_h5ad('....')
pf_data = anndata.read_h5ad('....')

sam1 = SAM(counts=zf_data)
sam1.preprocess_data(filter_genes=False)
sam1.run(batch_key='orig.ident', npcs=30)

sam2 = SAM(counts=pf_data)
sam2.preprocess_data(filter_genes=False)
sam2.run(npcs=20)

sams = {'zf': sam1, 'pf': sam2}
sm = SAMAP(sams, keys={'zf': 'cell_type', 'pf': 'cell_type'}, f_maps='...', save_processed=True)
Thanks very much in advance!
Di