Open adamgayoso opened 2 years ago
Hi @adamgayoso, thanks a lot for these suggestions! I will regenerate the datasets for the website and notify here as soon as they are up. The jax speedup sounds really cool, and we'd be super happy for our data to be featured in the tutorials of course! I'd be very keen to peek at the first results, and of course I'd be very happy to help with interpretation or writing up vignettes if needed.
Thank you @emdann!! It would also be beneficial if the genes used to run the scvi model were included (in adata.var) so I can better reproduce the results. Also, what batch_key was used? I saw "bbk", I think, in the notebooks on this repo.
The batch key used is a concatenation of method (10X protocol, 3' or 5') and donor (see under "Add batch key" here).
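As a rough illustration of that concatenation, here is a minimal sketch on a toy DataFrame standing in for adata.obs. The column names method and donor come from the comment above; the values, the "_" separator and the output column name batch are assumptions made for the example, not taken from the notebooks.

```python
import pandas as pd

# Toy stand-in for adata.obs. Column names follow the thread;
# the values are invented for illustration.
obs = pd.DataFrame(
    {
        "method": ["3GEX", "5GEX", "3GEX"],
        "donor": ["F21", "F21", "F30"],
    },
    index=["cell_0", "cell_1", "cell_2"],
)

# Build one batch label per cell by joining protocol and donor as strings.
# The "_" separator and the "batch" column name are assumptions.
obs["batch"] = obs["method"].astype(str) + "_" + obs["donor"].astype(str)

print(obs["batch"].value_counts())
```

The resulting column is what would be passed as batch_key when setting up the model.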
The new version seems to give reasonably similar results to the old version, which is good. I also noticed that Scanpy UMAP plotting of the celltype is not working for some reason.
The .h5ad objects for download should now be updated. Could I ask you to try again to check if the plotting problem persists?
Yes, looks great! I still have an issue plotting:
bdata.obs["celltype"] = np.array(list(bdata.obs.celltype_annotation))
sc.pl.embedding(bdata, basis="X_mde", color=['celltype'], frameon=False)
Plotting celltype_annotation alone gives an error.
Could it be there are too many categories for scanpy?
Notes from troubleshooting attempts:
Part of the problem could be the NaNs (https://github.com/theislab/scanpy/issues/2133). I found the maternal contaminants were not flagged correctly in this object; these are cells with adata.obs['celltype_annotation'] set to NaN. I will modify that in the file ASAP, but for now you can try filtering those out before plotting.
After filtering out NaNs I still get all gray, so it might indeed be a problem with scanpy trying to handle too many categories (and possibly a pandas update?). Also, setting groups throws a pandas error.
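For reference, the NaN filtering mentioned above could look roughly like the sketch below. It uses a toy DataFrame in place of adata.obs, and the cell type labels are invented; only the celltype_annotation column name comes from the thread.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs with some unannotated (NaN) cells.
obs = pd.DataFrame(
    {"celltype_annotation": ["NK", np.nan, "PRE_B", np.nan, "NK"]},
    index=[f"cell_{i}" for i in range(5)],
)
obs["celltype_annotation"] = obs["celltype_annotation"].astype("category")

# Boolean mask of cells that carry an annotation.
keep = obs["celltype_annotation"].notna()
obs_filtered = obs.loc[keep].copy()

# Drop categories that no longer occur after filtering.
obs_filtered["celltype_annotation"] = (
    obs_filtered["celltype_annotation"].cat.remove_unused_categories()
)

# On an AnnData object the same mask would be applied to the cells axis:
# adata = adata[keep.to_numpy()].copy()
print(obs_filtered)
```

As noted above, this alone did not fix the all-gray plot, so it only rules out the NaNs as the sole cause.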
I usually plot annotations by lineage, using the assignment saved here. The best workaround I can suggest for now is trying something like:
import json
import numpy as np
import scanpy as sc

# Load the lineage -> cell type assignment
with open('Pan_fetal_immune/metadata/anno_groups.json', 'r') as json_file:
    anno_groups_dict = json.load(json_file)

# Keep labels only for cells in the chosen lineage; all other cells stay NaN
adata.obs['annotation_plot'] = np.nan
lineage = 'B CELLS'
lineage_cells = adata.obs['celltype_annotation'].isin(anno_groups_dict[lineage])
adata.obs.loc[lineage_cells, 'annotation_plot'] = adata.obs.loc[lineage_cells, 'celltype_annotation'].copy()
sc.pl.umap(adata, color=['annotation_plot'], title=lineage)
This tells me I should probably save the annotation groups in adata.obs
...
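One possible way to store the annotation groups in adata.obs is to invert the lineage dictionary and map it over the cell type column. This is only a sketch: the dictionary contents below are a made-up excerpt (only the 'B CELLS' key appears in this thread), the toy DataFrame stands in for adata.obs, and the anno_lineage column name is hypothetical.

```python
import pandas as pd

# Hypothetical excerpt of the lineage -> cell types mapping; the real
# content lives in anno_groups.json and is not reproduced here.
anno_groups_dict = {
    "B CELLS": ["PRE_B", "MATURE_B"],
    "NK/T CELLS": ["NK", "CD4+T"],
}

# Invert to cell type -> lineage so it can be used with Series.map.
celltype_to_lineage = {
    celltype: lineage
    for lineage, celltypes in anno_groups_dict.items()
    for celltype in celltypes
}

# Toy stand-in for adata.obs.
obs = pd.DataFrame(
    {"celltype_annotation": ["NK", "PRE_B", "MATURE_B", "UNLISTED"]}
)

# Cell types missing from the dictionary end up as NaN.
obs["anno_lineage"] = obs["celltype_annotation"].map(celltype_to_lineage)
print(obs)
```

With such a column saved in the object, plotting one lineage at a time would no longer require loading the JSON file separately.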
Hello, this is such a cool project!
I was wondering if the compressed anndata objects could be shared on the website. For example, for the full dataset, saving like
write_h5ad(path, compression="gzip")
reduces the file size to ~5 GB from 15 GB. While it takes a bit longer to save with compression, reading is still pretty fast. I also noticed an issue with adata.obs["donor"], where it's mixed string and float types, so also saving it with adata.obs["donor"] = adata.obs["donor"].astype(str) would be appreciated.
We are working on faster implementations of scvi-tools using jax. In this notebook we can process 150k cells in <5 minutes on Colab. I was hoping to create a new tutorial with your dataset to show that we can process 900k cells in < 1 hr (integration + visualization, all for free!).
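To make the donor issue concrete, here is a small sketch of what a mixed string/float column looks like and what the cast does. The donor values are invented, and the toy DataFrame stands in for adata.obs.

```python
import pandas as pd

# Toy stand-in for adata.obs["donor"] with mixed Python types.
obs = pd.DataFrame({"donor": ["F21", 30.0, "F45"]})

# Inspect which Python types are present before the cast.
types_before = {type(value).__name__ for value in obs["donor"]}
print(types_before)

# Cast every entry to a string so the column has a single type.
obs["donor"] = obs["donor"].astype(str)
print(obs["donor"].tolist())

# The object could then be saved compressed, as suggested above:
# adata.write_h5ad(path, compression="gzip")
```

Note that floats keep their decimal point when cast this way (30.0 becomes "30.0"), so donor IDs may need a further cleanup step if they were meant to be integers.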