Baudicm closed this issue 7 months ago.
Hi.
I think the issue is due to the barcodes being named differently in the GEX and ATAC data.
Hi, thank you for your reply. Yes, but I don't understand, because SCENIC+ found cells in common: the log says "2024-04-12 10:40:08,555 Ingesting multiome data INFO Found 23958 multiome cells". And when I run len(set(cistopic_obj.cell_names) & set(adata.obs_names)), it also gives me 23958.
I don't really understand what "[columns]" refers to in the Snakemake error.
Thank you again for your help.
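The "[columns]" most likely comes from pandas rather than from Snakemake itself: when `.loc` is asked for column labels and none of them exist, pandas raises a `KeyError` whose message reads "None of [...] are in the [columns]", naming the axis it searched. The failing line in the traceback, `model.cell_topic.loc[:, cell_names]`, is exactly this kind of lookup. A minimal reproduction, using made-up barcodes and an assumed "___<sample>" suffix (both hypothetical, chosen only to resemble the names printed later in this thread):

```python
import pandas as pd

# Toy topic-by-cell matrix; the column labels carry an extra "___<sample>" suffix,
# similar to the cell_topic columns printed in this thread. Values are made up.
cell_topic = pd.DataFrame(
    [[0.1, 0.2], [0.9, 0.8]],
    index=["Topic1", "Topic2"],
    columns=["AAACGTTT-1-WT_AF___WT_AF", "TTTGCAAA-1-WT_AF___WT_AF"],
)

# Hypothetical cell names without the suffix, standing in for cistopic_obj.cell_names.
cell_names = ["AAACGTTT-1-WT_AF", "TTTGCAAA-1-WT_AF"]

message = ""
try:
    # Same lookup pattern as the failing line in impute_accessibility.
    cell_topic.loc[:, cell_names]
except KeyError as err:
    # None of the requested labels exist as columns, so pandas reports
    # the axis it searched ("columns") in the error message.
    message = str(err)

print(message)
```

If only some labels were missing, pandas would instead raise a "... not in index" `KeyError`, so the "None of ... [columns]" wording suggests that no cell name matched any column.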
Hi @Baudicm
It seems the error is thrown while imputing accessibility.
Could you show the output of:
cistopic_obj.selected_model.cell_topic
Best,
Seppe
TACCGCAAGTGAGGGT-1-WT_AF___WT_AF CTAAATGTCCCGTTGT-1-WT_AF___WT_AF CCGTTTGGTCATCCTG-1-WT_AF___WT_AF GAGTATCTCCTGGTGA-1-WT_AF___WT_AF GTATTGATCCCGCCTA-1-WT_AF___WT_AF GTCCTCCCAGGGAGGA-1-WT_AF___WT_AF TCTCGCCCAAATACCT-1-WT_AF___WT_AF GAGCTAGCAGGCTTGT-1-WT_AF___WT_AF AATAGCTGTGCACGCA-1-WT_AF___WT_AF TGAGCTTAGTAAGTGG-1-WT_AF___WT_AF ... ACACAATGTACTTAGG-1-Mut_Midbrain___Mut_Midbrain GGATGAATCCGCATGA-1-Mut_Midbrain___Mut_Midbrain CGCAAATTCCTAGTAA-1-Mut_Midbrain___Mut_Midbrain CCTCAGTTCCACCTGT-1-Mut_Midbrain___Mut_Midbrain GTCCGTAAGGTTACAC-1-Mut_Midbrain___Mut_Midbrain ACTTAGGGTTGGTTCT-1-Mut_Midbrain___Mut_Midbrain CCATAAATCTGGCAAT-1-Mut_Midbrain___Mut_Midbrain CTATGAGGTAGCTGGT-1-Mut_Midbrain___Mut_Midbrain GGCAAGCCATTAGGCC-1-Mut_Midbrain___Mut_Midbrain TCCATCATCATGGCCA-1-Mut_Midbrain___Mut_Midbrain
Topic1 0.003502 0.022933 0.005403 0.009163 0.013929 0.011268 0.011032 0.025990 0.003867 0.010575 ... 0.010375 0.010649 0.009345 0.007629 0.001756 0.003436 0.014850 0.006405 0.017696 0.040800
Topic2 0.019162 0.001339 0.026155 0.002188 0.011959 0.010905 0.042464 0.009409 0.020540 0.014839 ... 0.003031 0.007922 0.008207 0.041007 0.004828 0.028875 0.019685 0.011724 0.010599 0.003583
Topic3 0.008268 0.004039 0.000606 0.000672 0.003095 0.003196 0.009868 0.008211 0.005353 0.002845 ... 0.003031 0.007013 0.033805 0.004291 0.012508 0.060790 0.024519 0.005645 0.002212 0.020573
Topic4 0.018027 0.007164 0.005849 0.002643 0.008020 0.047466 0.140640 0.029275 0.004693 0.020478 ... 0.013231 0.018831 0.008776 0.013637 0.006876 0.006211 0.007598 0.017803 0.004147 0.048081
Topic5 0.004863 0.083311 0.095328 0.109239 0.098302 0.058527 0.004823 0.076284 0.058836 0.099646 ... 0.005071 0.016558 0.006501 0.016307 0.001756 0.017775 0.010015 0.002605 0.002212 0.004392
Topic6 0.056838 0.021087 0.036643 0.001885 0.032642 0.041629 0.016852 0.057462 0.071051 0.063583 ... 0.025879 0.021558 0.013327 0.024986 0.016604 0.003436 0.077702 0.015523 0.017051 0.034327
Topic7 0.009403 0.014836 0.036420 0.016896 ...
Yes, correct: the names in the first row are not in the same format as cistopic_obj.cell_names. Thank you very much!
Ok, then something went wrong with your cistopic object. I'm not sure how this could have occurred, but simply renaming the columns so they are in the same format as .cell_names should fix your issue.
All the best,
Seppe
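The renaming suggested above can be sketched as follows. It assumes (this is not confirmed in the thread) that the only difference is an extra "___<sample>" suffix on the cell_topic column names and that stripping it reproduces `cistopic_obj.cell_names`; the helper names `strip_sample_suffix`, `missing_cell_names` and `has_duplicates` are hypothetical. The checks matter because a wrong renaming rule would silently drop or merge cells.

```python
def strip_sample_suffix(column_names, sep="___"):
    """Hypothetical helper: drop everything from the first `sep` onwards.

    Assumes the mismatch is an extra "___<sample>" suffix, as in the
    cell_topic columns shown in this thread. Names without `sep` are kept as-is.
    """
    return [str(name).split(sep, 1)[0] for name in column_names]


def missing_cell_names(new_columns, cell_names):
    """Return the cell names that are still absent from the renamed columns."""
    present = set(new_columns)
    return sorted(name for name in cell_names if name not in present)


def has_duplicates(names):
    """Stripping a suffix could turn two columns into one label; detect that."""
    return len(set(names)) != len(names)


# Two column names copied from the cell_topic printout in this thread.
old_columns = [
    "TACCGCAAGTGAGGGT-1-WT_AF___WT_AF",
    "ACACAATGTACTTAGG-1-Mut_Midbrain___Mut_Midbrain",
]
new_columns = strip_sample_suffix(old_columns)
print(new_columns)  # ['TACCGCAAGTGAGGGT-1-WT_AF', 'ACACAATGTACTTAGG-1-Mut_Midbrain']
```

If `missing_cell_names(new_columns, cistopic_obj.cell_names)` returns an empty list and `has_duplicates(new_columns)` is False, the new labels could be assigned with `cistopic_obj.selected_model.cell_topic.columns = new_columns` (assuming `cell_topic` is the pandas DataFrame printed above), and the object pickled again before rerunning the pipeline.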
Yes, it fixed the issue. Thank you very much! Best, Manon
No worries!
Good luck with the analysis!
Best,
Seppe
Hi, thank you for this very useful tool. I would like to rerun my data with the new optimized version using Snakemake, but I get an error related to the index of my cistopic_obj that I don't really understand:
Do you know why it doesn't find the indexes in the cistopic_obj while it does find indexes in common between the AnnData and cistopic objects?
Thank you very much for your help,
Manon
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                         count
AUCell_direct                   1
AUCell_extended                 1
all                             1
eGRN_direct                     1
eGRN_extended                   1
get_search_space                1
prepare_GEX_ACC_multiome        1
prepare_menr                    1
region_to_gene                  1
scplus_mudata                   1
tf_to_gene                      1
total                          11

Select jobs to execute...
Execute 1 jobs...
[Fri Apr 12 10:39:08 2024]
localrule prepare_GEX_ACC_multiome:
    input: /data/GSRunit/Manon/Multiome/scenicplus/cistopic_obj35model.pkl, /data/GSRunit/Manon/Multiome/scenicplus/adata_all_conditions.h5ad
    output: /data/GSRunit/Manon/Multiome/scenicplus/outs/ACC_GEX.h5mu
    jobid: 2
    reason: Missing output files: /data/GSRunit/Manon/Multiome/scenicplus/outs/ACC_GEX.h5mu
    resources: tmpdir=/tmp
2024-04-12 10:39:54,195 SCENIC+ INFO Reading cisTopic object.
2024-04-12 10:39:57,654 rpy2.situation INFO cffi mode is CFFI_MODE.ANY
2024-04-12 10:39:57,727 rpy2.situation INFO R home found: /usr/local/apps/R/4.3/4.3.2/lib64/R
2024-04-12 10:39:57,907 rpy2.situation INFO R library path: /usr/local/apps/R/4.3/4.3.2/lib64/R/lib:/usr/local/java/jdk-18.0.1.1/lib/server:/usr/local/libtiff/4.3.0-gcc-8.5.0/lib:/usr/local/intel/2022.1.2.146/mkl/2022.0.2/lib/intel64:/usr/local/pcre2/10.40/gcc-11.3.0/lib:/usr/local/apps/PMIx/pmix-3.2.3/lib:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib/openmpi:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0/plugin:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0:/usr/local/GCC/11.3.0/lib64:/usr/local/GCC/11.3.0/lib:/usr/local/netcdf/4.9.0/gcc-11.3.0/lib:/usr/local/HDF5/1.12.2/lib
2024-04-12 10:39:57,907 rpy2.situation INFO LD_LIBRARY_PATH: /usr/local/apps/R/4.3/4.3.2/lib64/R/lib:/usr/local/java/jdk-18.0.1.1/lib/server:/usr/local/libtiff/4.3.0-gcc-8.5.0/lib:/usr/local/intel/2022.1.2.146/mkl/2022.0.2/lib/intel64:/usr/local/pcre2/10.40/gcc-11.3.0/lib:/usr/local/apps/PMIx/pmix-3.2.3/lib:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib/openmpi:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0/plugin:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0:/usr/local/GCC/11.3.0/lib64:/usr/local/GCC/11.3.0/lib:/usr/local/netcdf/4.9.0/gcc-11.3.0/lib:/usr/local/HDF5/1.12.2/lib
2024-04-12 10:39:57,957 rpy2.rinterface_lib.embedded INFO Default options to initialize R: rpy2, --quiet, --no-save
2024-04-12 10:39:58,373 rpy2.rinterface_lib.embedded INFO R is already initialized. No need to initialize.
2024-04-12 10:40:05,535 SCENIC+ INFO Reading gene expression AnnData.
2024-04-12 10:40:08,555 Ingesting multiome data INFO Found 23958 multiome cells.
Traceback (most recent call last):
File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/bin/scenicplus", line 8, in <module>
sys.exit(main())
^^^^^^
File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
args.func(args)
File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 44, in command_prepare_GEX_ACC
prepare_GEX_ACC(
File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 61, in prepare_GEX_ACC
mdata = process_multiome_data(
^^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 44, in process_multiome_data
imputed_acc_obj = impute_accessibility(
^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pycisTopic/diff_features.py", line 374, in impute_accessibility
cell_topic = model.cell_topic.loc[:, cell_names]
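The "Found 23958 multiome cells" message and the `len(set(cistopic_obj.cell_names) & set(adata.obs_names))` check both appear to compare `cell_names` with the AnnData cells, whereas the failing line indexes `model.cell_topic` by its column labels, so the two can disagree. A small standard-library sketch (the function name `name_overlaps` is hypothetical) that reports all three overlaps at once, shown here on made-up barcodes:

```python
def name_overlaps(cell_names, obs_names, topic_columns):
    """Count shared labels between the cisTopic cells, the AnnData cells and
    the columns of the topic model's cell-topic matrix."""
    cells, obs, cols = set(cell_names), set(obs_names), set(topic_columns)
    return {
        "cell_names & obs_names": len(cells & obs),
        "cell_names & cell_topic columns": len(cells & cols),
        "cell_names missing from cell_topic": len(cells - cols),
    }


# Toy example with made-up barcodes: cell_names match the AnnData cells,
# but the cell-topic columns still carry a "___<sample>" suffix.
report = name_overlaps(
    cell_names=["AAAC-1-WT_AF", "TTTG-1-WT_AF"],
    obs_names=["AAAC-1-WT_AF", "TTTG-1-WT_AF", "CCCA-1-WT_AF"],
    topic_columns=["AAAC-1-WT_AF___WT_AF", "TTTG-1-WT_AF___WT_AF"],
)
for label, count in report.items():
    print(f"{label}: {count}")
```

On the real objects it would be called as `name_overlaps(cistopic_obj.cell_names, adata.obs_names, cistopic_obj.selected_model.cell_topic.columns)`; a high first count combined with a nonzero "missing" count points to the column-name mismatch diagnosed in the replies above.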