aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Error in 'process_non_multiome_data' - imputed_acc_obj.feature_names contains regions not present in ACC_region_metadata #367

Open jesswhitts opened 2 months ago

jesswhitts commented 2 months ago

Hello,

When running the Snakemake pipeline, I get the following error:

[Wed Apr 24 18:08:33 2024]
localrule prepare_GEX_ACC_non_multiome:
    input: ../../scATAC/cistopic_obj.pkl, ../../Lambo_AML12DX.h5ad
    output: ACC_GEX.h5mu
    jobid: 2
    reason: Missing output files: ACC_GEX.h5mu
    resources: tmpdir=/tmp

[Wed Apr 24 18:10:06 2024]
Finished job 8.
2 of 14 steps (14%) done
Select jobs to execute...
Traceback (most recent call last):
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 44, in command_prepare_GEX_ACC
    prepare_GEX_ACC(
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 67, in prepare_GEX_ACC
    mdata = process_non_multiome_data(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 253, in process_non_multiome_data
    ACC_region_metadata_subset = ACC_region_metadata.loc[imputed_acc_obj.feature_names]
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1074, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1302, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1240, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1433, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6108, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/data/stemcell/jwhittle/mambaforge/envs/scenic-plus/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6171, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['chr1:37086561-37087061', 'chr1:64814657-64815157', 'chr1:73736773-73737273', 'chr1:78358590-78359090', 'chr1:106741463-106741963', 'chr1:187075602-187076102', 'chr1:193554206-193554706', 'chr1:238131712-238132212', 'chr1:239901459-239901959', 'chr2:23305412-23305912', 'chr2:125741785-125742285', 'chr2:142237826-142238326', 'chr2:156768311-156768811', 'chr2:171620054-171620554', 'chr2:185579485-185579985', 'chr2:190778009-190778509', 'chr2:193787389-193787889', 'chr2:211184597-211185097', 'chr2:221629435-221629935', 'chr3:34470645-34471145', 'chr3:40484293-40484793', 'chr3:49978465-49978965', 'chr3:146347999-146348499', 'chr3:162299401-162299901', 'chr3:189308729-189309229', 'chr4:12293354-12293854', 'chr4:12337267-12337767', 'chr4:54573019-54573519', 'chr4:64828810-64829310', 'chr4:121448612-121449112', 'chr4:137856559-137857059', 'chr4:158423056-158423556', 'chr4:167092547-167093047', 'chr4:171830429-171830929', 'chr5:3637514-3638014', 'chr5:12795576-12796076', 'chr5:19834934-19835434', 'chr5:39584818-39585318', 'chr5:84892053-84892553', 'chr5:97952347-97952847', 'chr5:101944857-101945357', 'chr5:117146880-117147380', 'chr5:117706635-117707135', 'chr5:124063491-124063991', 'chr5:144399601-144400101', 'chr6:57361586-57362086', 'chr6:57597435-57597935', 'chr6:68170362-68170862', 'chr6:74246398-74246898', 'chr6:78553193-78553693', 'chr6:82971268-82971768', 'chr6:101359646-101360146', 'chr6:102233197-102233697', 'chr7:41185202-41185702', 'chr7:68545705-68546205', 'chr7:87780715-87781215', 'chr7:95583896-95584396', 'chr7:115614946-115615446', 'chr7:121575028-121575528', 'chr7:122807487-122807987', 'chr7:123464525-123465025', 'chr7:133945681-133946181', 'chr7:137263035-137263535', 'chr7:146256010-146256510', 'chr7:146589061-146589561', 'chr8:72907359-72907859', 'chr8:74843204-74843704', 'chr8:75565150-75565650', 'chr8:113029808-113030308', 'chr8:122247001-122247501', 'chr8:123049172-123049672', 'chr9:13823099-13823599', 'chr9:14395700-14396200', 
'chr9:25512413-25512913', 'chr9:44490585-44491085', 'chr9:64892446-64892946', 'chr9:103506732-103507232', 'chr9:125355044-125355544', 'chr10:25796621-25797121', 'chr10:27922457-27922957', 'chr10:37236374-37236874', 'chr10:85073396-85073896', 'chr11:16379459-16379959', 'chr11:22150404-22150904', 'chr11:91071803-91072303', 'chr11:98578831-98579331', 'chr11:103469245-103469745', 'chr11:107364005-107364505', 'chr11:127980302-127980802', 'chr12:72517188-72517688', 'chr12:130103223-130103723', 'chr13:22294952-22295452', 'chr13:61439262-61439762', 'chr13:74019643-74020143', 'chr14:20273333-20273833', 'chr14:34144055-34144555', 'chr14:44690064-44690564', 'chr15:33559309-33559809', 'chr15:37871817-37872317', 'chr15:62538813-62539313', 'chr15:87671879-87672379', 'chr15:92464673-92465173', 'chr15:94986242-94986742', 'chr16:31866156-31866656', 'chr16:73741801-73742301', 'chr16:75698618-75699118', 'chr16:78208897-78209397', 'chr17:14991013-14991513', 'chr17:60796244-60796744', 'chr18:7966242-7966742', 'chr18:54326006-54326506', 'chr18:69099519-69100019', 'chr19:44492034-44492534', 'chr20:15779821-15780321', 'chr21:18616130-18616630', 'chr21:19008465-19008965', 'chr21:40286659-40287159', 'chr22:30507942-30508442', 'chr22:34939917-34940417', 'chrX:11831182-11831682', 'chrX:14276988-14277488', 'chrX:15502834-15503334', 'chrX:21599026-21599526', 'chrX:22318868-22319368', 'chrX:22495131-22495631', 'chrX:35023865-35024365', 'chrX:43327116-43327616', 'chrX:43530699-43531199', 'chrX:62824205-62824705', 'chrX:76599278-76599778', 'chrX:83683738-83684238', 'chrX:85360792-85361292', 'chrX:89976692-89977192', 'chrX:90275083-90275583', 'chrX:92385166-92385666', 'chrX:95591919-95592419', 'chrX:95869455-95869955', 'chrX:100145183-100145683', 'chrX:100724557-100725057', 'chrX:121200386-121200886', 'chrX:139955653-139956153', 'chrX:147122587-147123087', 'chrY:12421296-12421796', 'chrY:15703506-15704006'] not in index"

After looking in a bit more detail, it seems the output file from impute_accessibility contains regions that are not present in my metadata file. Do you know how this might happen?
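For anyone hitting the same KeyError, a minimal pandas-only sketch of how the mismatch could be inspected is shown here. It uses toy stand-ins for `ACC_region_metadata` and `imputed_acc_obj.feature_names`; the helper `split_known_and_missing` and the `Width` column are hypothetical and not part of scenicplus.

```python
import pandas as pd


def split_known_and_missing(region_metadata: pd.DataFrame, feature_names):
    """Split feature names into those present in / absent from the metadata index.

    Hypothetical helper for illustration only; not a scenicplus function.
    """
    features = pd.Index(feature_names)
    known = features.intersection(region_metadata.index, sort=False)
    missing = features.difference(region_metadata.index, sort=False)
    return known, missing


# Toy stand-in for ACC_region_metadata (the column name is an assumption).
region_metadata = pd.DataFrame(
    {"Width": [500, 500]},
    index=["chr1:100-600", "chr1:1000-1500"],
)

# Toy stand-in for imputed_acc_obj.feature_names; the middle entry is
# deliberately absent from the metadata index.
feature_names = ["chr1:100-600", "chr1:37086561-37087061", "chr1:1000-1500"]

known, missing = split_known_and_missing(region_metadata, feature_names)
print(f"{len(missing)} of {len(feature_names)} feature names are not in the metadata index")
print(list(missing))

# Indexing with only the known labels does not raise a KeyError.
subset = region_metadata.loc[known]
```

Indexing only with the shared labels avoids the exception, but it hides the mismatch rather than explaining it, so listing the missing regions first is probably the more useful step.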

I have already run the pipeline successfully on a different sample, so I don't know what the issue might be here...

SeppeDeWinter commented 2 months ago

Hi @jesswhitts

Hmm, this should not be the case, and I'm not sure how it can happen. Would you be able to trace back where these regions got lost in the pycisTopic analysis?

Best,

Seppe

jesswhitts commented 2 months ago

Reading in cistopic object:
cisTopic_obj.region_data: [382839 rows x 8 columns]

Run impute accessibility function:
len(imputed_acc_obj.feature_names): 382167

Subset for common annotations step:
cisTopic_obj.region_data: [382691 rows x 8 columns]
len(imputed_acc_obj.feature_names): 382167

It looks like the 'check which annotations are common and if necessary subset' step drops some regions from cisTopic_obj.region_data without removing them from the imputed accessibility object. Could this be the problem?
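The suspected mechanism can be reproduced with a toy pandas example. This is only a sketch of the hypothesis, not of the actual scenicplus or pycisTopic code: it assumes that subsetting cells also drops regions with no remaining counts, which has not been confirmed in this thread, and all names and values are made up.

```python
import pandas as pd

# Toy region-by-cell count matrix (hypothetical values).
counts = pd.DataFrame(
    [[1, 0, 0], [0, 2, 0], [0, 0, 3]],
    index=["chr1:100-600", "chr2:100-600", "chr3:100-600"],
    columns=["cell_a", "cell_b", "cell_c"],
)

# Feature names captured before any subsetting, standing in for the
# names held by the imputed accessibility object.
feature_names_before = pd.Index(counts.index)

# Subset to the cells that are kept. Assumption: regions left with no
# counts in the remaining cells are dropped from the region table.
kept_cells = ["cell_a", "cell_b"]
subset = counts[kept_cells]
subset = subset.loc[subset.sum(axis=1) > 0]

# Names that the earlier object still carries but the table has lost.
stale = feature_names_before.difference(subset.index)
print(list(stale))

# Label-based indexing with the stale names fails like the reported error.
try:
    subset.loc[feature_names_before]
except KeyError as err:
    print("KeyError:", err)
```

If this is what happens, the order of the subsetting and imputation steps would determine whether the two region sets stay in sync.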

Best, Jess

SeppeDeWinter commented 1 month ago

Hi @jesswhitts

What are the dimensions of cisTopic_obj.selected_model.region_topic?

S

jesswhitts commented 1 month ago

Hi @SeppeDeWinter

[382839 rows x 100 columns] when first reading the file. After doing the subset, the 'selected_model' field becomes an empty list.

jesswhitts commented 1 month ago

Still unsure about the cause of this error, but I've found a workaround in case anyone else comes across this. I select for common cell types in my GEX and ATAC datasets at the very beginning, and the pipeline now works fine. Thanks again for the interesting tool!
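The thread does not show how the workaround was implemented, so the following is only one possible sketch on plain pandas tables. The function `keep_common_cell_types`, the `celltype` column, and the barcodes are all hypothetical; with real data, the retained barcodes would then be used to subset the GEX and ATAC objects before running the pipeline (those steps are not shown).

```python
import pandas as pd


def keep_common_cell_types(gex_obs, atac_cell_data, key="celltype"):
    """Keep only cells whose annotation occurs in both tables.

    Hypothetical helper; `key` is an assumed column name.
    """
    common = sorted(set(gex_obs[key]) & set(atac_cell_data[key]))
    gex_keep = gex_obs[gex_obs[key].isin(common)]
    atac_keep = atac_cell_data[atac_cell_data[key].isin(common)]
    return common, gex_keep, atac_keep


# Toy per-cell annotation tables indexed by made-up barcodes.
gex_obs = pd.DataFrame(
    {"celltype": ["HSC", "Monocyte", "T_cell"]},
    index=["gex_1", "gex_2", "gex_3"],
)
atac_cell_data = pd.DataFrame(
    {"celltype": ["HSC", "Monocyte", "B_cell"]},
    index=["atac_1", "atac_2", "atac_3"],
)

common, gex_keep, atac_keep = keep_common_cell_types(gex_obs, atac_cell_data)
print(common)
print(list(gex_keep.index), list(atac_keep.index))
```

Filtering up front means the later 'common annotations' subset has nothing left to remove, which may be why the region sets stay consistent with this workaround.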