aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

create_SCENICPLUS_object() metadata_cell keyError #260

Open alexlenail opened 7 months ago

alexlenail commented 7 months ago

I'm getting an error in a late step of run_scenicplus(), in this section:

https://github.com/aertslab/scenicplus/blob/e4bdd9f5b7fea1a43d5cb55e2ccb3c221fe7d279/src/scenicplus/wrappers/run_scenicplus.py#L253

In the generate_pseudobulks() call, I get the error: KeyError: 'subclass_label_2' from this line:

  File "/home/ec2-user/scenicplus/src/scenicplus/cistromes.py", line 215, in generate_pseudobulks
    categories = list(set(cell_data.loc[:, variable]))

cell_data is set just before that, here:

https://github.com/aertslab/scenicplus/blob/e4bdd9f5b7fea1a43d5cb55e2ccb3c221fe7d279/src/scenicplus/cistromes.py#L210-L210

That, I believe, is populated earlier, in the create_SCENICPLUS_object() call:

https://github.com/aertslab/scenicplus/blob/e4bdd9f5b7fea1a43d5cb55e2ccb3c221fe7d279/src/scenicplus/scenicplus_class.py#L504

https://github.com/aertslab/scenicplus/blob/e4bdd9f5b7fea1a43d5cb55e2ccb3c221fe7d279/src/scenicplus/scenicplus_class.py#L577-L578

https://github.com/aertslab/scenicplus/blob/e4bdd9f5b7fea1a43d5cb55e2ccb3c221fe7d279/src/scenicplus/scenicplus_class.py#L581-L586

My conclusion is that although the adata object I pass into create_SCENICPLUS_object() has a column called 'subclass_label_2' in its obs dataframe, the scenicplus_obj.metadata_cell dataframe has no such column.
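A quick way to confirm this kind of mismatch is to compare the column names of the two tables directly. The sketch below uses plain pandas on stand-in DataFrames: `obs` and `metadata_cell` are hypothetical placeholders for `adata.obs` and the SCENIC+ object's cell metadata, with made-up values, not output from the package.

```python
import pandas as pd

# Stand-ins (assumptions): `obs` mimics adata.obs, `metadata_cell` mimics
# the cell metadata table stored on the SCENIC+ object.
obs = pd.DataFrame(
    {"subclass_label_2": ["typeA", "typeB"]}, index=["cell1", "cell2"]
)
metadata_cell = pd.DataFrame(
    {"GEX_subclass_label_2": ["typeA", "typeB"]}, index=["cell1", "cell2"]
)

variable = "subclass_label_2"

# Columns present in obs but absent (under the same name) in the metadata.
missing = [c for c in obs.columns if c not in metadata_cell.columns]

# Metadata columns that look like a renamed version of the requested variable.
candidates = [c for c in metadata_cell.columns if c.endswith(variable)]

print("missing:", missing)
print("candidates:", candidates)
```

If `candidates` is non-empty while `variable` itself is missing, the column was most likely renamed on the way into the object rather than dropped.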

Ideally, the create_SCENICPLUS_object() would have a verbose option to describe the code path it's taking, and any inconsistencies it's finding.

In the meantime, is there a way to repair the scenicplus object in place and complete the analysis? Or are all previous steps in run_scenicplus() unlikely to have worked if the metadata_cell dataframe was incorrectly populated?

alexlenail commented 7 months ago

Further investigation shows that my scplus_obj.metadata_cell has a column called GEX_subclass_label_2 but not subclass_label_2. The GEX_ prefix is assigned here:

https://github.com/aertslab/scenicplus/blob/e4bdd9f5b7fea1a43d5cb55e2ccb3c221fe7d279/src/scenicplus/scenicplus_class.py#L563-L566

Maybe I should have known this and set run_scenicplus(variable='GEX_subclass_label_2', ...)? If so, I feel the tutorial might be misleading.
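For anyone hitting the same KeyError, here is a minimal sketch of the two workarounds this suggests: look up the prefixed column name, or add an unprefixed alias column. It is plain pandas on a stand-in DataFrame. `resolve_variable` is a hypothetical helper, not part of SCENIC+, and only the GEX_ prefix seen above is assumed.

```python
import pandas as pd

def resolve_variable(cell_data: pd.DataFrame, variable: str,
                     prefixes=("GEX_",)) -> str:
    """Return the column in `cell_data` matching `variable`,
    trying each prefix if the bare name is absent (hypothetical helper)."""
    if variable in cell_data.columns:
        return variable
    for prefix in prefixes:
        if prefix + variable in cell_data.columns:
            return prefix + variable
    raise KeyError(f"{variable!r} not found, with or without prefixes {prefixes}")

# Stand-in for the cell metadata table (assumption: made-up values).
cell_data = pd.DataFrame(
    {"GEX_subclass_label_2": ["typeA", "typeB", "typeA"]},
    index=["cell1", "cell2", "cell3"],
)

# Workaround 1: pass the resolved (prefixed) name downstream.
resolved = resolve_variable(cell_data, "subclass_label_2")

# Workaround 2: add an unprefixed alias so the original name also works.
cell_data["subclass_label_2"] = cell_data[resolved]

# The lookup that originally failed now succeeds.
categories = sorted(set(cell_data.loc[:, "subclass_label_2"]))
print(resolved, categories)
```

Passing the prefixed name is the less invasive option, since it leaves the object untouched. Whether aliasing the column on the real object has side effects elsewhere in the pipeline is not something this sketch verifies.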

SeppeDeWinter commented 7 months ago

Hi @alexlenail

You are right, this should be documented better.

However, all previous steps of the SCENIC+ analysis should still be fine. This is the only step that uses the cell type annotation, so you don't need to rerun the analysis.

All the best,

Seppe