aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.

pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects #171

Open Quentin-bioinfo opened 2 months ago

Quentin-bioinfo commented 2 months ago

Describe the bug

After creating a cisTopic object for each sample and merging them, I want to call cistopic_obj.add_cell_data(cell_data, split_pattern='-').

I get an error saying "Reindexing only valid with uniquely valued Index objects".

But there are no duplicated index values in my cell annotation data. When I preprocessed the scRNA-seq part, I used matrix.obs_names_make_unique(), and I also tried adding matrix.obs_names = [f"{idx}_{sample}" for idx in matrix.obs_names] before concatenating all the samples to create the object that I used to cluster the cells and generate the cell annotation data.

I don't see where this error could come from.

To Reproduce


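import os
import polars as pl
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

# Build one cisTopic object per sample, then merge them below.
cistopic_obj_list = []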
for sample_id in fragments_dict:
    sample_metrics = pl.read_parquet(
        os.path.join(pycistopic_qc_output_dir, f'{sample_id}.fragments_stats_per_cb.parquet')
    ).to_pandas().set_index("CB").loc[ sample_id_to_barcodes_passing_filters[sample_id] ]
    cistopic_obj = create_cistopic_object_from_fragments(
        path_to_fragments = fragments_dict[sample_id],
        path_to_regions = path_to_regions,
        path_to_blacklist = path_to_blacklist,
        metrics = sample_metrics,
        valid_bc = sample_id_to_barcodes_passing_filters[sample_id],
        n_cpu = 1,
        project = sample_id,
        split_pattern = '-'
    )
    cistopic_obj_list.append(cistopic_obj)

cistopic_obj = cistopic_obj_list[0]
cistopic_obj.merge(cistopic_obj_list[1:])

import pickle
# Use a context manager so the file handle is closed after writing.
with open(os.path.join(out_dir, "cistopic_obj.pkl"), "wb") as f:
    pickle.dump(cistopic_obj, f)

import pandas as pd
cell_data = pd.read_csv('../Data/scanpy/cell_annotation_data.csv', index_col = 0)
cistopic_obj.add_cell_data(cell_data, split_pattern='-')
# Use a context manager so the file handle is closed after writing.
with open(os.path.join(out_dir, "cistopic_obj.pkl"), "wb") as f:
    pickle.dump(cistopic_obj, f)

Error output

Traceback (most recent call last):
  File "Notebook/pycistonic.py", line 77, in <module>
    cistopic_obj.add_cell_data(cell_data, split_pattern='-')
  File "/home/user/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pycisTopic/cistopic_class.py", line 136, in add_cell_data
    new_cell_data = pd.concat([obj_cell_data, cell_data], axis=1, sort=False)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/util/_decorators.py", line 317, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 382, in concat
    return op.get_result()
           ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 613, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3902, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
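
The traceback shows the failure inside pd.concat([obj_cell_data, cell_data], axis=1), so a duplicated label in either frame could trigger it, not only in the CSV annotation. A minimal diagnostic sketch with pandas only follows; the toy frames and the helper name duplicated_labels are hypothetical, and checking cistopic_obj.cell_data on the real merged object is an assumption about where the object keeps its per-cell table.

```python
import pandas as pd


def duplicated_labels(df: pd.DataFrame) -> list:
    """Return index labels that occur more than once, in first-seen order."""
    idx = df.index
    return idx[idx.duplicated(keep="first")].unique().tolist()


# Toy stand-ins (hypothetical data): the object's table has a repeated
# barcode, while the annotation table is unique.
obj_cell_data = pd.DataFrame(
    {"n_frags": [10, 20, 30]}, index=["AAAC-1", "AAAC-1", "TTTG-1"]
)
cell_data = pd.DataFrame({"cell_type": ["B", "T"]}, index=["AAAC-1", "TTTG-1"])

print(duplicated_labels(obj_cell_data))  # ['AAAC-1']
print(duplicated_labels(cell_data))      # []

# Column-wise concat has to align the two indexes; with a non-unique index
# on one side this alignment is expected to fail.
try:
    pd.concat([obj_cell_data, cell_data], axis=1, sort=False)
except (pd.errors.InvalidIndexError, ValueError) as err:
    print(type(err).__name__, err)

# On the real data one would run the same check on both inputs, e.g.
# duplicated_labels(cistopic_obj.cell_data)  # assumed attribute name
# duplicated_labels(cell_data)
```

If the first call returns labels on the merged object, the same barcode probably exists in more than one sample, and the duplication comes from the merge rather than from the annotation file.
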

yojetsharma commented 1 month ago

I posted about a similar issue. The workaround there was to omit split_pattern. Unfortunately, that leads to NaNs in my sample_id and cell_type columns.
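
NaNs after a column-wise concat usually mean that the row labels of the two tables do not match, so pandas keeps both sets of rows and fills the gaps. The sketch below shows this with pandas only. The label formats (a sample suffix after "___" on one side, bare barcodes on the other) and the helper name index_overlap are hypothetical and only illustrate a mismatch, not how pycisTopic actually names cells.

```python
import pandas as pd


def index_overlap(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Count labels shared by both indexes and labels unique to each side."""
    l, r = pd.Index(left.index), pd.Index(right.index)
    return {
        "shared": len(l.intersection(r)),
        "only_left": len(l.difference(r)),
        "only_right": len(r.difference(l)),
    }


# Hypothetical labels: the object side carries a sample suffix,
# the annotation side does not.
obj_cell_data = pd.DataFrame(
    {"n_frags": [10, 30]}, index=["AAAC-1___s1", "TTTG-1___s1"]
)
annotation = pd.DataFrame({"cell_type": ["B", "T"]}, index=["AAAC-1", "TTTG-1"])

print(index_overlap(obj_cell_data, annotation))
# {'shared': 0, 'only_left': 2, 'only_right': 2}

# No labels are shared, so every object row gets NaN for cell_type.
joined = pd.concat([obj_cell_data, annotation], axis=1, sort=False)
print(int(joined["cell_type"].isna().sum()))  # 2
```

Comparing a few labels from each side by eye is often enough to see which prefix or suffix differs.
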

yojetsharma commented 1 month ago

Could this be due to pandas 2.0?

Quentin-bioinfo commented 1 month ago

I think the tutorial isn't clear on that part. But I was able to move forward by adding the metadata to each individual cisTopic object and only then merging them.
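
The code for this workaround was not posted in the thread. A minimal sketch of the idea, under stated assumptions, is to split the annotation table by sample first and attach each piece to its own object before merging. The sample_id column name, the toy barcodes, and the helper split_annotation_by_sample are hypothetical; the pycisTopic calls are left as comments because they need the objects from the reproduction script.

```python
import pandas as pd


def split_annotation_by_sample(
    cell_data: pd.DataFrame, sample_col: str = "sample_id"
) -> dict:
    """Return one annotation sub-table per sample, keyed by sample name."""
    return {
        str(sample): sub.copy()
        for sample, sub in cell_data.groupby(sample_col, sort=False)
    }


# Toy annotation table (hypothetical barcodes and column names).
cell_data = pd.DataFrame(
    {"sample_id": ["s1", "s1", "s2"], "cell_type": ["B", "T", "B"]},
    index=["AAAC-1_s1", "TTTG-1_s1", "AAAC-1_s2"],
)
per_sample = split_annotation_by_sample(cell_data)
print({name: len(sub) for name, sub in per_sample.items()})  # {'s1': 2, 's2': 1}

# With the objects from the reproduction script (built in the same order as
# fragments_dict), the per-sample tables could then be attached before merging:
#
# for sample_id, obj in zip(fragments_dict, cistopic_obj_list):
#     obj.add_cell_data(per_sample[sample_id], split_pattern='-')
# cistopic_obj = cistopic_obj_list[0]
# cistopic_obj.merge(cistopic_obj_list[1:])
```

The index labels of each sub-table still have to match the cell names inside the corresponding object, so the barcode format may need adjusting for real data.
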

yojetsharma commented 1 month ago

Can you share the code showing how you did it?