broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Duplicated var_names in cellbender output #204

Closed auesro closed 1 year ago

auesro commented 1 year ago

Dear CellBender team,

I just started using your tool to remove ambient RNA from my 10x scRNA-seq dataset. When I load the CellBender output in scanpy with adata = anndata_from_h5(file='VMC3_cellbender.h5'), I get a warning:

UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`. utils.warn_names_duplicates("var")

This warning does not appear when I load the full raw matrix. Is there any reason why CellBender would output duplicated gene names when cellranger doesn't?

I should mention that I had to use the anndata_from_h5 function, following the advice given in other issues, because of the missing genome parameter.

Thanks

A

auesro commented 1 year ago

My bad!

The same var_names are also duplicated in the original raw 10x output; however, read_10x_mtx makes the var_names unique by default!

sjfleming commented 1 year ago

Hi @auesro, I appreciate that you posted the answer here, as I'm sure other people will find it useful too! I did not know that read_10x_mtx made vars unique by default.

auesro commented 1 year ago

The least I can do after all the effort you put into CellBender!

Yeah, I discovered it today in the read_10x_mtx docs:

make_unique : [bool](https://docs.python.org/3/library/functions.html#bool) (default: True)

    Whether to make the variables index unique by appending ‘-1’, ‘-2’ etc. or not.

The best option is probably to work with gene_ids, I guess...
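
For anyone landing here later, the suffixing behaviour described in the make_unique docs quoted above can be sketched in plain Python. This is only an illustration of the "-1", "-2" convention, not the actual scanpy/anndata implementation; the helper name make_names_unique and the example gene names are made up for the example, and the sketch does not handle the case where a suffixed name collides with a name that already exists.

```python
from collections import Counter


def make_names_unique(names, sep="-"):
    """Hypothetical helper: keep the first occurrence of each name and
    append '-1', '-2', ... to later duplicates.

    Simplified sketch: it does not check whether a generated name
    (e.g. 'GENE-1') already exists in the input.
    """
    seen = Counter()  # how many times each name has appeared so far
    unique = []
    for name in names:
        if seen[name] == 0:
            unique.append(name)  # first occurrence stays unchanged
        else:
            unique.append(f"{name}{sep}{seen[name]}")  # later ones get a suffix
        seen[name] += 1
    return unique


# Made-up gene symbols, with one symbol appearing three times
example = ["GENE_A", "GENE_B", "GENE_A", "GENE_A"]
result = make_names_unique(example)
print(result)
```

On a real AnnData object, the warning in the first post already points at the fix: calling adata.var_names_make_unique() after loading. Gene IDs sidestep the problem because they are unique per feature, whereas gene symbols can repeat.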