Starlitnightly / omicverse

A Python library for multi-omics analysis, including bulk, single-cell and spatial RNA-seq.
https://starlitnightly.github.io/omicverse/
GNU General Public License v3.0
277 stars, 32 forks

Code correction #26

Closed: mugpeng closed this issue 8 months ago

mugpeng commented 8 months ago

Code: `omicverse/single/_simba.py` (master branch of Starlitnightly/omicverse)

In the step that chooses the largest dataset as the reference (SIMBA 1.2 documentation, "Batch correction (multiple batches)"), the author gets this result:

```
{'C': AnnData object with n_obs × n_vars = 8569 × 50,
 'C2': AnnData object with n_obs × n_vars = 2127 × 50,
 'C3': AnnData object with n_obs × n_vars = 2122 × 50,
 'C4': AnnData object with n_obs × n_vars = 457 × 50,
 'G': AnnData object with n_obs × n_vars = 7988 × 50,
 'C5': AnnData object with n_obs × n_vars = 1492 × 50}
```

Here `C` is clearly the largest AnnData object, but that does not mean `C` is always the largest. For my data:

```
## dict_adata
{'C18': AnnData object with n_obs × n_vars = 3285 × 50,
 'C17': AnnData object with n_obs × n_vars = 761 × 50,
 'G': AnnData object with n_obs × n_vars = 3000 × 50,
 'C16': AnnData object with n_obs × n_vars = 2080 × 50,
 'C5': AnnData object with n_obs × n_vars = 1988 × 50,
 'C6': AnnData object with n_obs × n_vars = 1835 × 50,
 'C15': AnnData object with n_obs × n_vars = 597 × 50,
 'C13': AnnData object with n_obs × n_vars = 2418 × 50,
 'C7': AnnData object with n_obs × n_vars = 659 × 50,
 'C10': AnnData object with n_obs × n_vars = 3673 × 50,
 'C8': AnnData object with n_obs × n_vars = 523 × 50,
 'C20': AnnData object with n_obs × n_vars = 147 × 50,
 'C22': AnnData object with n_obs × n_vars = 1038 × 50,
 'C9': AnnData object with n_obs × n_vars = 2437 × 50,
 'C12': AnnData object with n_obs × n_vars = 1298 × 50,
 'C4': AnnData object with n_obs × n_vars = 1774 × 50,
 'C21': AnnData object with n_obs × n_vars = 1165 × 50,
 'C2': AnnData object with n_obs × n_vars = 1995 × 50,
 'C11': AnnData object with n_obs × n_vars = 3527 × 50,
 'C14': AnnData object with n_obs × n_vars = 1912 × 50,
 'C19': AnnData object with n_obs × n_vars = 436 × 50,
 'C3': AnnData object with n_obs × n_vars = 1379 × 50,
 'C': AnnData object with n_obs × n_vars = 497 × 50}
```
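To make the difference concrete, the sketch below copies the `n_obs` values from the two listings above into plain dictionaries (a simplification: no AnnData objects are built) and picks the key with the most cells. For the SIMBA tutorial data this gives `C`, while for the data above it gives `C10`.

```python
# n_obs (number of cells) per batch, copied by hand from the two listings above.
tutorial_sizes = {"C": 8569, "C2": 2127, "C3": 2122, "C4": 457, "G": 7988, "C5": 1492}
my_sizes = {
    "C18": 3285, "C17": 761, "G": 3000, "C16": 2080, "C5": 1988, "C6": 1835,
    "C15": 597, "C13": 2418, "C7": 659, "C10": 3673, "C8": 523, "C20": 147,
    "C22": 1038, "C9": 2437, "C12": 1298, "C4": 1774, "C21": 1165, "C2": 1995,
    "C11": 3527, "C14": 1912, "C19": 436, "C3": 1379, "C": 497,
}

# The batch with the most cells is the intended reference.
tutorial_ref = max(tutorial_sizes, key=tutorial_sizes.get)
my_ref = max(my_sizes, key=my_sizes.get)

print(tutorial_ref)                     # C
print(my_ref)                           # C10
print(my_sizes["C"], my_sizes[my_ref])  # 497 3673
```

In the tutorial the hard-coded choice and the size-based choice agree; in the data above, `C` is one of the smallest batches.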

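Here `C` has only 497 cells, while the largest batch is `C10` with 3673 cells. So instead of assuming `C` is the reference, the script should pick the batch with the most cells: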

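```python
# Number of cells (rows, n_obs) in each batch, keyed by batch name
batch_size_si = {name: adata.shape[0] for name, adata in dict_adata.items()}
# Use the batch with the most cells as the reference
adata_ref = dict_adata[max(batch_size_si, key=batch_size_si.get)]
```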
Starlitnightly commented 8 months ago

Thanks for the correction and the pull request!

Zehua