Ligand-Receptor script - Githubissues

yulianatan commented 3 years ago

looking forward to ligand-receptor script! we would like to run the interactions between:

Macrophages to all other cell types/subtypes
DCs to all other cell types
Epithelial to all other cell types

wflynny commented 3 years ago

Added some cellphonedb code in commit 54eba615. On Sumner, there is a conda env with cellphonedb installed. Do the following:

from code.data_prep import export_for_cellphonedb

export_for_cellphonedb(
    adata,
    "path/to/analysis/cellphone",
    celltype_key="celltype"
    partition_key="patient_id",
    partition_prefix="pat",
)

This should create the input data needed to run cellphonedb between all celltypes for each patient under /path/to/analysis/cellphone.

Then we can launch cellphonedb as cluster jobs, one for each patient, with

cd /path/to/analysis/cellphone
mkdir logs
for d in $(find * -type d -name "*-output"); do
    /path/to/scripts/run_cellphonedb.sbatch ${d%_*}
done

Then do load the data in, you can do

from code.data_prep import load_cellphonedb_results

all_results = load_cellphonedb_results(
    adata,
    "path/to/analysis/cellphone",
    partition_key="patient_id",
    partition_prefix="pat",
)

yulianatan commented 3 years ago

adata.raw = sc.AnnData(adata.layers["raw"], var=adata.var, obs=adata.obs),
export_for_cellphonedb(adata,
                      f"{main_dir}analysis/{sample_id}/cellphone/eutopic",
                      celltype_key="celltype_broad",
                      partition_key="Patient_id",
                      partition_prefix="PT")

and tried this too:

                      f"{main_dir}analysis/{sample_id}/cellphone/eutopic",
                      celltype_key="celltype_broad",
                      partition_key="Patient_id",
                      partition_prefix="PT",
                      use_raw = False,
                      layer= "raw")

same error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4856                 blocks = [make_block(values=blocks[0],
-> 4857                                      placement=slice(0, len(axes[0])))]
   4858 

/opt/conda/lib/python3.7/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3204 
-> 3205     return klass(values, ndim=ndim, placement=placement)
   3206 

/opt/conda/lib/python3.7/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim)
    124                 'Wrong number of items passed {val}, placement implies '
--> 125                 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
    126 

ValueError: Wrong number of items passed 26418, placement implies 4622

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-22-08f73d796835> in <module>
      3                       celltype_key="celltype_broad",
      4                       partition_key="Patient_id",
----> 5                       partition_prefix="PT")

/projects/robson-lab/research/endometriosis/notebooks/EC19001-EC20016/data_prep/cellphone.py in export_for_cellphonedb(adata, outdir, partition_key, partition_prefix, celltype_key, celltype_label, use_raw, layer)
     47             x[inds, :],
     48             index=rownames,
---> 49             columns=df.index
     50         )
     51 

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    377             else:
    378                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 379                                          copy=copy)
    380         elif isinstance(data, (list, types.GeneratorType)):
    381             if isinstance(data, types.GeneratorType):

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    534             values = maybe_infer_to_datetimelike(values)
    535 
--> 536         return create_block_manager_from_blocks([values], [columns, index])
    537 
    538     @property

/opt/conda/lib/python3.7/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4864         blocks = [getattr(b, 'values', b) for b in blocks]
   4865         tot_items = sum(b.shape[0] for b in blocks)
-> 4866         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   4867 
   4868 

/opt/conda/lib/python3.7/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4841         raise ValueError("Empty data passed with indices specified.")
   4842     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4843         passed, implied))
   4844 
   4845 

ValueError: Shape of passed values is (26418, 4622), indices imply (4622, 26418)

wflynny commented 3 years ago

@yulianatan Fix in 6eb79fd.

yulianatan commented 3 years ago

ugh.. the permission denied problem,

[tany@sumner-log1 eutopic]$ for d in $(find * -type d -name "*-output"); do
> ../../../../notebooks/EC19001-EC20016/scripts/run_cellphonedb.sbatch ${d%_*}
> done
-bash: ../../../../notebooks/EC19001-EC20016/scripts/run_cellphonedb.sbatch: Permission denied
-bash: ../../../../notebooks/EC19001-EC20016/scripts/run_cellphonedb.sbatch: Permission denied

wflynny commented 3 years ago

Just add a chmod +x ../../../../notebooks/EC19001-EC20016/scripts/run_cellphonedb.sbatch

yulianatan commented 3 years ago

some errors & warnings

../../../../notebooks/EC19001-EC20016/scripts/run_cellphonedb.sbatch: line 31: warning: here-document at line 24 delimited by end-of-file (wanted `EOF')
../../../../notebooks/EC19001-EC20016/scripts/run_cellphonedb.sbatch: line 27: cd: /analysis/cellphone: No such file or directory
/projects/robson-lab/software/sumner/miniconda3/envs/cpdb/lib/python3.6/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
As there is no R environment set up, some functionalities will be disabled, e.g. plot
[ ][APP][20/01/21-14:45:39][WARNING] Latest local available version is `v2.0.0`, using it
[ ][APP][20/01/21-14:45:39][WARNING] User selected downloaded database `v2.0.0` is available, using it
[ ][CORE][20/01/21-14:45:40][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][20/01/21-14:45:40][INFO] Using custom database at /home/tany/.cpdb/releases/v2.0.0/cellphone.db
[ ][APP][20/01/21-14:45:40][INFO] Launching Method cpdb_statistical_analysis_local_method_launcher
[ ][APP][20/01/21-14:45:40][INFO] Launching Method _set_paths
[ ][APP][20/01/21-14:45:40][INFO] Launching Method _load_meta_counts
[ ][APP][20/01/21-14:45:40][ERROR] Can not read /projects/robson-lab/research/endometriosis/analysis/Endo-Tissue-EC19001-EC20016/cellphone/eutopic/PT04-output_metadata.txt

wflynny commented 3 years ago

My bad, the launch command should be:

cd /path/to/analysis/cellphone
mkdir logs
for d in $(find * -type d -name "*-output"); do
    sbatch /path/to/scripts/run_cellphonedb.sbatch ${d%_*}
done

However, also updated the script slightly.

yulianatan commented 3 years ago

where do I checked for the run information? I didn't see anything on squeue -u tany nor the logs directory (ran it few minutes ago, should I wait longer?)

yulianatan commented 3 years ago

from cellphone import load_cellphonedb_results

all_results = load_cellphonedb_results(
    adata,
    f"{main_dir}analysis/{sample_id}/cellphone/eutopic",
    partition_key="Patient_id",
    partition_prefix="PT",
    use_raw=False,
    layer="raw"
)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-93b416311606> in <module>
      7     partition_prefix="PT",
      8     use_raw=False,
----> 9     layer="raw"
     10 )

/projects/robson-lab/research/endometriosis/notebooks/EC19001-EC20016/data_prep/cellphone.py in load_cellphonedb_results(adata, results_dir, partition_key, partition_prefix, celltype_key, celltype_label, use_raw, layer)
     83         results.append(base_name)
     84 
---> 85     return pd.concat(results, axis=0)
     86 
     87 

/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    223                        keys=keys, levels=levels, names=names,
    224                        verify_integrity=verify_integrity,
--> 225                        copy=copy, sort=sort)
    226     return op.get_result()
    227 

/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    284                        ' only pd.Series, pd.DataFrame, and pd.Panel'
    285                        ' (deprecated) objs are valid'.format(type(obj)))
--> 286                 raise TypeError(msg)
    287 
    288             # consolidate

TypeError: cannot concatenate object of type "<class 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

wflynny commented 3 years ago

How is something like this for annotating and filtering the data from all the patients:

def annotate_cellphone_results(dataframe):
    assert "sample_type" in dataframe.columns
    dataframe["celltype_pair"] = dataframe.celltype_a + "->" + dataframe.celltype_b
    dataframe["samples_in_sampletype"] = dataframe.groupby("sample_type").partition_key.transform(lambda x: len(x.unique()))
    dataframe["n_samples_observed"] = dataframe.groupby(["sample_type", "celltype_pair", "interacting_pair"]).partition_key.transform(lambda x: len(x.unique()))
    dataframe["frac_samples_observed"] = dataframe.n_samples_observed / dataframe.samples_in_sampletype
    dataframe["n_celltype_pairs"] = dataframe.groupby(["sample_type", "interacting_pair"]).celltype_pair.transform(lambda x: len(x.unique()))
    dataframe["self_interaction"] = dataframe.celltype_a == dataframe.celltype_b

which you can use with

from cellphone import load_cellphonedb_results

all_results = load_cellphonedb_results(
    adata,
    f"{main_dir}analysis/{sample_id}/cellphone/eutopic",
    partition_key="Patient_id",
    partition_prefix="PT",
    use_raw=False,
    layer="raw"
)

# !!! add 'sample_type' grouping to all_results

annotate_cellphone_results(all_results)

This will create the following extra columns:

sample_type	celltype_pair	samples_in_sampletype	n_samples_observed	frac_samples_observed	n_celltype_pairs	self_interaction
ecp	CTL->cDC1 (CLEC9A)	8	1	0.125	6	False
ecp	B cells->B cells	8	2	0.250	6	True
ecp	MΦΦ1 (tissue-resident)->plasma cell	8	1	0.125	7	False
ecp	moDC (CD1C-)->B cells	8	1	0.125	6	False
ecp	MΦΦ1 (tissue-resident)->MΦΦ1 (tissue-r...	8	1	0.125	1	True

where

sample_type: the 5 samples types you added
celltype_pair: a single column for the celltype pair (note that a->b is distinct from b->a)
samples_in_sampletype: total number of samples for each of the 5 sample types
n_samples_observed: number of samples in which a given (sample_type, celltype_pair, interacting_pair) is present
frac_samples_observed: n_sample_observed / samples_in_sampletypes
n_celltype_pairs: the number of celltype pairs (in a sample_type) have this interacting pair (can be as many as ~500)
self_interaction: whether celltype_a == celltype_b



print(a.iloc[:5, -7:])

all_results.groupby(["sample_type", "samples_in_sampletype"]).size()
# sample_type  samples_in_sampletype
# control      3                         40511
# eco          2                         32290
# ecp          8                        112423
# ecpa         7                         69237
# eue          9                        143076
# dtype: int64

all_results[
    (all_results.pval < 0.01) & \
    (all_results.frac_samples_observed > 0.9) & \
    (~all_results.self_interaction) & \
    (all_results.n_celltype_pairs < 10)
].shape
# (34, 22)

TheJacksonLaboratory / endometriosis-scrnaseq

Ligand-Receptor script #3