Teichlab / cellphonedb

User generate database fails if file extension is .txt instead of .csv #235

Closed racng closed 3 years ago

racng commented 3 years ago

I have a filtered version of the interactions list downloaded from the CellPhoneDB website, saved as a comma-separated text file with a '.txt' file extension. The cellphonedb database generate command fails because of a missing "partner_a" column, even though that column is present in the interaction text file. I suspected a text file parsing issue, so I changed the file extension to .csv, and the command ran successfully. While this is not really a bug, it was not obvious to me as a user that '.txt' files are read differently from '.csv' files.

cellphonedb database generate --user-interactions interaction_input.txt --result-path /path/cellphonedb-user --user-interactions-only
/home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
As there is no R environment set up, some functionalities will be disabled, e.g. plot
read local uniprot file
[ ][APP][01/12/20-11:43:48][WARNING] Output directory (/path/cellphonedb-user) exist and is not empty. Result can overwrite old results
[ ][APP][01/12/20-11:43:48][WARNING] Output directory (/path/cellphonedb-user) exist and is not empty. Result can overwrite old results
read local ensembl file
read local uniprot file
/home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2/lib/python3.7/site-packages/cellphonedb/src/core/generators/gene_generator.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  ensembl_db_filtered.dropna(inplace=True)
[ ][APP][01/12/20-11:43:59][WARNING] Output directory (/path/cellphonedb-user) exist and is not empty. Result can overwrite old results
[ ][APP][01/12/20-11:43:59][WARNING] Output directory (/path/cellphonedb-user) exist and is not empty. Result can overwrite old results
[ ][APP][01/12/20-11:43:59][WARNING] Output directory (/path/cellphonedb-user) exist and is not empty. Result can overwrite old results
[ ][APP][01/12/20-11:43:59][WARNING] Output directory (/path/cellphonedb-user) exist and is not empty. Result can overwrite old results
Traceback (most recent call last):
  File "/home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'partner_a'

Here is my environment:

conda list
# packages in environment at /home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
aniso8601                 8.0.0                    pypi_0    pypi
attrs                     20.3.0                   pypi_0    pypi
blas                      1.0                         mkl
boto3                     1.7.84                   pypi_0    pypi
botocore                  1.10.84                  pypi_0    pypi
ca-certificates           2020.10.14                    0
cellphonedb               2.1.4                    pypi_0    pypi
certifi                   2020.6.20          pyhd3eb1b0_3
cffi                      1.14.3                   pypi_0    pypi
chardet                   3.0.4                    pypi_0    pypi
click                     6.7                      pypi_0    pypi
cython                    0.29.21                  pypi_0    pypi
docutils                  0.16                     pypi_0    pypi
fbpca                     1.0                      pypi_0    pypi
flask                     1.0.4                    pypi_0    pypi
flask-restful             0.3.8                    pypi_0    pypi
flask-testing             0.7.1                    pypi_0    pypi
geosketch                 0.3                      pypi_0    pypi
idna                      2.7                      pypi_0    pypi
importlib-metadata        3.0.0                    pypi_0    pypi
iniconfig                 1.1.1                    pypi_0    pypi
intel-openmp              2020.2                      254
itsdangerous              1.1.0                    pypi_0    pypi
jinja2                    2.11.2                   pypi_0    pypi
jmespath                  0.10.0                   pypi_0    pypi
joblib                    0.17.0                   pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7
libedit                   3.1.20191231         h14c3975_1
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.1.0                hdf63c60_0
libstdcxx-ng              9.1.0                hdf63c60_0
markupsafe                1.1.1                    pypi_0    pypi
mkl                       2020.2                      256
mkl-service               2.3.0            py37he904b0f_0
mkl_fft                   1.2.0            py37h23d657b_0
mkl_random                1.1.1            py37h0573a6f_0
ncurses                   6.2                  he6710b0_1
numpy                     1.19.2           py37h54aff64_0
numpy-base                1.19.2           py37hfa32c7d_0
openssl                   1.1.1h               h7b6447c_0
packaging                 20.4                     pypi_0    pypi
pandas                    0.23.4           py37h04863e7_0
pika                      0.12.0                   pypi_0    pypi
pip                       20.2.4           py37h06a4308_0
pluggy                    0.13.1                   pypi_0    pypi
py                        1.9.0                    pypi_0    pypi
pycparser                 2.20                     pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
pytest                    6.1.2                    pypi_0    pypi
python                    3.7.9                h7579374_0
python-dateutil           2.8.1                      py_0
pytz                      2020.1                     py_0
pyyaml                    5.1.2                    pypi_0    pypi
readline                  8.0                  h7b6447c_0
requests                  2.19.1                   pypi_0    pypi
rpy2                      3.0.5                    pypi_0    pypi
s3transfer                0.1.13                   pypi_0    pypi
scikit-learn              0.23.2                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
setuptools                50.3.1           py37h06a4308_1
simplegeneric             0.8.1                    pypi_0    pypi
six                       1.15.0           py37h06a4308_0
sqlalchemy                1.3.20                   pypi_0    pypi
sqlite                    3.33.0               h62c20be_0
threadpoolctl             2.1.0                    pypi_0    pypi
tk                        8.6.10               hbc83047_0
toml                      0.10.2                   pypi_0    pypi
tqdm                      4.32.2                   pypi_0    pypi
urllib3                   1.23                     pypi_0    pypi
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.35.1             pyhd3eb1b0_0
xz                        5.2.5                h7b6447c_0
zipp                      3.4.0                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3

prete commented 3 years ago

Hi @racng, thank you for your input. You're right: it's not obvious, and it sometimes causes confusion; see #131, #179, #283 and #287.

In CellPhoneDB, .txt is a synonym for .tsv or .tab; all three are read as tab-separated values. The only input files expected to be comma-separated are .csv files.
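
To illustrate why this shows up as KeyError: 'partner_a', here is a minimal sketch using only the Python standard library. It is not CellPhoneDB's actual reader: the SEPARATORS table and the read_header and demo helpers are hypothetical names, the extension-to-separator mapping is assumed from the explanation above, and the two-column sample row is made up for the demonstration.

```python
import csv
import os
import tempfile

# Assumed mapping, based on the explanation above: .csv is comma-separated,
# while .txt, .tsv and .tab are all treated as tab-separated.
SEPARATORS = {".csv": ",", ".txt": "\t", ".tsv": "\t", ".tab": "\t"}


def read_header(path):
    """Return the column names, choosing the delimiter from the file extension."""
    extension = os.path.splitext(path)[1].lower()
    delimiter = SEPARATORS[extension]
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter))


def demo():
    """Write the same comma-separated content under two extensions and parse both."""
    content = "partner_a,partner_b\nP12345,Q67890\n"  # made-up sample row
    headers = {}
    with tempfile.TemporaryDirectory() as folder:
        for name in ("interaction_input.txt", "interaction_input.csv"):
            path = os.path.join(folder, name)
            with open(path, "w", newline="") as handle:
                handle.write(content)
            headers[name] = read_header(path)
    return headers


if __name__ == "__main__":
    for name, header in demo().items():
        # With the .txt name, the whole header line becomes one fused column,
        # so a later lookup of "partner_a" cannot succeed.
        print(name, header, "partner_a" in header)
```

Under these assumptions, the comma-separated file parsed as tab-separated yields a single column named "partner_a,partner_b", which is why the lookup fails even though the text appears in the file. Either renaming the file to .csv (as in the original report) or saving it with real tab separators should avoid the mismatch.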