User generate database fails if file extension is .txt instead of .csv #235

racng closed 3 years ago

racng commented 3 years ago

I have a filtered version of the interactions list downloaded from the CellPhoneDB website, saved as a comma-separated text file with a '.txt' file extension. The cellphonedb database generate command fails because of a missing "partner_a" column, even though that column is present in the interaction text file. I suspected a text file parsing issue, so I changed the file extension to .csv and it ran successfully. While this is not really a bug, it was not obvious to me as a user that '.txt' files are read differently from '.csv' files.

cellphonedb database generate --user-interactions interaction_input.txt --result-path /path/cellphonedb-user --user-interactions-only
/home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
As there is no R environment set up, some functionalities will be disabled, e.g. plot
read local uniprot file
read local ensembl file
read local uniprot file
/home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2/lib/python3.7/site-packages/cellphonedb/src/core/generators/gene_generator.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Traceback (most recent call last):
  File "/home/rng/docs/git/single-cell-cellphonedb/conda/1c9287a2/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'partner_a'

Here is my environment

conda list

prete commented 3 years ago

Hi @racng, thank you for your input. You're right, it's not obvious and it sometimes causes confusion; see #131, #179, #283 and #287.

In CellPhoneDB, .txt is a synonym for .tsv or .tab; all three are read as tab-separated values. The only input files expected to be comma-separated are .csv files.
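
To illustrate why the "partner_a" column goes missing, here is a minimal sketch of extension-based separator selection using pandas. It is not CellPhoneDB's actual code: the `SEPARATORS` mapping and the `separator_for` helper are hypothetical names made up for this example, and the sample row values are invented.

```python
import io
import os

import pandas as pd

# Hypothetical mapping that mirrors the rule described above;
# it is an assumption for illustration, not copied from CellPhoneDB.
SEPARATORS = {".csv": ",", ".txt": "\t", ".tsv": "\t", ".tab": "\t"}


def separator_for(filename: str) -> str:
    """Return the delimiter implied by the file extension (hypothetical helper)."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SEPARATORS:
        raise ValueError(f"Unsupported extension: {extension!r}")
    return SEPARATORS[extension]


# Comma-separated content like the file in the report (values are made up).
content = "partner_a,partner_b\nP12345,Q67890\n"

# Parsed with the separator implied by .txt (tab): the header is not split,
# so the frame has a single column named "partner_a,partner_b".
as_txt = pd.read_csv(io.StringIO(content), sep=separator_for("interaction_input.txt"))

# Parsed with the separator implied by .csv (comma): columns split as expected.
as_csv = pd.read_csv(io.StringIO(content), sep=separator_for("interaction_input.csv"))

print(list(as_txt.columns))
print(list(as_csv.columns))
```

With the tab separator, looking up `as_txt["partner_a"]` raises `KeyError: 'partner_a'`, which matches the traceback above. Renaming the file to .csv, or re-saving it as tab-separated, avoids the mismatch.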