KrishnaswamyLab / scprep

A collection of scripts and tools for loading, processing, and handling single cell data.
MIT License

`scprep.select.columns` should return a series if only one column is selected #95

Closed dburkhardt closed 4 years ago

dburkhardt commented 4 years ago

Is your feature request related to a problem? Please describe.

Currently, selecting a single column from a dataframe using `scprep.select.columns` (e.g. using `exact_word=...`) returns a dataframe with shape `(n, 1)`. This makes it difficult to use the column as a boolean mask or pass it to `plt.hist`.
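To make the shape problem concrete, here is a minimal sketch using plain pandas only. The column labels and values are made up for illustration, and plain indexing stands in for the selection call, so nothing here reflects scprep's actual implementation.

```python
import pandas as pd

# Toy expression matrix; labels mimic "symbol (id)" style names (hypothetical data).
df = pd.DataFrame(
    {"GeneA (ENSG01)": [0, 3, 5], "GeneB (ENSG02)": [1, 0, 2]},
    index=["cell1", "cell2", "cell3"],
)

# Selecting one column with a list keeps two dimensions: shape (n, 1).
single_col_frame = df[["GeneA (ENSG01)"]]
print(single_col_frame.shape)

# Taking the only column out gives a one-dimensional Series: shape (n,).
as_series = single_col_frame.iloc[:, 0]
print(as_series.shape)

# A boolean Series filters rows directly, which is the use case described above.
expressing = df[as_series > 0]
print(list(expressing.index))
```

With the Series form, the mask `as_series > 0` keeps only the cells where the gene is detected. The `(n, 1)` DataFrame needs the extra `.iloc[:, 0]` step first.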

Describe the solution you'd like

If a single column is selected, return a series instead of a dataframe. It wouldn't be a bad idea to emit a warning if multiple columns are selected using `exact_word`, because I usually use that when I have the exact gene symbol but am using data loaded with `gene_id='both'`. In this use case I expect a single column to match my gene symbol.
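The requested behaviour could be sketched as a small post-processing step. The helper name `squeeze_single_column` and its `warn_multiple` flag are hypothetical and are not part of scprep. This shows one way to express the proposal, not how the library does or should implement it.

```python
import warnings

import pandas as pd


def squeeze_single_column(result, warn_multiple=False):
    """Return a Series when `result` has exactly one column.

    Hypothetical helper for illustration; `warn_multiple` models the
    optional warning proposed for multi-column matches.
    """
    n_cols = result.shape[1]
    if n_cols == 1:
        # Exactly one match: hand back the column itself as a Series.
        return result.iloc[:, 0]
    if warn_multiple and n_cols > 1:
        warnings.warn(
            "Expected a single matching column, found {}.".format(n_cols),
            UserWarning,
        )
    # Zero or several matches: leave the DataFrame untouched.
    return result


data = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
one = squeeze_single_column(data[["a"]])  # Series named "a"
many = squeeze_single_column(data)        # still a DataFrame
```

Leaving `warn_multiple` off by default keeps multi-column results silent, so callers who expect several matches are not affected.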

Describe alternatives you've considered

Using scanpy ;)

scottgigante commented 4 years ago

I can solve the first problem, but I'm not so convinced by the point about `exact_word` returning more than one hit. It doesn't seem preposterous to me that you could expect multiple hits with any one word and not be upset about it (obviously not the case with 10X data, but that is not the only data type we support). I lean towards just leaving it as a dataframe if there are multiple hits; you'll discover soon enough what's going on anyway.

scottgigante commented 4 years ago

@dburkhardt should the same behaviour apply to select_rows?

dburkhardt commented 4 years ago

Yes because if you pass in a list of series to pd.DataFrame then you get out a DF with 2 rows.
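The pandas behaviour referred to here can be checked with a short sketch. The data is made up, and plain `.loc` indexing stands in for selecting single rows.

```python
import pandas as pd

# Toy matrix with cells as rows and genes as columns (hypothetical data).
df = pd.DataFrame(
    {"GeneA": [0, 3, 5], "GeneB": [1, 0, 2]},
    index=["cell1", "cell2", "cell3"],
)

# Each single-row selection is a Series indexed by gene and named after the cell.
row1 = df.loc["cell1"]
row2 = df.loc["cell2"]

# Passing a list of Series to pd.DataFrame stacks them as rows,
# using each Series' name as the row label.
stacked = pd.DataFrame([row1, row2])
print(stacked.shape)
print(list(stacked.index))
```

So single rows returned as Series can be recombined into a cells-by-genes dataframe without transposing.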