dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0

Bug computing gene expression trends #38

Closed by davisidarta 4 years ago

davisidarta commented 4 years ago

Hi! I'm trying to use Palantir to compute pseudotime and gene expression trends, but I'm hitting the following error, which seems to be caused by deprecated syntax in the code.


gene_trends = palantir.presults.compute_gene_trends(pr_res, df[genes])
Alveolar type 1
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Lato'] not found. Falling back to DejaVu Sans.
/usr/local/lib/python3.8/dist-packages/palantir/presults.py:169: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  pd.DataFrame(np.array([x, y]).T[use_inds, :], columns=["x", "y"])
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/usr/local/lib/python3.8/dist-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/dist-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/joblib/parallel.py", line 252, in __call__
    return [func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/joblib/parallel.py", line 252, in <listcomp>
    return [func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/palantir/presults.py", line 169, in _gam_fit_predict
    pd.DataFrame(np.array([x, y]).T[use_inds, :], columns=["x", "y"])
ValueError: could not broadcast input array from shape (37449,2) into shape (37449)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/palantir/presults.py", line 134, in compute_gene_trends
    res = Parallel(n_jobs=n_jobs)(
  File "/usr/local/lib/python3.8/dist-packages/joblib/parallel.py", line 1042, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.8/dist-packages/joblib/parallel.py", line 921, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.8/dist-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
ValueError: could not broadcast input array from shape (37449,2) into shape (37449)

I'm running Python 3.8 on a Linux system and analyzing 37,449 cells. I want to get the gene expression trends for the three terminal states I identified.

Any insights on this from the developers?

EDIT: The bug occurs only when I pass more than one gene. Calling palantir.presults.compute_gene_trends(pr_res, df['GENE']) with a single gene works fine.

ManuSetty commented 4 years ago

Can you please try again with n_jobs=1? We have seen some instances where parallel processing led to errors when computing the trends.

davisidarta commented 4 years ago

Hi @ManuSetty. Setting n_jobs=1 solved the issue. Thank you!
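
For anyone landing here with the same traceback, the workaround can be sketched roughly as below. This assumes compute_gene_trends accepts n_jobs as a keyword argument, which is consistent with the Parallel(n_jobs=n_jobs) call visible in the traceback. The wrapper name compute_trends_serially is hypothetical and not part of Palantir; the Palantir function is passed in as an argument so the sketch does not need to import the package.

```python
from typing import Any, Callable, Sequence


def compute_trends_serially(
    compute_gene_trends: Callable[..., Any],
    pr_res: Any,
    expression: Any,
    genes: Sequence[str],
) -> Any:
    """Compute gene trends with parallelism switched off.

    `compute_gene_trends` is expected to be
    palantir.presults.compute_gene_trends (an assumption of this sketch);
    `expression` is expected to behave like a pandas DataFrame with one
    column per gene.
    """
    # Select only the requested genes, as in the original call df[genes].
    subset = expression[list(genes)]
    # n_jobs=1 asks joblib to run the per-gene fits one after another
    # instead of in parallel workers, which is the workaround reported
    # in this thread.
    return compute_gene_trends(pr_res, subset, n_jobs=1)
```

With the objects from the original report this would be called as compute_trends_serially(palantir.presults.compute_gene_trends, pr_res, df, genes), which is equivalent to calling palantir.presults.compute_gene_trends(pr_res, df[genes], n_jobs=1) directly. Expect it to run slower than the parallel version on large gene lists.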