aertslab / scenic-nf

DEPRECATED | pySCENIC pipeline implemented in Nextflow using containers
GNU General Public License v3.0

Metadata mismatch found in `from_delayed` #3

Closed: grst closed this issue 4 years ago

grst commented 4 years ago

Hi,

When I try to apply the pipeline to my dataset, it fails with the following error. Any idea what the problem could be? Let me know if you need more information.

Best, Gregor

Dataset

Command

> nextflow run aertslab/scenic-nf -profile singularity \
    --expr expr_mat.loom --output pyscenic_output.loom --TFs allTFs_hg38.txt \
    --db "*feather"  --motifs "motifs-v9-nr.hgnc-m0.001-o0.0.tbl"  --threads=1

Output

N E X T F L O W  ~  version 19.10.0
Launching `aertslab/scenic-nf` [boring_woese] - revision: 7fa5951bfb [master]

***
Parameters in use:
expr=expr_mat.loom
TFs=allTFs_hg38.txt
motifs=motifs-v9-nr.hgnc-m0.001-o0.0.tbl
db=*feather
output=pyscenic_output.loom
outdir=.
grn=grnboost2
threads=1
cell_id_attribute=CellID
gene_attribute=Gene
nb_runs=1
tag=0.9.9
qsubaccount=
***
Using 2 feather databases:
  /data/projects/[...]/scenic/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather
  /data/projects/[...]/scenic/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
***

executor >  local (1)
[3c/2f4186] process > GRNinference (1) [  0%] 0 of 1
[-        ] process > cisTarget        -
[-        ] process > AUCell           -
Error executing process > 'GRNinference (1)'

Caused by:
  Process `GRNinference (1)` terminated with an error exit status (1)

Command executed:

  pyscenic grn         --num_workers 1         -o "run_1__adj.tsv"         --method grnboost2         --cell_id_attribute CellID         --gene_attribute Gene         expr_mat.loom         allTFs_hg38.txt

Command exit status:
  1

Command output:
  preparing dask client
  parsing input
  creating dask graph
  1 partitions
  computing dask graph
  not shutting down client, client was created externally
  finished

Command error:
         [0.        , 0.        , 0.30562347, ..., 0.        , 0.        ,
          0.        ]], dtype=float32), ['NOC2L', 'HES4', 'CPTP', 'VAMP3', 'ENO1', 'AGMAT', 'HP1BP3', 'ZBTB40', 'LUZP1', 'RUNX3', 'ZNF683', 'GMEB1', 'ZCCHC17', 'HDAC1', 'ZNF362', 'SFPQ', 'STK40', 'MTF1', 'POU3F1', 'RLF', 'SMAP2', 'E
  kwargs:    {}
  Exception: ValueError('Metadata mismatch found in `from_delayed`.\n\nExpected partition of type `DataFrame` but got `NoneType`',)

  'infer_data failed for target MFSD1' Retry (1/10). Failure caused by ValueError("Regression for target gene MFSD1 failed. Cause ValueError('buffer source array is read-only',).",).
  'infer_data failed for target MFSD1' Retry (2/10). Failure caused by ValueError("Regression for target gene MFSD1 failed. Cause ValueError('buffer source array is read-only',).",).
  'infer_data failed for target MFSD1' Retry (3/10). Failure caused by ValueError("Regression for target gene MFSD1 failed. Cause ValueError('buffer source array is read-only',).",).
  /usr/local/lib/python3.6/site-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
    expression_matrix = expression_data.as_matrix()
  Traceback (most recent call last):
    File "/usr/local/bin/pyscenic", line 10, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 402, in main
      args.func(args)
    File "/usr/local/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 67, in find_adjacencies_command
      network = method(expression_data=ex_mtx, tf_names=tf_names, verbose=True, client_or_address=client)
    File "/usr/local/lib/python3.6/site-packages/arboreto/algo.py", line 41, in grnboost2
      early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
    File "/usr/local/lib/python3.6/site-packages/arboreto/algo.py", line 135, in diy
      .compute(graph, sync=True) \
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 2758, in compute
      result = self.gather(futures)
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 1822, in gather
      asynchronous=asynchronous,
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 753, in sync
      return sync(self.loop, func, *args, **kwargs)
    File "/usr/local/lib/python3.6/site-packages/distributed/utils.py", line 331, in sync
      six.reraise(*error[0])
    File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
      raise value
    File "/usr/local/lib/python3.6/site-packages/distributed/utils.py", line 316, in f
      result[0] = yield future
    File "/usr/local/lib/python3.6/site-packages/tornado/gen.py", line 729, in run
      value = future.result()
    File "/usr/local/lib/python3.6/site-packages/tornado/gen.py", line 736, in run
      yielded = self.gen.throw(*exc_info)  # type: ignore
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 1653, in _gather
      six.reraise(type(exception), exception, traceback)
    File "/usr/local/lib/python3.6/site-packages/six.py", line 692, in reraise
      raise value.with_traceback(tb)
    File "/usr/local/lib/python3.6/site-packages/dask/dataframe/utils.py", line 521, in check_meta
      errmsg))
  ValueError: Metadata mismatch found in `from_delayed`.

  Expected partition of type `DataFrame` but got `NoneType`
  distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-1, started daemon)>
  distributed.nanny - WARNING - Worker process 15325 was killed by unknown signal
  /usr/local/lib/python3.6/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
    data = yaml.load(f.read()) or {}

cflerin commented 4 years ago

Hi Gregor,

First, my apologies, but this is a very outdated version of the pySCENIC Nextflow pipeline. I should actually have archived this repo, but I have left it here as a reference.

But your specific issue seems similar to other pySCENIC-related problems (https://github.com/aertslab/pySCENIC/issues/45), and I think it was traced back to a bug in pandas 0.23.x, which has since been fixed in later releases of pySCENIC. This repo points to aertslab/pySCENIC:0.9.9, which still uses the old pandas version.

So you could (hopefully) solve this by doing one of two things:

  1. Continue to use this repo, but with a later pySCENIC tag (I just tested adding --tag 0.9.18 on the command line, and it worked).
  2. Move to the newer (py)SCENIC Nextflow repository (https://github.com/aertslab/SCENICprotocol), which has an updated workflow that includes additional analysis steps run in Scanpy.
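A minimal sketch of option 1, assuming a bash shell: it is the command from the original report with `--tag 0.9.18` appended, so the pipeline pulls the newer pySCENIC container instead of the tag=0.9.9 shown in the parameter log. The `command -v` check is an illustrative addition, not part of scenic-nf: it launches Nextflow when it is installed and otherwise only prints the command.

```shell
#!/usr/bin/env bash
# Option 1: re-run the same scenic-nf command, pinning a newer pySCENIC tag.
set -euo pipefail

# "*feather" stays quoted so the shell does not expand it;
# the pipeline receives the pattern exactly as in the original run.
# --tag 0.9.18 replaces the tag=0.9.9 that the failing run used.
cmd=(
  nextflow run aertslab/scenic-nf -profile singularity
  --expr expr_mat.loom
  --output pyscenic_output.loom
  --TFs allTFs_hg38.txt
  --db "*feather"
  --motifs "motifs-v9-nr.hgnc-m0.001-o0.0.tbl"
  --threads=1
  --tag 0.9.18
)

# Illustrative guard (not a pipeline feature): run only if Nextflow exists.
if command -v nextflow >/dev/null 2>&1; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

Keeping the arguments in a bash array preserves the quoting of each argument, so the command can be printed or executed without re-parsing.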

Hopefully this helps, and sorry again for the confusion.

grst commented 4 years ago

Hi @cflerin,

Thanks for the info. I will move to the new version, then. Maybe you could add a line to the README saying that there is a new version?

Cheers, Gregor