ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

Autouri module #152

Closed Fnyasimi closed 4 years ago

Fnyasimi commented 4 years ago

I have successfully installed the pipeline conda env on a new machine.

However, I ran into this problem during the caper init local step:

Traceback (most recent call last):
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/bin/caper", line 11, in <module>
    from caper.caper import main
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 31, in <module>
    from autouri import AutoURI, AbsPath, GCSURI, S3URI, URIBase
ModuleNotFoundError: No module named 'autouri'

I figured out that the module was missing, so I installed it using conda install autouri.

I ran caper init local again and it could not import logger; see the error message below:

Traceback (most recent call last):
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/bin/caper", line 11, in <module>
    from caper.caper import main
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/caper/caper.py", line 32, in <module>
    from autouri import logger as autouri_logger
ImportError: cannot import name 'logger' from 'autouri' (/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/autouri/__init__.py)

Kindly help out with the Autouri module

leepc12 commented 4 years ago

conda install caper -c bioconda is currently not working because bioconda is holding the PR that updates it.

Use pip install caper --upgrade instead.

Fnyasimi commented 4 years ago

Thank you, this works well. There is a slight error on croo execution, as shown below:

Traceback (most recent call last):
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/bin/croo", line 4, in <module>
    import croo
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/croo/__init__.py", line 1, in <module>
    import croo.croo_args
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/croo/croo_args.py", line 11, in <module>
    from caper.caper_uri import (
ModuleNotFoundError: No module named 'caper.caper_uri'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/bin/croo", line 9, in <module>
    import croo
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/croo/__init__.py", line 1, in <module>
    import croo.croo_args
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/croo/croo_args.py", line 11, in <module>
    from caper.caper_uri import (
ModuleNotFoundError: No module named 'caper.caper_uri'

leepc12 commented 4 years ago

You can do pip install croo --upgrade for now. I will update croo on the bioconda repo so that you can conda install it too.

Fnyasimi commented 4 years ago

Thank you. I tried to update qc2tsv the same way, but it didn't work.

leepc12 commented 4 years ago

@Fnyasimi: Did you try with pip install qc2tsv --upgrade?

Fnyasimi commented 4 years ago

Yes, I tried that. The error message is:

Traceback (most recent call last):
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/bin/qc2tsv", line 13, in <module>
    main()
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/qc2tsv/cli.py", line 103, in main
    delim=delim)
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/qc2tsv/qc2tsv.py", line 30, in __init__
    s = AutoURI(qc).read()
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/autouri/autouri.py", line 599, in read
    self.__raise_value_error()
  File "/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.7/site-packages/autouri/autouri.py", line 614, in __raise_value_error
    raise ValueError('Not a valid URI?. {f}'.format(f=self._uri))
ValueError: Not a valid URI?. qc/qc.json

leepc12 commented 4 years ago

Please use an absolute path until I fix this. For example, qc2tsv $PWD/qc/qc.json.
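For anyone scripting around this, the workaround above could be sketched in Python as follows. This is only an illustration, not part of qc2tsv or autouri: the helper name to_absolute is hypothetical, and treating anything containing "://" as a remote URI is a simplifying assumption.

```python
import os


def to_absolute(path):
    """Return an absolute form of a local path (hypothetical helper).

    Strings that look like URIs (assumed here to be anything containing
    "://") are returned unchanged.
    """
    if "://" in path:
        return path
    # os.path.abspath resolves the path against the current working
    # directory, which is what prefixing $PWD does in the shell.
    return os.path.abspath(path)


if __name__ == "__main__":
    # Prints the current directory joined with qc/qc.json.
    print(to_absolute("qc/qc.json"))
```

The printed path can then be passed to qc2tsv in place of the relative one.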

leepc12 commented 4 years ago

Fixed in qc2tsv 0.1.4.

Fnyasimi commented 4 years ago

Thanks for the quick fix. I am not sure whether the message below will be a concern for you in the future:

/home/fnyasimi/.conda/envs/encode-chip-seq-pipeline/lib/python3.6/site-packages/qc2tsv/qc2tsv.py:78: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
  df = pandas.io.json.json_normalize(jsons, sep=sep)

A quick question: I am interested in running multiple JSON files in parallel using the pipeline. Is there a specific parallelization program that works best with the pipeline? I was thinking of using GNU Parallel or Snakemake, but I am open to discussion and ideas.

I would appreciate it if you could point me to a tool or a way of running the pipeline in parallel on a multicore HPC for multiple JSON files, and tell me which parameters to tune for efficiency and full utilization of the available cores.
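To make the question concrete, here is a rough sketch of the kind of thing meant by running several input JSON files at once. It is not a recommendation from the pipeline authors. It assumes the usual caper run chip.wdl -i input.json invocation, and the helper names (build_job, run_all), the one-working-directory-per-JSON layout, and the max_parallel default are all hypothetical choices made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_job(wdl, input_json, work_root):
    """Return (command, working directory) for one input JSON.

    Each job gets its own directory named after the JSON file's stem
    (an assumption; files sharing a stem would share a directory).
    """
    input_json = Path(input_json).resolve()
    workdir = Path(work_root).resolve() / input_json.stem
    command = ["caper", "run", str(Path(wdl).resolve()), "-i", str(input_json)]
    return command, workdir


def run_all(wdl, input_jsons, work_root, max_parallel=2, runner=subprocess.run):
    """Run one job per input JSON, at most max_parallel at a time."""
    jobs = [build_job(wdl, j, work_root) for j in input_jsons]

    def run_one(job):
        command, workdir = job
        workdir.mkdir(parents=True, exist_ok=True)
        # The runner is injectable so the sketch can be tried without caper.
        return runner(command, cwd=workdir)

    # Threads are enough here: each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, jobs))


# Example call (would start real pipeline runs, so it is left commented out):
# run_all("chip.wdl", ["sample1.json", "sample2.json"], "runs", max_parallel=2)
```

In a setup like this, max_parallel multiplied by the largest per-task CPU setting in the input JSON would presumably need to stay within the number of available cores.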

leepc12 commented 4 years ago

Thanks for that deprecation warning. I will upgrade pandas to >= 1.0 in the next release.

Fnyasimi commented 4 years ago

I have this warning, which keeps popping up when I run the pipeline:

2020-05-04 09:30:34,068 cromwell-system-akka.dispatchers.backend-dispatcher-257 WARN  - BackgroundConfigAsyncJobExecutionActor [UUID(dfb2a5ee)chip.xcor:0:1]: Unrecognized runtime attribute keys: disks, cpu, time, memory

Does that mean the resources I changed were not applied, or does it mean something else? The resources I changed in the input JSON file are as follows:

"chip.call_peak_cpu" : 4
"chip.align_cpu" : 8
"chip.align_mem_mb" : 40000
"chip.call_peak_mem_mb" : 32000
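For reference, the four overrides quoted above would look like this when written out as valid JSON. The snippet is a small sketch only: merge_overrides is a hypothetical helper, and the empty base dictionary stands in for whatever else the real input JSON contains.

```python
import json

# The four resource settings quoted in this comment.
resource_overrides = {
    "chip.call_peak_cpu": 4,
    "chip.align_cpu": 8,
    "chip.align_mem_mb": 40000,
    "chip.call_peak_mem_mb": 32000,
}


def merge_overrides(base_inputs, overrides):
    """Return a copy of base_inputs with the overrides applied on top."""
    merged = dict(base_inputs)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # An empty base is used here; a real input JSON has many more keys.
    print(json.dumps(merge_overrides({}, resource_overrides), indent=4))
```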

leepc12 commented 4 years ago

You can safely ignore all warnings (WARN).

Some resource parameters are not used on some backends (platforms). For example, chip.*_disks is not used for SLURM but is used for Google Cloud Platform, so Cromwell always shows this warning on SLURM.

Fnyasimi commented 4 years ago

Thanks for the info