ENCODE-DCC / croo

Cromwell output organizer
MIT License

NoneType object has no attribute get #13

Closed annashcherbina closed 4 years ago

annashcherbina commented 4 years ago

Error:

[CaperURI] copying from gcs to local, src: gs://caper_out/bpnet_anna/atac/d47e2acb-cd7b-4ce4-b342-b9bbbe3e54fe/call-qc_report/glob-3440f922973abb7a616aaf203e0db08b/qc.json
[CaperURI] copying skipped, target: /data/croo/qc/qc.json
Traceback (most recent call last):
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/bin/croo", line 13, in <module>
    main()
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/croo.py", line 219, in main
    co.organize_output()
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/croo.py", line 84, in organize_output
    path = out_var.get('path')
AttributeError: 'NoneType' object has no attribute 'get'

Version:

 pip show croo 
Name: croo
Version: 0.1.8
Summary: CRomwell Output Organizer
Home-page: https://github.com/ENCODE-DCC/croo
Author: Jin Lee
Author-email: leepc12@gmail.com
License: UNKNOWN
Location: /opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages
Requires: caper
Required-by: 

Command:

croo --out-dir /data/croo --out-def atac.croo.json gs://caper_out/bpnet_anna/atac/d47e2acb-cd7b-4ce4-b342-b9bbbe3e54fe/metadata.json
annashcherbina commented 4 years ago

Upgrading to 0.3.3 does not alter the error.

(encode-atac-seq-pipeline) annashch@caper:~$ pip show croo 
Name: croo
Version: 0.3.3
Summary: CRomwell Output Organizer
Home-page: https://github.com/ENCODE-DCC/croo
Author: Jin Lee
Author-email: leepc12@gmail.com
License: UNKNOWN
Location: /opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages
Requires: graphviz, caper
Required-by: 
(encode-atac-seq-pipeline) annashch@caper:~$ croo --out-dir /data/croo --out-def atac.croo.json gs://caper_out/bpnet_anna/atac/d47e2acb-cd7b-4ce4-b342-b9bbbe3e54fe/metadata.json
[CaperURI] copying from gcs to local, src: gs://caper_out/bpnet_anna/atac/d47e2acb-cd7b-4ce4-b342-b9bbbe3e54fe/metadata.json
[CaperURI] copying skipped, target: /data/croo/.croo_tmp/caper_out/bpnet_anna/atac/d47e2acb-cd7b-4ce4-b342-b9bbbe3e54fe/metadata.json
Traceback (most recent call last):
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/bin/croo", line 13, in <module>
    main()
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/croo.py", line 304, in main
    no_graph=args['no_graph'])
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/croo.py", line 53, in __init__
    self._cm = CromwellMetadata(self._metadata)
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/cromwell_metadata.py", line 100, in __init__
    self.__parse_calls(self._metadata_json['calls'])
  File "/opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/cromwell_metadata.py", line 183, in __parse_calls
    for output_name, output_path, _ in out_files:
TypeError: 'NoneType' object is not iterable
leepc12 commented 4 years ago

You need to use the latest output definition JSON file (--out-def-json). Use atac.croo.v2.json from the ATAC-seq pipeline's git directory.

ychsiao1 commented 4 years ago

I'm getting the same problem with the ChIP-seq pipeline. Is this also because of the chip.croo.json file?

Attached is my metadata json file. Could it be a problem with this file?

metadata.json.zip

leepc12 commented 4 years ago

@ychsiao1 : Yes, use the v2 output definition: https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v2.json

ATAC v2: https://storage.googleapis.com/encode-pipeline-output-definition/atac.croo.v2.json

ychsiao1 commented 4 years ago

@leepc12 I'm using the v2 json file, and the error does not change.

leepc12 commented 4 years ago

@ychsiao1: Did you try with the latest Croo? Check its version croo -v.

ychsiao1 commented 4 years ago

@leepc12 My croo version is 0.3.3

leepc12 commented 4 years ago

The pipeline's Conda environment has its own Croo, so you may have two Croos (one in the Conda env and another pip-installed).

Activate the pipeline's Conda env and check Croo's version inside it.
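If two installs might be shadowing each other, it can help to confirm which distribution a given interpreter actually resolves. A minimal sketch using only the standard library (the helper name is my own, not part of croo; requires Python 3.8+ for importlib.metadata):

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution visible to THIS
    interpreter, or None if it is absent (illustrative helper, not croo API)."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Run inside the activated Conda env to see which croo that env resolves,
# e.g. installed_version("croo") -> "0.3.3", or None if not installed there.
```

Running this from both the base environment and the activated pipeline env makes a version mismatch obvious.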

ychsiao1 commented 4 years ago

After activating conda env (conda activate encode-chip-seq-pipeline), croo -v is still 0.3.3

leepc12 commented 4 years ago

@annashcherbina Can you edit lines 183~193 of /opt/anaconda3/envs/encode-atac-seq-pipeline/lib/python3.7/site-packages/croo/cromwell_metadata.py as follows (just adding an `if out_files:` guard)?

                if out_files:
                    for output_name, output_path, _ in out_files:
                        # add each output file to DAG
                        n = CMNode(
                            type='output',
                            shard_idx=shard_idx,
                            task_name=task_name,
                            output_name=output_name,
                            output_path=output_path,
                            all_outputs=None,
                            all_inputs=None)
                        self._dag.add_node(n)
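The failure mode this guard prevents is easy to reproduce in isolation: iterating over None raises the same TypeError, while the check simply skips tasks that produced no File outputs. A minimal sketch with simplified names (not croo's actual classes):

```python
def collect_outputs(out_files):
    """Mimic the patched loop: skip tasks whose out_files is None."""
    nodes = []
    if out_files:  # None (or an empty list) -> no output nodes for this task
        for output_name, output_path, _ in out_files:
            nodes.append((output_name, output_path))
    return nodes

# A task that produced no File outputs no longer crashes:
print(collect_outputs(None))                                     # []
print(collect_outputs([("qc_json", "/data/croo/qc/qc.json", None)]))
# [('qc_json', '/data/croo/qc/qc.json')]
```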
annashcherbina commented 4 years ago

@leepc12 -- this edit gets rid of the error with both the v1 and v2 JSON files (just switching to the v2 JSON didn't work for me either, so the edit was needed). Croo completed without errors once the if-statement checking whether out_files was defined was added.

Can you clarify why "out_files" would be None in some cases and not others? I ran 6 samples in an identical fashion, and only this one had this issue (all pipeline/croo/input json versions same for the 6 samples). Does this indicate an issue with the pipeline outputs?

Thanks for your help!

leepc12 commented 4 years ago

That can happen when there is a task without an actual File output. No ENCODE pipeline task should lack File outputs, but there is a known Caper issue where metadata.json is sometimes not updated in the output directory (cromwell-executions/). In that case some tasks (and the whole workflow) are still marked as Running, and such tasks naturally have no File outputs yet.

I will make a new release today.
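Since a stale or failed metadata.json is the usual culprit here, a quick pre-check of the workflow status can save a confusing croo run. A sketch assuming the standard Cromwell metadata layout, where the top-level "status" field holds values such as "Succeeded", "Failed", or "Running":

```python
import json

def workflow_succeeded(metadata_path):
    """Return True only if the Cromwell metadata reports a finished,
    successful workflow (top-level 'status' == 'Succeeded')."""
    with open(metadata_path) as f:
        meta = json.load(f)
    return meta.get("status") == "Succeeded"
```

Running croo only when this returns True avoids organizing outputs from a workflow that is still running or has failed.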

ychsiao1 commented 4 years ago

The pipeline runs successfully now. Thanks!

As a side note, are the two example JSONs written to skip peak calling? I don't see peak-calling outputs when I run the examples. If I wanted to do peak calling, would I have to add that task to the JSON files?

leepc12 commented 4 years ago

@ychsiao1 What is your pipeline version?

ychsiao1 commented 4 years ago

@leepc12 I git-cloned the ENCODE pipeline yesterday, so it should be the newest version (encode-chip-seq-pipeline2). As for Caper and Croo, I'm using versions 0.6.3 and 0.3.4 respectively.

leepc12 commented 4 years ago

@ychsiao1 : can you upload your metadata.json and Croo's HTML report?

ychsiao1 commented 4 years ago

@leepc12 Here are the two files

metatadata and ctoo html.zip

leepc12 commented 4 years ago

@ychsiao1 : This metadata.json is from a failed workflow; its status is marked as "Failed". I think it failed before the peak-calling steps. Try again with a metadata.json from a succeeded workflow.

ychsiao1 commented 4 years ago

@leepc12 Is there a particular reason why the workflow failed when creating that metadata.json file? I'm using the example inputs provided within the cloned directory without making any changes. (ENCSR936XTK_subsampled_chr19_only.json)

leepc12 commented 4 years ago

metadata.json is essentially a workflow's output log. It seems your test run with ENCSR936XTK_subsampled_chr19_only.json failed somehow.

Please post an issue on the pipeline's GitHub repo and I will take a look. Follow the bug-reporting instructions there.