I have the following code that takes a list of contig 'objects' and converts them into a list of files.
The tool
The tool custom-create-regions-bed-from-contigs-list__1.0.0.cwl is a valid cwl tool and is shown below
```yaml
cwlVersion: v1.1
class: CommandLineTool

# Extensions
$namespaces:
  s: https://schema.org/
  ilmn-tes: http://platform.illumina.com/rdf/ica/
$schemas:
  - https://schema.org/version/latest/schemaorg-current-http.rdf

# Metadata
s:author:
  class: s:Person
  s:name: Alexis Lucattini
  s:email: Alexis.Lucattini@umccr.org
  s:identifier: https://orcid.org/0000-0001-9754-647X

# ID/Docs
id: custom-create-regions-bed-from-contigs-list--1.0.0
label: custom-create-regions-bed-from-contigs-list v(1.0.0)
doc: |
  create a bed file from a list of contigs objects

hints:
  ResourceRequirement:
    ilmn-tes:resources:
      tier: standard
      type: standard
      size: small
    coresMin: 1
    ramMin: 2000
  DockerRequirement:
    dockerPull: umccr/alpine-pandas:latest

requirements:
  SchemaDefRequirement:
    types:
      - $import: contig__1.0.0.yaml
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: get_regions_bed.py
        entry: |
          #!/usr/bin/env python3

          """
          Import args
          Collect args and confirm
          Generate regions bed from args
          """

          # Imports
          import pandas as pd
          import argparse
          import json
          from itertools import chain
          import sys
          from pathlib import Path

          # Globals
          OUTPUT_COLUMNS = ["chromosome", "start", "end"]

          # Inputs
          def get_args():
              """
              Get arguments for the command
              """
              parser = argparse.ArgumentParser(description="Create regions bed from contigs object list")

              # Arguments
              parser.add_argument("--output-regions-bed", required=True,
                                  help="Path to output bed file")
              parser.add_argument("--contig", action="append", nargs='*', required=True,
                                  help="Each of the contig objects")

              return parser.parse_args()

          # Check args
          def set_args(args):
              """
              Check arguments
              """
              # Create directory for bed file
              parent_dir = Path(getattr(args, "output_regions_bed", None)).parent
              parent_dir.mkdir(parents=True, exist_ok=True)

              # Each --contig occurrence is a one-element list holding a JSON string
              contigs_arg = getattr(args, "contig", [])
              contigs = []
              for contig in contigs_arg:
                  contigs.append(json.loads(contig[0]))

              setattr(args, "contigs_list", contigs)

              return args

          # Create DF from args dict
          def create_regions_bed_from_contigs(contigs):
              """
              Create a dataframe from the set args output
              """
              # Create dataframe from args dict
              regions_df = pd.DataFrame(contigs)

              # Return data frame
              return regions_df

          def finalise_output_df(regions_df):
              """
              Returns the regions bed with the right column order.
              They should already be in this order but just to make sure
              """
              regions_df = regions_df.reindex(columns=OUTPUT_COLUMNS)

              return regions_df

          def write_regions_obj_to_bed(regions_df, output_file):
              """
              Write the regions_df to the specified output file
              """
              regions_df.to_csv(output_file, sep="\t", header=False, index=False)

          def main():
              # Get args
              args = get_args()

              # Get args dict from args and check args
              args = set_args(args)

              # Create df from args dict
              regions_df = create_regions_bed_from_contigs(args.contigs_list)

              # Construct output dfs
              regions_df = finalise_output_df(regions_df)

              # Write out csv
              write_regions_obj_to_bed(regions_df, args.output_regions_bed)

          if __name__ == "__main__":
              main()

baseCommand: [ "python", "get_regions_bed.py" ]

inputs:
  contig_list:
    label: List of contigs
    doc: |
      Each contig has the following attributes:
        * chromosome
        * start
        * end
    type:
      - type: array
        items: contig__1.0.0.yaml#contig
        inputBinding:
          prefix: "--contig="
          separate: false
          valueFrom: |
            ${
              return JSON.stringify(self);
            }
    inputBinding:
      # Makes sure all items are together
      position: 1
  regions_bed:
    label: output file name for the regions bed file
    doc: |
      The output regions bed file name
    type: string?
    default: "regions.bed"
    inputBinding:
      prefix: "--output-regions-bed"

outputs:
  regions_bed_out:
    label: regions bed out
    doc: |
      This is the output of the regions bed file
    type: File
    outputBinding:
      glob: "$(inputs.regions_bed)"

successCodes:
  - 0
```
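For context on how the `--contig=<json>` bindings reach the embedded script, here is a minimal sketch of the argparse behaviour it relies on. The `parse_contigs` helper and the sample contig values are made up for illustration and are not part of the tool; the point is only that `action="append"` combined with `nargs='*'` yields one single-element list per occurrence, which is why the script reads `contig[0]`.

```python
import argparse
import json


def parse_contigs(argv):
    """Parse repeated --contig=<json> arguments into a list of dicts (illustrative helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--contig", action="append", nargs="*", required=True)
    args = parser.parse_args(argv)
    # args.contig looks like [['{"chromosome": ...}'], ['{...}']]:
    # one inner list per --contig occurrence, each holding one JSON string
    return [json.loads(values[0]) for values in args.contig]


# Hypothetical command line, shaped like the one the CWL inputBinding would build
argv = [
    "--contig=" + json.dumps({"chromosome": "chr1", "start": 1, "end": 1000}),
    "--contig=" + json.dumps({"chromosome": "chr2", "start": None, "end": None}),
]
contigs = parse_contigs(argv)
print(contigs)
```

Optional `start` / `end` fields (declared as `int?` in the schema) arrive as JSON `null` and are decoded to `None`.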
The schema
The schema file contig__1.0.0.yaml has the following contents
```yaml
type: record
name: contig
fields:
  chromosome:
    label: chromosome
    doc: |
      The name of the chromosome
    type: string
  start:
    label: start position
    doc: |
      The start position of the chromosome of the region
    type: int?
  end:
    label: end position
    doc: |
      The end position of the chromosome of the region
    type: int?
```
The code
I use the following code to import the cwl file through the cwl-utils parser object
```python
# Imports
from cwl_utils import parser_v1_1 as parser
from pathlib import Path
from ruamel import yaml

# Vars
cwl_tool_file_path = Path("custom-create-regions-bed-from-contigs-list__1.0.0.cwl")
cwl_schema_file_path = Path("contig.yaml")

# Read in the cwl file from a yaml
with open(cwl_tool_file_path, "r") as cwl_h:
    cwl_tool_yaml_obj = yaml.main.round_trip_load(cwl_h, preserve_quotes=True)

# Load the document and get the following error
parser.load_document_by_yaml(cwl_tool_yaml_obj, cwl_tool_file_path.absolute().as_uri())
```
Traceback error
```
Traceback (most recent call last):
File "", line 1, in
File "/home/alexiswl/anaconda3/envs/cwl-ica/lib/python3.8/site-packages/cwl_utils/parser_v1_1.py", line 12164, in load_document_by_yaml
return _document_load(union_of_CommandLineToolLoader_or_ExpressionToolLoader_or_WorkflowLoader_or_array_of_union_of_CommandLineToolLoader_or_ExpressionToolLoader_or_WorkflowLoader, yaml, uri, loadingOptions)
File "/home/alexiswl/anaconda3/envs/cwl-ica/lib/python3.8/site-packages/cwl_utils/parser_v1_1.py", line 557, in _document_load
return loader.load(doc, baseuri, loadingOptions, docRoot=baseuri)
File "/home/alexiswl/anaconda3/envs/cwl-ica/lib/python3.8/site-packages/cwl_utils/parser_v1_1.py", line 394, in load
raise ValidationException("", None, errors, "-")
schema_salad.exceptions.ValidationException: - tried _RecordLoader but
Trying 'CommandLineTool'
the `inputs` field is not valid because:
- tried _ArrayLoader but
Expected a list
- tried _RecordLoader but
Trying 'CommandInputParameter'
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:162:5: the `type` field is not
valid because:
- tried _EnumLoader but
Expected one of
('File', 'Directory')
- tried _EnumLoader but
Expected one of
('stdin',)
- tried _RecordLoader but
Expected a dict
- tried _RecordLoader but
Expected a dict
- tried _RecordLoader but
Expected a dict
- tried _PrimitiveLoader
but
Expected a tuple but
got list
- tried _ArrayLoader but
- tried _ArrayLoader but
Expected a list
- tried _UnionLoader but
- tried _EnumLoader
but
Expected one of
('File', 'Directory')
- tried _RecordLoader
but
Trying
'CommandInputRecordSchema'
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:163:9: the `type` field
is not valid because:
Expected one of
('record',)
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:164:9: invalid field
`items`, expected one of: `fields`,
`type`, `label`, `doc`, `name`,
`inputBinding`
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:162:5: - tried _RecordLoader
but
Trying
'CommandInputEnumSchema'
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:163:9: the `symbols`
field is not valid because:
Expected a list
the `type` field
is not valid because:
Expected one of
('enum',)
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:164:9: invalid field
`items`, expected one of: `symbols`,
`type`, `label`, `doc`, `name`,
`inputBinding`
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:162:5: - tried _RecordLoader
but
Trying
'CommandInputArraySchema'
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:164:9: the `items` field
is not valid because:
Term
'contig__1.0.0.yaml#contig' not in
vocabulary
custom-create-regions-bed-from-contigs-list__1.0.0.cwl:162:5: - tried
_PrimitiveLoader but
Expected a tuple
but got CommentedMap
- tried _RecordLoader but
Not a ExpressionTool
- tried _RecordLoader but
Not a Workflow
- tried _ArrayLoader but
Expected a list
```
My digging around
The exception is raised on line 222 in the expand_url function, where the url variable is set to contig__1.0.0.yaml#contig.
I tinkered around by trying to import the loading options from the schema object into the vocab / rvocab objects, to circumvent the urlsplit call on line 217 by simply returning the url on line 172. However, it seems that loading options are not propagated through inputs anyway: when CommandInputArraySchema is initialised with no loading options it inherits the defaults, without also looking at the SchemaDefRequirement imports.
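To make the "not in vocabulary" failure concrete, here is a rough sketch of the kind of check involved. The `resolve_term` function is a simplified stand-in written for this illustration, not the actual expand_url code from schema_salad / cwl_utils, and it assumes that a term with no URL scheme must already be present in the vocabulary to be accepted.

```python
from urllib.parse import urlsplit


def resolve_term(url, vocab):
    """Simplified stand-in for a vocabulary lookup (NOT the library implementation)."""
    if url in vocab:
        # Known term: return it untouched
        return vocab[url]
    if urlsplit(url).scheme:
        # Absolute URI such as https://...: nothing to look up
        return url
    raise ValueError(f"Term '{url}' not in vocabulary")


term = "contig__1.0.0.yaml#contig"
parts = urlsplit(term)
# The schema reference has no scheme, only a relative path and a fragment
print(repr(parts.scheme), parts.path, parts.fragment)

# With default options the term is unknown, so the lookup fails
try:
    resolve_term(term, vocab={})
    failed = False
except ValueError:
    failed = True

# Once the term is registered (which is what a namespace entry amounts to in this sketch), it resolves
resolved = resolve_term(term, vocab={term: term})
```

In this simplified model, anything that gets the schema term into the vocabulary before `items` is validated would let the document load.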
A quick workaround
The LoadingOptions initialiser does, however, check the namespaces, so simply adding the following line to the tool:
resolves the issue and I can import the cwl object through the parser.
Update
This doesn't work when the tool is used in a workflow where another tool has this schema as an output and is the input to this tool.
Instead, I now work around the issue by manually adding the namespace to the YAML object before loading it through the parser.
```python
# Imports
from cwl_utils import parser_v1_1 as parser
from pathlib import Path
from ruamel import yaml

# Vars
cwl_tool_file_path = Path("custom-create-regions-bed-from-contigs-list__1.0.0.cwl")

# Read in the cwl file from a yaml
with open(cwl_tool_file_path, "r") as cwl_h:
    cwl_tool_yaml_obj = yaml.main.round_trip_load(cwl_h, preserve_quotes=True)

# Check for schemas (any of these levels may be missing)
requirements = cwl_tool_yaml_obj.get("requirements") or {}
schema_def_requirement = requirements.get("SchemaDefRequirement") or {}

for imports in schema_def_requirement.get("types") or []:
    # We need the relative import path and the schema path
    schema_relative_imports_path = imports.get("$import")
    if schema_relative_imports_path is None:
        # Inline type definition, nothing to import
        continue
    schema_import_path = (cwl_tool_file_path.parent / schema_relative_imports_path).resolve()

    # Open the schema file
    with open(schema_import_path, "r") as schema_h:
        cwl_schema_yaml_obj = yaml.main.round_trip_load(schema_h, preserve_quotes=True)

    # Get the record name straight from the schema mapping
    schema_name = cwl_schema_yaml_obj.get("name")

    # Get schema string like 'contig__1.0.0.yaml#contig'
    schema_namespace_str = f"{schema_relative_imports_path}#{schema_name}"

    # Add to namespace
    if cwl_tool_yaml_obj.get("$namespaces") is None:
        cwl_tool_yaml_obj["$namespaces"] = {}
    cwl_tool_yaml_obj["$namespaces"][schema_namespace_str] = schema_namespace_str

# Load the document, now that the schema term is registered as a namespace
parser.load_document_by_yaml(cwl_tool_yaml_obj, cwl_tool_file_path.absolute().as_uri())
```