Closed burakalver closed 7 years ago
This is done with a pull request to merge into master and an issue for Jeremy to write tests.
This is done (again after modification) with a pull request to merge into master and an issue for Jeremy to write tests.
I went through the schemas for Capture HiC and have the following suggestions, that's why reopened the issue.
I made a new branch from add capture hic, called add capture hic koray and pushed my commits there.
1) class inheritance.
if we can use the class inheritance we do not need to add properties as mixins and dependencies. We can also carry common mixin imports to experiment and simplify specific experiment schemas. However if I get rid of the
"$ref": "experiment.json#/properties"
we get the following error. If you agree that this would be useful I will create an insert for @j1z0.
Traceback (most recent call last): File "bin/dev-servers", line 74, in <module> sys.exit(snovault.dev_servers.main()) File "/Users/koray/Github/encode/develop/snovault/src/snovault/dev_servers.py", line 100, in main create_mapping.run(app) File "/Users/koray/Github/encode/develop/snovault/src/snovault/elasticsearch/create_mapping.py", line 483, in run mapping = type_mapping(registry[TYPES], collection.type_info.item_type) File "/Users/koray/Github/encode/develop/snovault/src/snovault/elasticsearch/create_mapping.py", line 390, in type_mapping mapping = schema_mapping(item_type, schema) File "/Users/koray/Github/encode/develop/snovault/src/snovault/elasticsearch/create_mapping.py", line 71, in schema_mapping mapping = schema_mapping(k, v) File "/Users/koray/Github/encode/develop/snovault/src/snovault/elasticsearch/create_mapping.py", line 62, in schema_mapping type_ = schema['type'] KeyError: 'type'
2) In mixin library adjusted the fragmentation date to allow date only, and changed the description to alert the submitter
3) there is a linked file in targeted_regions.oligo_file I changed if from File to Document which is a much simpler way to add a single file.
4) changed loadxl.py to skip the fields that can create cyclic behavior.
Also in this branch the targeted_region is an embedded object. I think we can create an item that is more flexible in terms of defining a targeted region (both for proteins/antibodies and rna/dna) and replace targeted_region with that.
On 3, Andy and I discussed and thought file may be better because these files could get big (there might be many captured regions, in particular for 5C) and they can be shared across experiments. It would be nice to track which experiments share the same region. I got the impression document is not something we have a schema for.
I don't understand the details well enough to comment on 1,2 or 4 but sounds reasonable.
I made a new branch from add capture hic, called add capture hic koray and pushed my commits there.
Once we have discussed and decided things I will modify my branch to incorporate these suggestions.
1) class inheritance. if we can use the class inheritance we do not need to add properties as mixins and dependencies. We can also carry common mixin imports to experiment and simplify specific experiment schemas. However if I get rid of the "$ref": "experiment.json#/properties" we get the following error. If you agree that this would be useful I will create an insert for @j1z0.
I indeed think this would be useful. I had tried to make ExperimentCaptureC inherit directly from ExperimentHiC (rather than Experiment) with the expectation that indeed we might not need to add the $ref for properties. I tried adding in base_types in various combinations and that also did not work. Not sure if it is better to have ExperimentCaptureC inherit from ExperimentHiC or Experiment though. I think I got an error or some issue when I tried the former but not sure as I was making lots of changes at that point. In any case I observed the same error as you when the $ref were left out.
For the time being though it works so I don't think that this one point should hold back a merge with master - the same dependency on the $ref for experiment properties holds true for experiment_hic.json so it is a general issue that needs a git issue.
2) In mixin library adjusted the fragmentation date to allow date only, and changed the description to alert the submitter
That is fine and I made this change in add_capture_c branch. Remind me again (on slack if you prefer) why the AnyOf causes problems or having more than one possible format? Is it in the get_fields or data_upload part of the process? Just curious.
3) there is a linked file in targeted_regions.oligo_file I changed if from File to Document which is a much simpler way to add a single file.
I think this needs to be a File Item and not a document pdf. We expect to receive a multi fasta file with these oligos and it may be used in subsequent analysis pipelines so needs to be a real 'File'.
4) changed loadxl.py to skip the fields that can create cyclic behavior.
Thanks. Changed in add_capture_c branch.
Also in this branch the targeted_region is an embedded object. I think we can create an item that is more flexible in terms of defining a targeted region (both for proteins/antibodies and rna/dna) and replace targeted_region with that.
You still have the old version of the schema regarding the 'targeted_region' property. I made a target.json and while the 'targeted_region' is still an embedded object it is now simpler and uses the target schema - if you have suggestions for making target.json more flexible by all means please propose them. (I also use the target schema in the modification.json schema).
I'm about to commit some changes to the add_capture_hic branch now.
Let's look at Erez's loop extrusion paper to see how different capture-Hi-C is from Hi-C to decide how we want to add it to the existing schemas.