OpenBEL / resource-generator

Python modules to generate BEL resource documents.
Apache License 2.0

Make it easy to add custom annotations or namespaces. #67

Open abargnesi opened 8 years ago

abargnesi commented 8 years ago

While discussing the new workflow with OpenBEL Platform, Juliane mentioned the importance of incorporating custom annotations and namespaces into the RDF resources dataset.

@ncatlett put together a tab-delimited format that would be perfect for this. See Adding new namespace datasets.

The goal of this work is to make it easier to add custom datasets to a run of the resource-generator, ideally without having to edit Python.

ncatlett commented 8 years ago

Tony - as is, only two lines should need to be added to configuration.py in order to add a namespace to the RDF resources. The other steps are all related to supporting the old .belns/.beleq file generation.

The pipeline may need some minor modifications to make sure that annotations also work. In datasets.py, the StandardCustomData class needs to be modified to include annotation datasets, and the encoding types need to be handled (ideally, namespace encoding types will be fully merged with annotation concept types).

abargnesi commented 8 years ago

Thanks for expanding on this. Wouldn't annotation datasets already be supported, since they reuse the NamespaceDataSet class? Or is it that we have to add more functions to StandardCustomData to provide annotation data?

ncatlett commented 8 years ago

The NamespaceDataSet class can have a scheme_type of "ns" and/or "anno". The default for StandardCustomData should be "ns", and StandardCustomData does not have a working method to generate annotation concept types. The scheme_type is used in generating the RDF for two purposes: (1) to indicate whether a data set is a NamespaceConceptScheme or an AnnotationConceptScheme, and (2) to determine whether AnnotationConceptTypes and/or encoding concepts should be generated for each value.
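
To make the two roles of scheme_type concrete, here is a rough sketch. It is not the actual resource-generator code: the function names, the list representation of scheme_type, and the assumption that "ns" drives encoding concepts while "anno" drives AnnotationConceptTypes are all illustrative guesses.

```python
# Hypothetical illustration only. Apart from scheme_type, "ns", "anno",
# NamespaceConceptScheme and AnnotationConceptScheme, every name here is
# an assumption and does not come from datasets.py.

def normalize_scheme_type(scheme_type):
    """Accept a single string or an iterable of strings; return a list."""
    if isinstance(scheme_type, str):
        return [scheme_type]
    return list(scheme_type)


def scheme_classes(scheme_type):
    """Purpose (1): which concept scheme classes the data set belongs to."""
    kinds = normalize_scheme_type(scheme_type)
    classes = []
    if "ns" in kinds:
        classes.append("NamespaceConceptScheme")
    if "anno" in kinds:
        classes.append("AnnotationConceptScheme")
    return classes


def per_value_outputs(scheme_type):
    """Purpose (2): which per-value outputs would be generated.

    Assumed mapping: "ns" -> encoding concepts,
    "anno" -> AnnotationConceptTypes.
    """
    kinds = normalize_scheme_type(scheme_type)
    return {
        "encoding_concepts": "ns" in kinds,
        "annotation_concept_types": "anno" in kinds,
    }
```

Under these assumptions, a data set left at the "ns" default would be emitted only as a NamespaceConceptScheme with encoding concepts, which is why an annotation data set would need "anno" set plus a working method for annotation concept types.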