Open stevencox opened 6 years ago
@tubafrenzy, please assign a milestone date to this item and update status in the issue.
This will be finished by the end of next week.
@stevencox Do you have a sample join that I could use for testing and development purposes? Seems like the API will need to accept parameters that indicate which column on one data set to join to which column on another data set. For now I am playing around with a dummy metadata file keyed off of the bicluster "index_id" field.
All you need to design the feature is any column shared by two input files.
CTD_chem_gene_ixns.csv header:
# Fields:
# ChemicalName,ChemicalID,CasRN,GeneSymbol,GeneID,GeneForms,Organism,OrganismID,Interaction,InteractionActions,PubMedIDs
CTD_chemicals.csv header:
# Fields:
# ChemicalName,ChemicalID,CasRN,Definition,ParentIDs,TreeNumbers,ParentTreeNumbers,Synonyms,DrugBankIDs
The generated service should allow a query by ChemicalID to return data joining CTD_chemicals and CTD_chem_gene_ixns data. Assume column names are the same.
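As a rough sketch of how the join column could be discovered rather than passed in, the snippet below reads the two commented headers quoted above and lists the columns they share. The function names `parse_fields` and `shared_columns` are hypothetical and not part of smartBag; the header layout (a `# Fields:` line followed by a commented column list) is assumed from the excerpts in this thread.

```python
# Hypothetical helpers, not part of smartBag: find candidate join columns
# by comparing the commented "# Fields:" headers of two CTD files.

IXNS_HEADER = """# Fields:
# ChemicalName,ChemicalID,CasRN,GeneSymbol,GeneID,GeneForms,Organism,OrganismID,Interaction,InteractionActions,PubMedIDs"""

CHEM_HEADER = """# Fields:
# ChemicalName,ChemicalID,CasRN,Definition,ParentIDs,TreeNumbers,ParentTreeNumbers,Synonyms,DrugBankIDs"""


def parse_fields(header_text):
    """Return the column names from a CTD-style commented header."""
    # Strip the leading '#' and surrounding whitespace from every line.
    lines = [line.lstrip("#").strip() for line in header_text.splitlines()]
    # Assumption: the column list is the line right after the "Fields:" marker.
    marker = lines.index("Fields:")
    return lines[marker + 1].split(",")


def shared_columns(left, right):
    """Columns present in both headers, in the order of the left header."""
    right_set = set(right)
    return [name for name in left if name in right_set]


chem_fields = parse_fields(CHEM_HEADER)
ixn_fields = parse_fields(IXNS_HEADER)
join_candidates = shared_columns(chem_fields, ixn_fields)
print(join_candidates)  # ['ChemicalName', 'ChemicalID', 'CasRN']
```

Any of the shared columns could serve as a join key; ChemicalID is the one requested here.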
Noticed that CTD_chem_gene_ixns.csv contains data of the form:
MESH:C533344
while CTD_chemicals.csv seems to have the prefix stripped off:
C025205
This discrepancy isn't completely germane to the development I am doing, but it would mean these tables don't join properly in a demo/example.
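One possible workaround for a demo, assuming the `MESH:` prefix is the only difference between the two files' ChemicalID values (I have not verified this across the whole data set), is to normalize the key before joining. `normalize_chemical_id` is a hypothetical helper name:

```python
# Hypothetical helper: strip a known prefix so IDs from both files compare equal.
# Assumption: "MESH:" is the only difference between the two ID formats.

def normalize_chemical_id(value, prefix="MESH:"):
    """Return the ID without the given prefix; leave other values unchanged."""
    value = value.strip()
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


print(normalize_chemical_id("MESH:C533344"))  # C533344
print(normalize_chemical_id("C025205"))       # C025205
```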
Also, as I've been going down this road: (a) I assume the API should be able to represent both one-to-one and many-to-one relationships from the perspective of both table queries? Or (b) should they be cleanly married into a single denormalized table result from the "many" perspective, with the "one" row duplicated on each line?
(a), the normalized, relational approach, not the denormalized.
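To make the two response shapes concrete, here is a minimal sketch using plain dicts. The function names and the placeholder values (`GENE_A`, `GENE_B`, the definition text) are made up for illustration and are not real CTD records or smartBag code; only the column names and the ID `C025205` come from this thread.

```python
# Illustrative rows only: the gene symbols and definition are placeholders.
chemicals = [
    {"ChemicalID": "C025205", "Definition": "placeholder definition"},
]
ixns = [
    {"ChemicalID": "C025205", "GeneSymbol": "GENE_A"},
    {"ChemicalID": "C025205", "GeneSymbol": "GENE_B"},
]


def normalized(chemicals, ixns, chemical_id):
    """Relational shape: the 'one' row once, with its 'many' rows nested."""
    chem = next((c for c in chemicals if c["ChemicalID"] == chemical_id), None)
    if chem is None:
        return None
    related = [i for i in ixns if i["ChemicalID"] == chemical_id]
    return {"chemical": chem, "interactions": related}


def denormalized(chemicals, ixns, chemical_id):
    """Flat shape: one merged row per interaction, chemical fields repeated."""
    return [
        {**c, **i}
        for c in chemicals if c["ChemicalID"] == chemical_id
        for i in ixns if i["ChemicalID"] == chemical_id
    ]


result = normalized(chemicals, ixns, "C025205")
flat = denormalized(chemicals, ixns, "C025205")
print(len(result["interactions"]), len(flat))  # 2 2
```

In the normalized shape the chemical's fields appear once; in the flat shape they are repeated on every interaction row.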
smartBag can generate a smartAPI from a BDBag.
But it's very simple and does not support API endpoints that require joining tabular data from multiple files.