arkhn / fhir-river

Live ETL pipeline to standardize Health Data into FHIR.
Apache License 2.0

Naming harmonization between pyrog and river #634

Open simonvadee opened 3 years ago

simonvadee commented 3 years ago

This is meant to be an open discussion before actually starting to rename stuff across the back and the front.

Problem

We don't use the same terminology for the same concepts across the web application and the back-end. I think we can use this discussion to review naming in general (where it can be controversial) and take the occasion to harmonize the terminology we use.

Description

Credentials vs Database: we currently use Credential to refer to database connection information (host, port, login, password, database name), but it can be confusing ("credentials of what? a user? to what?").

Owner vs Schema: a database may contain many "schemas". I think that back in the day @Jasopaum and I were confused about the difference between the two terms (and I think the same word has a different meaning in postgres, mssql and oracle). We currently use Owner but I think DatabaseSchema or just Schema would be more accurate.

Source vs something else?: a Source has a Credential (i.e. it is linked to a database) and has many Resources. It is meant to represent a "source of information" from which we want to be able to extract data in order to create FHIR resources. For now, it can only be an SQL database, and maybe that's fine as long as the pyrog scope remains unclear. For instance, when the data source is a "flux" (a data feed, e.g. an SFTP server with CSV files), do we want pyrog/river to be aware of this? This comes back to the datalake question and I'm not sure we want to address it here. However, don't hesitate if you have suggestions!

Resource vs Mapping or Mappings: This is the term for which we have the most ambiguity right now. It is called Resource in the back and Mapping in the webapp (lol what a great idea we had). I think we all agree that Resource is too vague and refers to too many concepts (even in software engineering in general). Mapping is a better word for this concept, but should we use the plural or the singular?

Column: is meant to represent a database column, but it also has table and owner fields. I think this one is fine (until we normalize the schema and use a single Column object for a column of the database) but I mention it anyway.
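To make the relationships described above easier to discuss, here is a minimal Python sketch of the current concepts under their current names. It is only an illustration of the description in this issue, not the actual pyrog/river models: the `Credential`, `Column`, `Resource` and `Source` class names come from the discussion, but every field marked "hypothetical" (and the choice of dataclasses) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Credential:
    # Database connection information, as listed in the issue.
    host: str
    port: int
    login: str
    password: str
    database: str


@dataclass
class Column:
    # A database column that also carries its table and owner.
    owner: str   # the term proposed to become "Schema"
    table: str
    column: str  # hypothetical field name


@dataclass
class Resource:
    # Called "Resource" in the back end and "Mapping" in the web app.
    label: str  # hypothetical field
    columns: List[Column] = field(default_factory=list)  # hypothetical relation


@dataclass
class Source:
    # A "source of information" linked to one database.
    name: str  # hypothetical field
    credential: Credential
    resources: List[Resource] = field(default_factory=list)
```

Reading the sketch, the ambiguity is visible directly: `Credential` is really a database connection, and `owner` only makes sense once you know it means a database schema.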

Implementation

First, let's agree on the naming. Then, we can do one PR per concept renaming (each one means a new database migration in the back and updating the front and back code).
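As a rough illustration of "one renaming at a time", the sketch below applies a single old-to-new rename to a nested payload and leaves every other name untouched. The `RENAMES` table, the `rename_keys` helper and the sample payload are all hypothetical; the target names are only the candidates mentioned above, none of which has been agreed on, and a real change would go through a database migration rather than this helper.

```python
from typing import Any, Dict

# Hypothetical old -> new names; nothing here has been agreed on yet.
RENAMES: Dict[str, str] = {
    "credential": "database",
    "owner": "schema",
    "resource": "mapping",
}


def rename_keys(payload: Any, old: str, new: str) -> Any:
    """Return a copy of payload with every dict key `old` replaced by `new`."""
    if isinstance(payload, dict):
        return {
            (new if key == old else key): rename_keys(value, old, new)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [rename_keys(item, old, new) for item in payload]
    return payload


# Hypothetical serialized payload using the current names.
legacy = {
    "credential": {"host": "db", "owner": "public"},
    "resource": [{"owner": "public"}],
}

# One concept per step, mirroring one PR per renaming.
step_one = rename_keys(legacy, "owner", RENAMES["owner"])
```

Applying the renames one at a time keeps each step small enough to review, and makes it easy to check that the front and the back agree on a name before moving to the next one.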

BPierrick commented 3 years ago

Mapping is a better word for this concept, but should we use the plural or the singular?

I don't see any problem with using the singular, as it may happen that we manipulate several of these objects at once.

I also have a suggestion about the Attribute.path attribute: it has the same name as ElementDefinition.path but is not the same as the FHIR path concept at all. This may lead to confusion, especially in the FhirResourceTree.

MiskoG commented 3 years ago

Great initiative @simonvadee 👍

elsiehoffet-94 commented 3 years ago

Yes, source should definitely be renamed; project suits it better and is more flexible (many projects for one database, different kinds of data origin...). Mapping seems fine to me, but I can anticipate some confusion: what about code mappings (between terminologies, aka conceptmaps), and what do we call the DBT rules? @nriss, any idea about the latter?

nriss commented 3 years ago

Credentials vs Database: we currently use Credential to refer to database connection information (host, port, login, password, database name), but it can be confusing ("credentials of what? a user? to what?").

I suggest using the same word as Airbyte: connection. Just an idea to think about: what if these connections were set up in another part of pyrog, so that when we create a source we can choose a predefined connection?

Owner vs Schema: a database may contain many "schemas". I think that back in the day @Jasopaum and I were confused about the difference between the two terms (and I think the same word has a different meaning in postgres, mssql and oracle). We currently use Owner but I think DatabaseSchema or just Schema would be more accurate.

In Airbyte, the form changes depending on the chosen database.

Source vs something else?: a Source has a Credential (i.e. it is linked to a database) and has many Resources. It is meant to represent a "source of information" from which we want to be able to extract data in order to create FHIR resources. For now, it can only be an SQL database, and maybe that's fine as long as the pyrog scope remains unclear. For instance, when the data source is a "flux" (a data feed, e.g. an SFTP server with CSV files), do we want pyrog/river to be aware of this? This comes back to the datalake question and I'm not sure we want to address it here. However, don't hesitate if you have suggestions!

I agree with you, @elsiehoffet-94 and @MiskoG: project seems great, and I don't see any better word for now.

Resource vs Mapping or Mappings: This is the term for which we have the most ambiguity right now. It is called Resource in the back and Mapping in the webapp (lol what a great idea we had). I think we all agree that Resource is too vague and refers to too many concepts (even in software engineering in general). Mapping is a better word for this concept, but should we use the plural or the singular?

Mapping is OK for me. Why are you hesitating between singular and plural? It depends on the situation, no? I don't have any idea which is best.

What are you calling DBT rules, @elsiehoffet-94? Is it the SQL query that generates the dbt views? In my opinion there is no need to name that, because it is seen as a regular table in pyrog.

Column: is meant to represent a database column, but it also has table and owner fields. I think this one is fine (until we normalize the schema and use a single Column object for a column of the database) but I mention it anyway.

👍