CorrelAid / correlaid_website

Source code for the CorrelAid website
https://correlaid.org
Concept for publishing (anonymized) project data #457

Closed friep closed 10 months ago

friep commented 10 months ago

I would like to make the project data as open as possible. Data can then be used for:

Background

The goal is to have a full project description for every project, but this is often not achieved because the project team lacks time. The project then does not get the visibility it deserves. Sometimes we also lose contact with the NPO partner and never get their "consent" to publish the project. With this issue, I introduce the concept of an "anonymized" project, which lets us make the full extent of our work visible while keeping the partner organization anonymous.

anonymized data

The following data could be open without concern:

When should a project be fully public (i.e. mentioning the partner organization)?

imo, if there are project outputs that are public (e.g. a public report, a public dashboard), we should be able to assume that the organization is fine with the project being de-anonymized/them being mentioned as project partners.

Data format options

linked to #231

friep commented 10 months ago

it is not possible to conditionally expose fields via the API (e.g. only make org name and website available if status == published).

friep commented 10 months ago

my current idea is to build a flow within Directus that does the "anonymization" (e.g. only keep org name / website / text for "published" orgs) and then pushes a JSON file to the Directus CMS using the API. The data would then be available as a simple file (which I think is reasonable given the small size). We could then also keep metadata (last updated) within the JSON file itself:

{
  meta: {
    last_updated,
    n
  },
  projects: [
    project1,
    project2,
    ...
  ]
}

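
A minimal sketch of how such a file could be assembled, assuming the projects have already been anonymized. The `ExportedProject` fields, the `buildExport` helper, the sample titles and the placeholder date are hypothetical and only illustrate the `meta` / `projects` layout from the sketch above:

```typescript
// Hypothetical shape of an already-anonymized project record.
interface ExportedProject {
  id: number;
  title: string;
  organization_name: string | null; // null when the partner stays anonymous
}

interface ProjectExport {
  meta: { last_updated: string; n: number };
  projects: ExportedProject[];
}

// Wrap the project list together with the metadata (last update, record count).
function buildExport(projects: ExportedProject[], now: Date): ProjectExport {
  return {
    meta: {
      last_updated: now.toISOString(),
      n: projects.length,
    },
    projects,
  };
}

// Illustrative data only; titles and date are placeholders.
const exampleExport = buildExport(
  [
    { id: 1, title: "Example project with named partner", organization_name: "NPO A" },
    { id: 2, title: "Example anonymized project", organization_name: null },
  ],
  new Date("2023-01-01T00:00:00Z"),
);

// Serialized form that a flow or script could upload as a single file.
const exampleJson = JSON.stringify(exampleExport, null, 2);
```

Keeping `n` next to `last_updated` lets consumers check quickly that the file is complete without counting the records themselves.
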
friep commented 10 months ago

@jstet thoughts?

jstet commented 10 months ago

Thought about this and did some research as well and it seems like there is no easy solution unfortunately. I have another idea however: We could create a collection that contains sensitive project data only and then link entries in this collection to the projects. Then we could unpublish the sensitive data entries separately but keep the non-sensitive rest of the fields public.

friep commented 10 months ago

thanks!! mhh.. i think then i'd rather opt for doing this as part of the publishing instead of having complicated internal structures. I introduced a status "published/not published" for Organizations and then built a workflow in directus: https://cms.correlaid.org/admin/settings/flows/94abbdbc-cf74-458a-877b-499e1e08ece6

this works quite decently now, but one thing I noticed is that the "anonymized" status depends not on the organization but on the project itself. For example, say we do 4 projects with NPO A: two are published in accordance with them, two are only for anonymous publication. With an organization-level status, the latter two projects would still show NPO A. Hence, I'd propose adding an additional choice "published_anon" to the status field of the project. This would allow for "anonymizing" both in the website and in the flow. We could even rename "not_published" to "published_anon". However, I have to think about whether really all projects should be this visible or whether we want to reserve the right to keep some truly hidden.

friep commented 10 months ago

another idea, coming back to your CMS thoughts: maybe we could build an automated clone of the projects table, i.e. a "sanitized" version that replaces links to the organization with a link to a generic dummy organization. It would be updated automatically using flows whenever there is an edit on projects. This would require moving the legal_form and sector attributes from the org to the project table.
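
A rough sketch of what one row of such a sanitized clone could look like. Only `legal_form` and `sector` come from the comment above; every other field name, the `sanitizeProject` helper and the dummy organization id are hypothetical:

```typescript
interface Organization {
  id: number;
  name: string;
  website: string;
  legal_form: string;
  sector: string;
}

interface Project {
  id: number;
  title: string;
  organization: Organization;
}

// Row of the sanitized clone: no link to the real organization remains.
interface SanitizedProject {
  id: number;
  title: string;
  organization_id: number; // always the generic dummy organization
  legal_form: string;      // moved from the organization to the project
  sector: string;          // moved from the organization to the project
}

// Hypothetical id of the generic dummy organization.
const DUMMY_ORGANIZATION_ID = 0;

function sanitizeProject(project: Project): SanitizedProject {
  return {
    id: project.id,
    title: project.title,
    organization_id: DUMMY_ORGANIZATION_ID,
    legal_form: project.organization.legal_form,
    sector: project.organization.sector,
  };
}
```

Because the clone carries only the dummy id, the organization's name and website cannot leak through the relation, while the non-identifying attributes remain usable for analysis.
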

jstet commented 10 months ago

Another idea: restrict what the public role can access to public projects only, and use the admin role in SvelteKit. We could then do the data processing there, i.e. put only the desired projects and fields on the website, selected based on other fields. That would also cover anonymized projects.

KonradUdoHannes commented 10 months ago

Adding to the last comment, we could in fact have a non-public API for directus for part of the data (i.e. projects) and still use it for the website, either partially or in a transformed state.

I think the only restriction this imposes is that we can only use/fetch this data server side in SvelteKit. For the most part this is not an issue at all, because the static build happens server side anyway. We could therefore include a secret on our server (or the service that builds the website) to access the non-public API, and the secret/API access would not be exposed to the user. We could also do (one-way) data processing server side with that data to, for instance, drop fields. The user would never have access to the original data, as the static page only contains the transformed data.
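
The server-side access described above could be sketched roughly as follows. This assumes the Directus REST endpoint `/items/projects`, a static access token sent as a bearer token, and a response wrapped in `data`; the field names, the `loadPublicProjects` helper and the injected `fetchFn` parameter are hypothetical:

```typescript
// Minimal subset of the fetch API this sketch relies on, so it can be
// called with the global fetch or with a stub.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical raw record as returned by the non-public API.
interface RawProject {
  id: number;
  title: string;
  internal_notes?: string; // example of a field that must not reach the client
}

interface PublicProject {
  id: number;
  title: string;
}

// Runs only on the server (or in the build service), so the token is never
// shipped to the browser.
async function loadPublicProjects(
  baseUrl: string,
  token: string,
  fetchFn: FetchLike,
): Promise<PublicProject[]> {
  const response = await fetchFn(`${baseUrl}/items/projects`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Directus request failed with status ${response.status}`);
  }
  const body = (await response.json()) as { data: RawProject[] };
  // One-way transformation: keep only the fields meant for the static page.
  return body.data.map(({ id, title }) => ({ id, title }));
}
```

In a SvelteKit project, a function like this would presumably be called from a server-only module, with the token read from an environment variable of the build service.
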

friep commented 10 months ago

the problem is that i want to expose some fields (not: records) conditionally (e.g. only give access to organization name + website + description IF status == published) and that doesn't seem possible with directus. it'd be sad to not give those data for those projects where we can because it's quite interesting.

friep commented 10 months ago

but 👍 re the transformation server side. if that's possible, that'd be great.

KonradUdoHannes commented 10 months ago

Server-side transformation should also be able to handle the status == published logic, i.e. do a different transformation depending on this criterion.

It's then a bit of a design decision whether one would rather have this logic as part of the website code or inside a workflow in Directus.

Generally, we are trying to do as much as possible (ideally everything) server side anyway for stability reasons, to reduce the frequency of errors/bugs on the client side.
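
A minimal sketch of such a status-dependent transformation, assuming a project-level status with the values discussed in this thread ("published", "published_anon", "not_published"). The field names, the `anonymized` flag and the choice to hide "not_published" projects entirely are assumptions, not decisions made in this issue:

```typescript
type ProjectStatus = "published" | "published_anon" | "not_published";

// Hypothetical full record, as available server side.
interface ProjectRecord {
  id: number;
  title: string;
  status: ProjectStatus;
  organization_name: string;
  organization_website: string;
  organization_description: string;
}

// Record as it would end up on the static page or in the open data file.
interface PublicProjectRecord {
  id: number;
  title: string;
  anonymized: boolean;
  organization_name: string | null;
  organization_website: string | null;
  organization_description: string | null;
}

function toPublicRecord(p: ProjectRecord): PublicProjectRecord | null {
  switch (p.status) {
    case "published":
      // Partner agreed to be named: keep the organization fields.
      return {
        id: p.id,
        title: p.title,
        anonymized: false,
        organization_name: p.organization_name,
        organization_website: p.organization_website,
        organization_description: p.organization_description,
      };
    case "published_anon":
      // Project is visible, but everything identifying the partner is dropped.
      return {
        id: p.id,
        title: p.title,
        anonymized: true,
        organization_name: null,
        organization_website: null,
        organization_description: null,
      };
    case "not_published":
      // Assumption: such projects stay hidden entirely.
      return null;
  }
}

function toPublicRecords(projects: ProjectRecord[]): PublicProjectRecord[] {
  return projects
    .map(toPublicRecord)
    .filter((p): p is PublicProjectRecord => p !== null);
}
```

The same function could back both the website build and a file export, so the two stay consistent wherever the logic ends up living.
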

friep commented 10 months ago

@KonradUdoHannes : Jonas and I discussed the following approach:

  1. no changes to directus except for adding "published_anonymized" as a status choice to project table
  2. separate approaches for using the data for the website / the project database page (#231) and for making data available as open data. Both approaches use the same "anonymizing" logic but different technologies. In particular:
    • for the website, you do the logic during server-side transformation
    • for publishing the data as open data for other uses (by ourselves but also other actors), we use a Directus workflow (as outlined above) that publishes a file to assets
  3. we close the public role APIs for the following tables: organizations, projects, project outputs, project people. This avoids confusion about where data is public.

if you both agree, i'll open up follow up issues for the different approaches and sketch out the logic here.

KonradUdoHannes commented 10 months ago

Sounds like a good strategy to me.

friep commented 10 months ago

closed as follow up issues are completed.