friep closed this issue 10 months ago
From a general safety perspective, I would suggest implementing this such that it whitelists what should not be anonymized, rather than blacklisting what should be. If the default is anonymous, accidental exposure of sensitive information is a little less likely. The downside is of course that a project might more easily be anonymized by mistake, but I would expect this to be less critical.
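To illustrate the whitelisting idea, here is a minimal sketch. The field lists and the names `ALLOWED_FIELDS` and `pickAllowed` are assumptions made up for this example, not the actual implementation; the point is only that anything not explicitly listed is dropped, including fields added to the schema later.

```javascript
// Hypothetical allowlists per status: only keys named here are ever exposed.
const ALLOWED_FIELDS = new Map([
  ["published", ["id", "status", "start_date", "end_date", "Organizations", "People", "translations"]],
  ["published_anon", ["id", "status", "start_date", "end_date", "translations"]],
]);

// Copy only allowlisted keys into a fresh object.
function pickAllowed(project) {
  // Unknown statuses fall back to the most restrictive list.
  const allowed = ALLOWED_FIELDS.get(project.status) ?? ALLOWED_FIELDS.get("published_anon");
  const result = {};
  for (const key of allowed) {
    if (key in project) result[key] = project[key];
  }
  return result;
}

const example = pickAllowed({
  id: 1,
  status: "published_anon",
  People: [{ name: "Jane Doe" }],
  internal_notes: "field added to the schema later",
});
console.log(example); // { id: 1, status: 'published_anon' }
```

With a blacklist, `internal_notes` would have been exposed until someone remembered to exclude it; here it never leaves the function.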
agree on whitelisting 💯 will specify this tomorrow.
here's the construction of the project objects:
projects = projects.filter(proj => ["published", "published_anon"].includes(proj.status));
let new_objs = [];
// for each project
projects.forEach((proj) => {
// extract whether project is anonymized
let is_anon = proj.status == "published_anon";
// flatten organizations and LCs
proj.Organizations = [...proj.Organizations.map((o) => o.Organizations_id)];
proj.Local_Chapters = [
...proj.Local_Chapters.map((o) => o.Local_Chapters_id),
];
// filter out not public outputs
proj.Projects_Outputs = proj.Projects_Outputs.filter(
(out) => out.is_public
);
// anonymize People, Posts, Outputs, Podcast
proj.Projects_Outputs = is_anon ? [] : proj.Projects_Outputs;
proj.Posts = is_anon ? [] : proj.Posts;
proj.People = is_anon ? [] : proj.People;
proj.Podcast = is_anon ? null : proj.Podcast;
let orgs = [];
// organizations
proj.Organizations.forEach((org) => {
let reduced_org = new Object();
reduced_org.id = is_anon ? -99 : org.id;
reduced_org.short_id = is_anon ? "ANO" : org.short_id;
reduced_org.legal_form = org.legal_form;
reduced_org.sector = org.sector;
if (is_anon) {
reduced_org.translations = [];
} else {
reduced_org.translations = org.translations.map((trans) => {
return {
name: trans.name,
website: trans.website,
description: trans.description
}
});
}
orgs.push(reduced_org);
});
proj.Organizations = orgs;
// local chapters
proj.Local_Chapters = proj.Local_Chapters.map((lc) => {
return { short_id: lc.short_id, founded: lc.founded }
});
// translations
if (is_anon) {
proj.translations = proj.translations.map((trans) => {
return { title: trans.title, summary: trans.summary }
});
} else {
proj.translations = proj.translations.map((trans) => {
return { title: trans.title, summary: trans.summary, description: trans.description }
});
}
// returning new object
let proj_obj = (({
id,
status,
start_date,
end_date,
project_status,
team_size,
is_internal,
data,
type,
language,
Organizations,
Projects_Outputs,
Podcast,
People,
Posts,
translations
}) => ({
id,
status,
start_date,
end_date,
project_status,
team_size,
is_internal,
data,
type,
language,
Organizations,
Projects_Outputs,
Podcast,
People,
Posts,
translations
}))(proj);
new_objs.push(proj_obj);
});
please refactor for better code - my javascript skills are super rusty!
@friep sure, no problem. This will happen automatically when integrating with our existing parsing functionality.
Regarding the issue in general, I see two related tasks: anonymizing projects for displaying project cards, and anonymizing projects for displaying project subpages. I'll start with the cards, but I was wondering whether subpages for anonymous projects are actually a required use case. @friep do you know whether this will be the case? If so, I'll probably create and link a separate issue, as we would need to adjust the project page design for that case.
When I checked this morning, directus only had one anonymous project without a subpage, which is a good starting point for testing, but I assume there are more to come.
yes, there will be more projects like this (actually probably the majority of projects..)
re design: related is #231 where Jonas and I discussed the separate page. we would then link from the daten-nutzen page to the other page which would also get its own dropdown menu entry. there we had discussed that each project would get a subpage due to the amount of space needed even for anonymous projects (e.g. the project summary can get quite long).
definitely this issue is only for adopting the server side code. I assumed you could complete this independently (e.g. by filtering client-side for published) before doing client side things like design.
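A minimal sketch of that interim client-side filter, assuming the client receives an array of project objects that each carry the `status` field used in the code in this thread. The function name and the sample ids are hypothetical.

```javascript
// Keep only fully published projects, so anonymized ones are not rendered
// until the client-side design for them exists.
function onlyFullyPublished(projects) {
  return projects.filter((proj) => proj.status === "published");
}

const sample = [
  { id: "a", status: "published" },
  { id: "b", status: "published_anon" },
];
console.log(onlyFullyPublished(sample).map((p) => p.id)); // [ 'a' ]
```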
made progress on #470 but would wait on working on it further until you have implemented your solution here.
here's the full code of the relevant directus flow building block:
module.exports = async function(data) {
// Do something...
// only projects that are published or anonymized published
let projects = data.read_projects;
projects = projects.filter(proj => ["published", "published_anon"].includes(proj.status));
let new_objs = [];
projects.forEach((proj) => {
// extract whether project is anonymized
let is_anon = proj.status == "published_anon";
// flatten organizations and LCs
proj.Organizations = [
...proj.Organizations.map((o) => o.Organizations_id),
];
proj.Local_Chapters = [
...proj.Local_Chapters.map((o) => o.Local_Chapters_id),
];
// filter out not public outputs
proj.Projects_Outputs = proj.Projects_Outputs.filter(
(out) => out.is_public
);
// anonymize People, Posts, Outputs, Podcast
proj.Projects_Outputs = is_anon ? [] : proj.Projects_Outputs;
proj.Posts = is_anon ? [] : proj.Posts;
proj.People = is_anon ? [] : proj.People;
proj.Podcast = is_anon ? null : proj.Podcast;
let orgs = [];
// outputs
proj.Projects_Outputs.forEach((output) => {
output.translations = output.translations.map((trans) => {
return {
language: trans.languages_code.code,
description: trans.description,
};
});
});
// organizations
proj.Organizations.forEach((org) => {
let reduced_org = new Object();
reduced_org.id = is_anon ? -99 : org.id;
reduced_org.short_id = is_anon ? "ANO" : org.short_id;
reduced_org.legal_form = org.legal_form;
reduced_org.sector = org.sector;
if (is_anon) {
reduced_org.translations = [];
} else {
reduced_org.translations = org.translations.map((trans) => {
return {
language: trans.languages_code.code,
name: trans.name,
website: trans.website,
description: trans.description,
};
});
}
orgs.push(reduced_org);
});
proj.Organizations = orgs;
// local chapters
proj.Local_Chapters = proj.Local_Chapters.map((lc) => {
return { short_id: lc.short_id, founded: lc.founded };
});
// translations / project description
if (is_anon) {
proj.translations = proj.translations.map((trans) => {
return {
language: trans.languages_code.code,
title: trans.title,
summary: trans.summary,
};
});
} else {
proj.translations = proj.translations.map((trans) => {
return {
language: trans.languages_code.code,
title: trans.title,
summary: trans.summary,
description: trans.description,
};
});
}
// returning new object
let proj_obj = (({
id,
status,
date_updated,
start_date,
end_date,
project_status,
team_size,
is_internal,
data,
type,
language,
Organizations,
Projects_Outputs,
Podcast,
People,
Posts,
translations,
}) => ({
id,
status,
date_updated,
start_date,
end_date,
project_status,
team_size,
is_internal,
data,
type,
language,
Organizations,
Projects_Outputs,
Podcast,
People,
Posts,
translations,
}))(proj);
new_objs.push(proj_obj);
});
let meta = {
last_published: Date.now(),
last_updated: new Date(
Math.max(...new_objs.map((e) => new Date(e.date_updated)))
),
n: new_objs.length,
};
return {
meta: meta,
projects: new_objs,
};
}
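One edge case in the `meta` block: if no project passes the status filter, `Math.max()` is called without arguments and returns `-Infinity`, so `last_updated` becomes an invalid date. A possible guard, as a sketch only (the helper name `lastUpdated` is hypothetical, and returning `null` for the empty case is an assumption about what the consumer would want):

```javascript
// Newest `date_updated` among the projects, or null when there is none.
function lastUpdated(projects) {
  const timestamps = projects
    .filter((p) => p.date_updated != null) // items that were never updated
    .map((p) => Date.parse(p.date_updated))
    .filter((t) => !Number.isNaN(t)); // skip unparsable date strings
  return timestamps.length > 0 ? new Date(Math.max(...timestamps)) : null;
}

console.log(lastUpdated([])); // null
console.log(
  lastUpdated([
    { date_updated: "2023-01-02T00:00:00Z" },
    { date_updated: "2023-03-04T00:00:00Z" },
    { date_updated: null },
  ]).toISOString()
); // 2023-03-04T00:00:00.000Z
```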
in a previous attempt i also wrote a graphql query. maybe that's useful:
query Project {
Projects(filter: { status: { _in: ["published", "published_anon"] } }) {
id
status
date_updated
project_status
start_date
end_date_predicted
end_date
is_internal
team_size
data
type
language
Podcast {
id
soundcloud_link
title
description
}
Projects_Outputs {
url
output_type
is_public
translations {
description
languages_code {
code
}
}
}
Organizations {
Organizations_id {
id
short_id
legal_form
sector
translations {
languages_code {
code
}
website
name
description
}
}
}
translations {
title
description
summary
languages_code {
code
}
}
Local_Chapters {
Local_Chapters_id{
id
short_id
founded
}
}
}
}
I've merged some functionality into production that so far only focuses on the project cards on the project overview page. The new functionality can be seen when scrolling through the production page: there is a single new project card with an "Anonymous Organization", which is created by the one anonymous published project that we currently have set up.
Project "slug" pages are not implemented yet, but I don't expect this to be too much trouble. And it's probably easier to discuss once we have concrete examples for this as well.
Also projects linked on the LC pages directly don't use anonymization yet (but they also don't fetch anonymous projects).
@friep regarding #470, I'm not sure the issues are too closely related. The website only parses the information that it displays, so some data does not need to be anonymized because it's never extracted to begin with. Nonetheless, if there is anything I'm not aware of where output/design from this issue can help with #470, I'm happy to discuss it further.
from what i understand, this lgtm. in the future (#231), we'd need the sector and legal_form of the organization even if anonymized. however i think this is something that can be added later on when working on this.
the slug is actually interesting because so far, they were the project ids (e.g. 2020-03-ERL), with the last three letters typically having some sort of relation to the organization (in the example, ERLassjahr). sometimes this would be very apparent, e.g. if it was Arbeiterwohlfahrt -> AWO. this is why i have left them out so far. but I think i'll just give out new IDs to those projects that focus more on the content, not the org
LCs not fetching those projects yet is ok for me.
from my POV we can close this issue.
Ok, then let's close this for now. I'm also in favor of parsing and potentially anonymizing additional fields once they become relevant, e.g. when implementing #231.
for projects where status == published_anon:
- organization data: only keep sector and legal form
- project outputs: set to [] (avoid spilling information contained in those)
- blog posts: set to [] (ditto)
- podcast: set to [] (ditto)
- project people: set to []
the last three are just a precaution - it shouldn't happen that we have a podcast episode and have to anonymize. but better safe than sorry :)
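The rules above could be sketched roughly as follows. Field names follow the code posted in this thread; the function name `anonymizeProject` is hypothetical. Note that the spread copies all other project fields unchanged, so this sketch only covers the fields listed here and would still need to be combined with an allowlist of top-level fields.

```javascript
// Apply the anonymization rules to a single project object.
function anonymizeProject(proj) {
  // Fully published projects are passed through untouched.
  if (proj.status !== "published_anon") return proj;
  return {
    ...proj,
    // organization data: only keep sector and legal form
    Organizations: (proj.Organizations ?? []).map((org) => ({
      sector: org.sector,
      legal_form: org.legal_form,
    })),
    Projects_Outputs: [],
    Posts: [],
    // The list says [] for the podcast; the flow code in this thread
    // uses null instead.
    Podcast: [],
    People: [],
  };
}

const anon = anonymizeProject({
  status: "published_anon",
  Organizations: [{ id: 5, short_id: "AWO", sector: "social", legal_form: "e.V." }],
  People: [{ name: "Jane Doe" }],
  Posts: [{ id: 2 }],
  Projects_Outputs: [{ url: "https://example.org" }],
  Podcast: { id: 1 },
});
console.log(anon.Organizations); // [ { sector: 'social', legal_form: 'e.V.' } ]
```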
follow up to #457