digitalfabrik / integreat-cms

Simplified content management back end for the Integreat App - a multilingual information platform for newcomers
https://digitalfabrik.github.io/integreat-cms/
Apache License 2.0

Add KURSNET offer #2739

Open steffenkleinle opened 2 months ago

steffenkleinle commented 2 months ago

Motivation

As a user of the Integreat app, I want to see language courses and job trainings I can take part in directly in the app.

Proposed Solution

We should implement the KURSNET API in the CMS and provide the available offers to the apps via an API. Tasks:

Alternatives

Implement the API in the apps.

User Story

As a user of the Integreat app, I want to see language courses and job trainings I can take part in directly in the app.

Additional Context

From the integreat-app issue: https://github.com/digitalfabrik/integreat-app/issues/2702 (please notify once this is done).

Language classes and job trainings are organized centrally by BAMF and BA in a database called KURSNET. After a few years of communication, we found out that different organizations simply use the KURSNET data on their own platforms, although there is no real official API.

It's still unclear whether the data has an official API or needs to be crawled. We have found this, but it needs to be checked: https://github.com/AndreasFischer1985/weiterbildungssuche-api

We might also crawl the data from the platform (https://web.arbeitsagentur.de/sprachfoerderung/home). For Integreat, we are only interested in the data categorized as "Sprachförderung und Migration", which is sub-categorized into 4 further topics.

Design Requirements

None.

steffenkleinle commented 2 months ago

@dkehne who was involved in finding https://github.com/AndreasFischer1985/weiterbildungssuche-api? Is there someone from the tech team who could perhaps provide a little more information on the current status here?

dkehne commented 2 months ago

This is a real benefit to our users so we should find a solution in 24Q3...

svenseeberg commented 1 month ago

It's still unclear whether the data has an official API or needs to be crawled. We have found this, but it needs to be checked: https://github.com/AndreasFischer1985/weiterbildungssuche-api

IMHO we should carefully discuss whether we want to depend on a library that

a) is maintained by a single person,
b) can break at any moment if KURSNET decides to change its layout, and
c) has unclear consequences in terms of the effort needed to fix it.

Additionally, crawling the website is not officially sanctioned. If KURSNET blocks the IP of our server (rate limiting or whatever), we would need to implement pretty crazy workarounds.

I have a very strong opinion: we should not rely on unsupported interfaces for retrieving data for production systems. This will break eventually. It is only a question of when.

steffenkleinle commented 1 month ago

I mostly agree. Relying on unsupported interfaces is definitely a risk. If we just occasionally crawled the API and stored the offers in the CMS, this would perhaps be a risk we could take (but not necessarily should take). Depending on how much the data in KURSNET changes, this could be an okay solution.

I definitely agree that we should not just retrieve the API data directly in the apps or through a proxy in the CMS.
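
To make the "occasionally crawl and store" idea a bit more concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes a hypothetical `fetch_offers` callable (whatever ends up talking to KURSNET) and a hypothetical in-memory `OfferStore` standing in for a CMS table; none of these names exist in the CMS today. The point is only that the apps are always served the stored copy, so a failing fetch leaves the last good snapshot in place.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


@dataclass
class OfferStore:
    """In-memory stand-in for a CMS table of offers (hypothetical)."""

    offers: dict[str, dict] = field(default_factory=dict)
    last_synced: Optional[datetime] = None

    def sync(self, fetch_offers: Callable[[], Iterable[dict]]) -> bool:
        """Replace the stored snapshot; keep the old one if the fetch fails."""
        try:
            # Assumption: every offer is a dict with a unique "id" key.
            fresh = {offer["id"]: offer for offer in fetch_offers()}
        except Exception:
            # Upstream changed or blocked us: keep the last good snapshot.
            return False
        self.offers = fresh
        self.last_synced = datetime.now(timezone.utc)
        return True

    def list_offers(self) -> list[dict]:
        """What the apps would be served: always the stored copy."""
        return list(self.offers.values())
```

A periodic job would call `store.sync(...)` and the app-facing endpoint would only ever call `store.list_offers()`. How stale the stored data may become depends on how often KURSNET changes, which is still the open question.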

steffenkleinle commented 1 month ago

Discussion at the conference:

On hold until the user stories/needs are evaluated.

deen13 commented 1 month ago

Some thoughts from today's discussion at the conference, just for the record: