JMSLab / LaroplanOCR

Swedish primary school curricula (Läroplaner för grundskolan) in digital format.
MIT License

Add more recent Läroplaner and confirm permission to re-post them #3

Closed: santiagohermo closed this issue 2 years ago

santiagohermo commented 2 years ago

This issue has two goals:

1) Add more recent versions of the Läroplaner to the build.
2) Confirm that we are allowed to re-post the PDFs of the Läroplaner in this repo (see here).

jmshapir commented 2 years ago

@santiagohermo thanks!

Per https://github.com/JMSLab/LaroplanOCR/pull/2#discussion_r793923912, my advice is to wait until @dagese weighs in before we reach out to the library on (2).

santiagohermo commented 2 years ago

Thanks @jmshapir! Unfortunately I already sent an email via their contact form :/

jmshapir commented 2 years ago

@santiagohermo no worries! Keep us posted. :-)

dagese commented 2 years ago

@santiagohermo @jmshapir

As you may have seen in the email I forwarded, it is fine for us to share materials based on the digital Läroplaner.

santiagohermo commented 2 years ago

Thanks @dagese! I also got a positive response from Gothenburg University Library.

fyi @jmshapir @miikapaal

santiagohermo commented 2 years ago

Reminder to self: confirm that we use the 'subject' version of the 1980 Läroplan, because it includes the non-subject version.

santiagohermo commented 2 years ago

Notes to self:

santiagohermo commented 2 years ago

Maybe @miikapaal or @dagese can give me a quick hand here? I'm trying to figure out which files correspond to the 1994 and 2011 versions of the curriculum. Gothenburg University posts the curricula here.

1994 års läroplan för det obligatoriska skolväsendet (Lpo 94)

My impression is that the introduction to the 1994 curriculum is the file I highlighted in the screenshot below, and all other files correspond to subjects. Do you agree?

Screenshot

![image](https://user-images.githubusercontent.com/45404755/152703230-9c7158f6-8f1f-4e73-95e7-b957afa86911.png)

2011 Läroplan för grundskolan

The 2011 curriculum seems more straightforward. I think it should be either the top or the middle one of the files I highlighted below (they are the same file); see here.

The bottom one of the files I highlighted below (this one) is actually for preschool, so it shouldn't be included.

Do you agree?

Screenshot

![image](https://user-images.githubusercontent.com/45404755/152703317-1a29b520-b77f-48cb-ad46-fbe5d29d9e34.png)


If my intuition is correct, adding the 1994 curriculum will take a bit of work, since we would need to decide which files are relevant to include in the build. The 2011 curriculum seems straightforward, though.

jmshapir commented 2 years ago

> If my intuition is correct I think that adding the 1994 curriculum will give a bit of work, since we should decide what files are relevant to include in the build. The 2011 curriculum seems straightforward though

@santiagohermo FWIW, if this is the case, it seems fine to me to include only 2011 in the initial release, and to plan to include 1994 in a later release. Among other things, this would allow us to gauge interest / utilization before devoting significant additional effort.

santiagohermo commented 2 years ago

Thanks @jmshapir! That makes sense. I think that if we can quickly figure it out in the call, then let's include them now. If it takes more effort we can plan to include in a later release.

santiagohermo commented 2 years ago

Per today's call:

Thanks for the help @dagese @jmshapir @miikapaal!

jmshapir commented 2 years ago

Thanks @santiagohermo!

For 2011, I think we said we should try, if possible, to exclude the pages that relate to preschool (förskola).

For 1994, I'm not sure whether preschool is included.

@dagese please let us know if I have this wrong, thank you!

dagese commented 2 years ago

Thanks @jmshapir @santiagohermo!

I reviewed the docs carefully. None of the curricula include pre-school.

Förskoleklass != förskola (= preschool). At some point, perhaps in 2011, Swedish primary school was extended from ages 7-16 to 6-16. The first year, the preschool class (förskoleklass), is therefore included in the 2011 curriculum. So there's no need to discuss preschool as we think of it.

There is a separate curriculum for pre-school but it does not look to be included in these documents.

Sorry for confusion.

jmshapir commented 2 years ago

Thanks @dagese! Your language is subtle. :-)

In that case I think the plan reverts to the one in https://github.com/JMSLab/LaroplanOCR/issues/3#issuecomment-1031544754, minus the point about preschool.

@dagese let us know if that doesn't sound right, and thanks again!

dagese commented 2 years ago

Thanks @jmshapir! This plan sounds great.

santiagohermo commented 2 years ago

Thanks for the discussion @dagese @jmshapir! I'll move forward and implement the plan then. I'll let you know when I make some progress and we can make sure I got the right files.

santiagohermo commented 2 years ago

Quick question @jmshapir. I noticed that the Läroplaner PDFs are quite large. Currently, the raw folder is 389 MB, which makes cloning the repo a bit slow.

I think it's convenient to host the PDFs in the repo, even if cloning is a bit slower. But maybe there are other downsides (is cost a problem?). Alternatively, we could git-ignore the PDF files as well and have a Python script that downloads them. What do you think?
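For reference, the download-script alternative could look roughly like this minimal Python sketch. It assumes a hand-maintained mapping from file name to source URL; `PDF_SOURCES` and `download_pdfs` are hypothetical names, and the real URLs are not listed in this thread. It skips files that are already present and writes to a `.part` file first, so an interrupted download doesn't leave a truncated PDF behind.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical mapping from local file name to source URL.
# Placeholder only: the real URLs (e.g. on the Gothenburg University
# Library site) would need to be filled in by hand.
PDF_SOURCES: dict[str, str] = {
    # "example.pdf": "https://example.org/path/to/example.pdf",
}


def download_pdfs(sources: dict[str, str], raw_dir: Path) -> list[Path]:
    """Download each PDF into raw_dir unless it is already there.

    Returns the paths of the files downloaded on this run.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for name, url in sources.items():
        target = raw_dir / name
        if target.exists():
            continue  # already fetched on an earlier run
        # Write to a temporary name first; only rename once the copy
        # has finished, so a failed download never looks complete.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
        downloaded.append(target)
    return downloaded

# Usage (hypothetical): download_pdfs(PDF_SOURCES, Path("raw"))
```

The PDFs themselves would then presumably be listed in `.gitignore`, so only the script and the URL mapping live in the repo.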

jmshapir commented 2 years ago

@santiagohermo I think at this scale the size is manageable, and I agree it's nice to have everything self-contained if we can.

If storage impact starts getting much bigger we could revisit and try an approach like what you say (e.g. grabbing PDFs from website).

Another note is that we should avoid making revisions to the PDF files when we can. Since git can't really "diff" those, the storage impact of even a small change to the PDF can be very large.
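One way to act on this, sketched here under assumptions: record a SHA-256 hash of every committed PDF in a JSON manifest and check it before committing, so an accidental edit shows up before it enters history. Each committed revision of a PDF is stored as a new object, and compressed PDFs tend to delta poorly, so such an edit can add close to the file's full size to the repo. The manifest and the function names are hypothetical, not part of the repo.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so a large PDF is not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(raw_dir: Path, manifest: Path) -> None:
    """Record the current hash of every PDF under raw_dir (hypothetical manifest file)."""
    hashes = {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*.pdf"))
    }
    manifest.write_text(json.dumps(hashes, indent=2, sort_keys=True))


def changed_pdfs(raw_dir: Path, manifest: Path) -> list[str]:
    """Return recorded PDFs that were modified or deleted since the manifest was written.

    Newly added PDFs are not flagged, since adding files is expected.
    """
    expected = json.loads(manifest.read_text())
    changed = []
    for rel, digest in sorted(expected.items()):
        path = raw_dir / rel
        if not path.exists() or sha256_of(path) != digest:
            changed.append(rel)
    return changed
```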

Thanks!

santiagohermo commented 2 years ago

Thanks @jmshapir! That makes sense, and I don't think we'll ever need to modify the pdfs. I'll proceed with this issue then.

Note that the new Läroplaner bring the size of the /raw/ folder to 419 MB, about 30 MB more, which also seems manageable.
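A quick way to reproduce numbers like these is to sum the file sizes under the folder; a minimal sketch follows. The thread doesn't say how the 389 MB and 419 MB figures were measured, so this assumes decimal megabytes (10^6 bytes); tools that report binary units (e.g. `du -h`) will give somewhat smaller figures for the same folder. `folder_size_mb` is a hypothetical helper.

```python
from pathlib import Path


def folder_size_mb(folder: Path) -> float:
    """Total size of all regular files under folder, in decimal megabytes (10**6 bytes)."""
    total = sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())
    return total / 1_000_000

# Usage (hypothetical): print(f"{folder_size_mb(Path('raw')):.0f} MB")
```

With the figures above, the increase is 419 - 389 = 30 MB.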

santiagohermo commented 2 years ago

I finished adding the new files to the pipeline @dagese @jmshapir @miikapaal. The updated figure from the example looks like this:

I'll move to a PR now.

santiagohermo commented 2 years ago

Continues in its PR #8

santiagohermo commented 2 years ago

Summary: In this issue we did the following:

Changes merged to master in https://github.com/JMSLab/LaroplanOCR/commit/bd61c91d53edc2bdec6f6c2e61fad26d36913eef.