gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

Option to disable resources duplication #11995

Open TiGR opened 10 months ago

TiGR commented 10 months ago

Page resources are duplicated into every language. This is going to be fixed in 0.123, but only for markdown content (where duplication can still be enabled as an option for the goldmark renderer).

Here is an example site that shows the issue. It has 2 pages, named test and test2, and both are translated into 2 languages. The translations of the test page are in markdown format. The translations of test2 are in HTML format (while the original language is still in markdown).

In hugo 0.122 and earlier, resources for both pages are duplicated into all languages. In hugo 0.123 they are duplicated only if the translations are non-markdown pages. In the example site above, the resources of the test page are not duplicated, but those of test2 are.

This behavior seems to be a bit counter-intuitive, since resource duplication depends on source translation file format. Also, there is no way to disable this.

I think there should be an option that controls resource duplication for translations, and it should not be format-dependent.

In our case this creates a lot of problems. Our website is 4.5 GB for the main language and is translated into 11 additional languages. Since the translations are all HTML files, all the resources (even unused ones) are copied into every language, so the entire website ends up at 33 GB. If we remove all the duplicated resources, this number drops to 6.5 GB.
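For anyone who wants to measure the same thing on their own site, here is a minimal sketch that walks a published output directory and counts how many bytes are byte-identical copies of another file. The function names (`file_digest`, `duplicate_report`) are made up for this illustration, and content hashing is just one way to detect copies; it is not how the figures above were obtained.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_report(root):
    """Return (total_bytes, redundant_bytes) for regular files under root.

    redundant_bytes counts every copy beyond the first of each
    distinct file content. Symlinks are skipped.
    """
    by_content = defaultdict(int)  # (size, digest) -> number of copies
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            total += size
            by_content[(size, file_digest(path))] += 1
    redundant = sum(size * (copies - 1) for (size, _), copies in by_content.items())
    return total, redundant
```

Calling `duplicate_report("public")` on a built site (assuming the output directory is named `public`) gives a rough idea of how much of the output consists of repeated files.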

Building and deploying 33Gb is very resource-intensive, and our pipelines (that include publishing) run not for minutes, but for over an hour.

Having this fixed is really critical for us, and we were eagerly anticipating 0.123, but it does not solve our problems.

bep commented 10 months ago

I agree that we should fix this, but it will not happen in v0.123.0. We are more or less ready for a release (the documentation is written etc.) and we don't want to introduce some last minute surprise (I'm sure there are enough of those without us having to introduce them).

jmooring commented 10 months ago

I understand the point you raise, but I want to clarify this statement for anyone else who stumbles across this:

"resource duplication depends on source translation file format. Also, there is no way to disable this."

You can disable the format-dependent behavior with:

markup:
  goldmark:
    duplicateResourceFiles: true

I know that doesn't solve your problem, but it provides uniform behavior for anyone who needs/wants it.
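To summarize the behavior as it is described in this thread, here is a small model in Python. It only restates the cases discussed above; it is not taken from Hugo's source code, and the function and parameter names are invented for this illustration.

```python
def resources_duplicated(hugo_minor, translation_is_markdown,
                         duplicate_resource_files=False):
    """Model of when page resources are copied into each language,
    as described in this thread (an assumption, not Hugo's actual code).

    hugo_minor: the minor version number, e.g. 122 for v0.122.x.
    translation_is_markdown: whether the translated page is markdown.
    duplicate_resource_files: value of markup.goldmark.duplicateResourceFiles.
    """
    if hugo_minor < 123:
        # v0.122 and earlier: always duplicated.
        return True
    if not translation_is_markdown:
        # v0.123: non-markdown translations (e.g. HTML) are still duplicated.
        return True
    # v0.123, markdown translations: duplicated only when opted in.
    return duplicate_resource_files
```

In this model no combination of inputs returns `False` for a non-markdown translation on v0.123, which is the gap this issue is about.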

TiGR commented 10 months ago

Also, I think that this option to duplicate resources could (optionally) be moved down to the page level. For instance, we might want to disable this behavior globally and then enable it for a section or a specific page via front matter flags.

TiGR commented 6 months ago

@bep

"I agree that we should fix this, but it will not happen in v0.123.0"

So... Could it be fixed in v0.127.0?

@jmooring

"You can disable the format-dependent behavior with:"

The problem is that, due to the way we do the translations, none of our translation pages are in markdown format; they are all in HTML. So there seems to be no way to disable this behavior.

Possibly it could also be controlled from a single configuration point, since this is not semantically related to the transformation of content pages.