GetDKAN / dkan

DKAN Open Data Portal
https://dkan.readthedocs.io/en/latest/index.html
GNU General Public License v2.0
368 stars 170 forks source link

Don't require file_public_base_url to be set manually #4205

Open grugnog opened 3 months ago

grugnog commented 3 months ago

Feature Description

Every new DKAN site requires file_public_base_url to be set with the full URL of the site if sample content is to be created.

Detailed Feature Description

If I understand the issue, it is because we use a harvest to create the sample content, and that harvest works by downloading remotely, but sites don't always know their own URL when called from the command line?

Problem and Motivation

This is a paper cut that can trip up users and adds steps to the setup process.

Potential Benefits

Makes it easier to test and deploy new DKAN sites.

Target Audience

Possible Implementation

Not sure what the easiest approach, but I think for a full fix we need to enable sites to harvest from a local file path/uri (without redownloading it), at least when called programmatically.

Additional Context

No response

dafeder commented 3 months ago

I remember at the time this was done we were very annoyed that it was the best solution we could come up with, but it's likely that if we revisit it we can come up with something better. Possibly we were being too stubborn at the time about not using Drupal-specific paradigms in the API or Harvest specs like "public://"? But it seems like yes you should be able to put either an absolute file path or a public:// url in a harvest and it should work.

paul-m commented 2 months ago

The only place where file_public_base_url is mentioned in code is Drupal\harvest\Transform\ResourceImporter::saveFile(), which is only used by the sample_content harvest.

I experimented with commenting out the file_public_base_url setting in a local ddev environment, and then building a site and doing the sample content import.

It turns out we might not actually need this setting to be explicitly set before showing sample content.

Some git archaeology tells us that the comment about file_public_base_url comes from 2ab75ca806313bf51427cae80353c7c358168ab6 (2019), and since then much refactoring of ResourceImporter has happened.

In terms of testing, ResourceImporter has the @codeCoverageIgnore annotation, so we don't know what covers this behavior. Removing this annotation, we find that it's not covered by any test.

Recommendations:

dafeder commented 2 months ago

@paul-m I agree with the first three recommendations, but I think the ResourceImporter is potentially useful for other default content modules people might implement for their own sites, and also is one of the few examples of a transform class, so I would keep it where it is.

paul-m commented 1 day ago

Review question from @janette Does it hurt to remove the config?