input: Add disable direct web fetch by default

michaelwood commented 1 month ago

This adds the setting allow_direct_web_fetch which configures whether the functionality to allow using a web direct fetch i.e. not via a form on the frontend but via a direct GET parameter to be fetched from will be allowed.

Re-instate csrf_protect

michaelwood commented 1 month ago

fyi @jpmckinney

@odscjames if there is nothing further needed right now shall I make a minor release with associated changelog/tag etc?

jpmckinney commented 1 month ago

Thanks!

Several CoVE's have a link on the homepage to submit some sample data. For OCDS, we can just remove that link, but not sure if you wanted to develop some way to preserve it (e.g. fill in the URL and submit the form via JS).

jpmckinney commented 1 month ago

Also, curious, why is this set via COVE_CONFIG and not settings at top-level? COVE_CONFIG gets passed to lib-cove-*, which doesn't (need to) understand this option.

michaelwood commented 1 month ago

Thanks!

Several CoVE's have a link on the homepage to submit some sample data. For OCDS, we can just remove that link, but not sure if you wanted to develop some way to preserve it (e.g. fill in the URL and submit the form via JS).

Ah right, yeah agree a click handler in JS on the link would be the simplest solution to maintain functionality with the new settings in the default off state, alternatively just expect people to download and upload the sample (which might be better to save some cycles when the site is being web crawled), AFAIK the "preview a sample" is standard specific so I'll check the other users of lib-cove-web.

Also, curious, why is this set via COVE_CONFIG and not settings at top-level? COVE_CONFIG gets passed to lib-cove-*, which doesn't (need to) understand this option.

My only thought was that it sits logically with input_methods (i.e. https://github.com/OpenDataServices/lib-cove-web/blob/main/cove/context_processors.py#L6 ), happy to move config item up a level if it makes more sense. I'm not entirely clear on what the lines of separation in cove are/intended to be.

odscjames commented 1 month ago

Can we have something in the changelog? Thanks!

odscjames commented 1 month ago

alternatively just expect people to download and upload the sample (which might be better to save some cycles when the site is being web crawled)

That seems like a worse UI solution that would just put people off. I think we need to keep the link option somehow.

This is way beyond the scope of this PR but if web-cycles and crawl performance is a concern what we really want is not to re-process the data any time a user clicks the link. Instead there could be some special way to mark this data as sample and it's not expired and removed like all the other uploads are. We process the sample data once then every user just sees the same set of cached results. (We'd have to make sure we reprocessed the sample when something in the data processing mechanism changed - I've done this in other code already).

odscjames commented 1 month ago

fill in the URL and submit the form via JS

Also (Psuedo code):

<form>
{% csrf_token %}
<input type="hidden" name="source_url" value="...">
<input type="submit" value="See a sample with ..."
</form>

There are ways we can style buttons as links - or just leave it as buttons, I don't think that's so bad.

odscjames commented 1 month ago

I think ideally the changelog should be on the same commit as the code, as that way when someone looks later and uses git blame it's easy to find the code and the intention behind the change too. As this is a small change tho I think it's ok.

The rest looks good to me - I don't have strong views on which level the config should sit so happy to go with this.

OpenDataServices / lib-cove-web

input: Add disable direct web fetch by default #144