datopian / ckanext-versioning

Deprecated. See https://github.com/datopian/ckanext-versions. ⏰ CKAN extension providing data versioning (metadata and files) based on git and github.
https://tech.datopian.com/versioning/
GNU Affero General Public License v3.0
7 stars 4 forks source link

Remove obsolete method #33

Closed pdelboca closed 4 years ago

pdelboca commented 4 years ago

Method _fix_resource_data is no longer valid since we are not using a URL schema based on @ anymore.

shevron commented 4 years ago

You are right that this is obsolete, unfortunately the problem it solves still kind of exist. Here is the problem: https://github.com/ckan/ckan/blob/2.8/ckan/lib/dictization/model_dictize.py#L110 sets the url value to the full resource URL calling url_for in the model layer (😡), and it doesn't take the revision_ref parameter into account in any way. If we want to get the right URL for that revision to download the resource, it must be somehow mangled. The hack we previously had here that you just removed did that. Maybe there is a less ugly way to do it but I didn't find it.

rufuspollock commented 4 years ago

@shevron can we move this to the presentation layer (i.e. the controller) - and/or could it start to be relative (and not go via url_for)?

shevron commented 4 years ago

@shevron can we move this to the presentation layer (i.e. the controller) - and/or could it start to be relative (and not go via url_for)?

@rufuspollock The reason this was done in an action is because someone might be using the CKAN API to get download URLs, and will be getting the wrong URL if we don't mangle it in the action layer. But if we don't care about that case then yes, we can move to blueprint (this no longer goes through a classic "Controller").

As for relative paths, can you give an example of what you expect it to be? Again if I'm guessing correctly I think this may work for some cases but not for others (e.g. resources can be downloaded from both the resource page and the dataset page, and also the URL is useful to API clients calling the action directly).

shevron commented 4 years ago

@rufuspollock @pdelboca we need to make a decision here. I'm inclined towards closing this PR and not merging because I think we need to fix URLs in actions, it's ugly but still needed. Thoughts?

rufuspollock commented 4 years ago

@shevron i thought we now had a bespoke controller in which case this should definitely be going there not in the logic layer. I would really like to see this removed and the "hack" move much closer to the presentation layer where it is needed. If that is not possible then yes we need to leave this and update it (since it is the wrong url structure).

shevron commented 4 years ago

@shevron i thought we now had a bespoke controller in which case this should definitely be going there not in the logic layer. I would really like to see this removed and the "hack" move much closer to the presentation layer where it is needed. If that is not possible then yes we need to leave this and update it (since it is the wrong url structure).

@rufuspollock I'm not 100% sure what you mean by "controller" - is that in a theoretical sense or in CKAN specific terms? We have "actions" which are directly bound to the CKAN API and "blueprints" which are Flask's Web views and replace Pyramid's "Controllers" (so excuse me if I'm being confused by the terms here).

In either case as I mentioned above, moving this to anywhere "above" the CKAN action logic will cause URLs to be wrong for anyone trying to use the API and not the Web interface. That's the only reason I'm suggesting keeping this hack in the actions.

shevron commented 4 years ago

To clarify, API actions to not pass though the controller / blueprint.

rufuspollock commented 4 years ago

@shevron i mean in the blueprint in the method that actually renders a page.

I'm now getting confused why we need this specifically: i thought package_show is already overwriting the default package_show with resource objects / info we get from metastore so urls to blob files in storage shoul be correct. I thought this fix was for setting the path to the resource show page in the UI (?).

I'm now getting quite confused and think a simple summary of the problem and solution would be helpful - happy to get on a call.

shevron commented 4 years ago

@shevron i mean in the blueprint in the method that actually renders a page.

I'm now getting confused why we need this specifically: i thought package_show is already overwriting the default package_show with resource objects / info we get from metastore so urls to blob files in storage shoul be correct. I thought this fix was for setting the path to the resource show page in the UI (?).

I'm now getting quite confused and think a simple summary of the problem and solution would be helpful - happy to get on a call.

Problem: calling package_show and resource_show via the CKAN API will provide the download URL as part of the returned resource object. This is a full absolute URL to the CKAN download route, and is set deep in the model layer of CKAN (see https://github.com/ckan/ckan/blob/2.8/ckan/lib/dictization/model_dictize.py#L110). Unfortunately, this does not take the resource's revision into account, which means following this URL or using it to generate links in the GUI as is will cause a download of the wrong resource revision in many cases.

This problem affects both Web page access (in the dataset page and resource preview page) and CKAN API access. For HTML Web interface access, we may be able to hack around it in the blueprint / view layer of the resource preview page and dataset page, assuming no other CKAN pages use the same info (in CKAN extensions etc.).

For API access, this has to be "fixed" in the action (logic layer), there is no way to modify this in a higher layer that I'm aware of.

Solution: To maintain working API access and to avoid issues where the resource URL will be rendered / used by CKAN code other than the resource preview HTML page and the dataset page, we need to "fix" the URL in the action layer or lower. Only way I am aware of of fixing the URL in a way that will be compatible with all use cases is to hack it by adding the revision_ref parameter to it in the action layer (as is done today but not by adding a @revision suffix to the resource ID, but by adding the right query parameter).

shevron commented 4 years ago

Made obsolete by #43, closing